Overview
Olostep provides a web scraping API for real-time price tracking of millions of products on an e-commerce site at regular intervals (e.g., every few hours), in a scalable and cost-effective way. This is useful for businesses that want to monitor price fluctuations, compare prices across multiple websites, or track competitor pricing strategies. This guide shows how one customer uses Olostep to set up automated daily price tracking for millions of Amazon products.
Why Use Olostep for Price Tracking?
- Scalability: Track prices for millions of products every few hours.
- Automation: Set up scheduled scraping tasks that run at predefined times or regular intervals.
- Multiple Formats: Retrieve data in JSON, HTML, or Markdown format.
- Custom Parsers: Extract only the relevant information as JSON with our parsers, or pass your own to the API.
How to Track Prices Using Olostep
Overview of the Process Setup
When tracking products at scale, we recommend using Olostep’s Batches endpoint. It lets you submit batches of up to 10,000 URLs each, which are processed in parallel, and retrieve the results after 5-8 minutes. You can submit several batches at the same time, monitor their progress, and retrieve the results once each one completes. This way you can process millions of URLs in 15-20 minutes. The overall flow for price tracking with Olostep is as follows:
- Read the products from the database and save the URLs you want to track in a CSV file.
- Read the data from the CSV file and start a batch using Olostep’s batch endpoint. This is done by posting the data to the endpoint in chunks of up to 10,000 URLs at a time.
- Check the batch status every 60 seconds to monitor the progress.
- Once the batch is complete, read the content and use it in your workflow.
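The first item in this flow, detailed in Step 1 below, needs nothing beyond Python’s standard library. The sketch assumes a hypothetical SQLite table products(id, url, tracked) and reuses each product ID as the custom_id, so batch items stay unique and results map back to products; swap in your own database driver, schema, and query.

```python
import csv
import io
import sqlite3


def export_products_csv(conn: sqlite3.Connection) -> str:
    """Return a custom_id,url CSV of every product flagged for tracking.

    Assumes a hypothetical table: products(id TEXT, url TEXT, tracked INTEGER).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["custom_id", "url"])  # product ID doubles as custom_id
    query = "SELECT id, url FROM products WHERE tracked = 1 ORDER BY id"
    for product_id, url in conn.execute(query):
        writer.writerow([product_id, url])
    return buf.getvalue()
```

Write the result to disk with open("products.csv", "w", newline="") and f.write(...). Building the CSV in memory keeps the example short; for millions of rows, handing an open file to csv.writer instead of a StringIO avoids holding the whole export in memory.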
Step 1: Export Product Data from your Database
The first step is to retrieve product information from your database and save it in CSV format. This file should contain product identifiers, URLs, and any additional metadata required for tracking.
Step 2: Start a Batch with Olostep
To start a batch, read the product data from the CSV and send it to the Olostep batch endpoint in an HTTP POST request with a JSON payload. Each batch can contain up to 10,000 URLs; for larger datasets, split the URLs into multiple batches and send them in parallel. A batch consists of an array of items, where each item represents a product URL to be processed.
Batch Array Structure
A batch request takes the following parameters:
- batch_array: Array of items to process. Maximum of 10,000 URLs per batch. Each item must have a unique custom_id.
- Country code: Two-letter country code (e.g., “IT” for Italy).
- Parser: Name of the custom parser to use (e.g., “@olostep/amazon-it-product”). Contact us at info@olostep.com to get access to the pre-built parsers or to create your own.
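A minimal sketch of Step 2, assuming a CSV with custom_id and url columns and a hypothetical post callable that sends one JSON payload to the batch endpoint and returns its batch_id. Only batch_array and custom_id are named in this guide; the url, country, and parser key names used here are assumptions to check against the API reference.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator

MAX_BATCH_SIZE = 10_000  # per-batch limit stated in this guide


def read_items(csv_text: str) -> list[dict]:
    """Turn custom_id,url CSV rows into batch items."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{"custom_id": row["custom_id"], "url": row["url"]} for row in reader]


def chunk(items: list, size: int = MAX_BATCH_SIZE) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_payload(items: list[dict], country: str = "IT",
                  parser: str = "@olostep/amazon-it-product") -> dict:
    # Assumed key names, except batch_array, which this guide names.
    return {"batch_array": items, "country": country, "parser": parser}


def start_batches(csv_text: str, post: Callable[[dict], str],
                  max_workers: int = 4) -> list[str]:
    """Split the CSV into batches, submit them in parallel, return batch IDs.

    `post` is a hypothetical callable that sends one payload to the batch
    endpoint and returns the batch_id from the response.
    """
    items = read_items(csv_text)
    ids = [item["custom_id"] for item in items]
    if len(ids) != len(set(ids)):
        raise ValueError("custom_id values must be unique")
    payloads = [build_payload(batch) for batch in chunk(items)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `payloads`
        return list(pool.map(post, payloads))
```

The uniqueness check runs across the whole file, which is stricter than the per-batch requirement but also keeps results unambiguous when several batches run at once.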
Step 3: Monitor Batch Status
Once a batch has started, monitor its status to determine when processing is complete. The API provides a status endpoint that you can poll periodically (e.g., every 60 seconds) with the batch_id.
Step 4: Retrieve the IDs for Completed Items
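Step 4 begins only once the status check from Step 3 reports that a batch is complete. A minimal polling sketch for several batches submitted at once follows; get_status is a hypothetical wrapper around the status endpoint, and the "completed" status value is an assumption based on this guide’s wording.

```python
import time
from typing import Callable, Iterable


def wait_for_batches(batch_ids: Iterable[str],
                     get_status: Callable[[str], str],
                     interval: float = 60.0,
                     timeout: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Poll every pending batch each round until all report "completed".

    `get_status` is a hypothetical callable wrapping the status endpoint;
    the "completed" status string is an assumption.
    """
    pending = set(batch_ids)
    waited = 0.0  # counts sleep time only; request latency is ignored
    while True:
        # Keep only the batches that have not finished yet.
        pending = {b for b in pending if get_status(b) != "completed"}
        if not pending:
            return
        if waited >= timeout:
            raise TimeoutError(
                f"still pending after {waited:.0f}s: {sorted(pending)}")
        sleep(interval)
        waited += interval
```

Passing sleep as a parameter keeps the loop testable without real delays; in production the default time.sleep gives the 60-second cadence suggested above.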
Once the batch is marked as completed, fetch the list of completed items. Each item carries a retrieve_id, one for every URL sent, so this list gives you the retrieve_id for each item in the batch. To get the actual content, pass each retrieve_id to the retrieve endpoint, which returns the extracted data (HTML, Markdown, or JSON) for that URL so you can store it.
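The sketch below collects the retrieve_id values into a dictionary keyed by custom_id. It assumes, without confirmation from this guide, that the completed-items list is paginated with a cursor and that each entry carries custom_id and retrieve_id fields; get_page is a hypothetical wrapper around that endpoint.

```python
from typing import Callable, Optional


def collect_retrieve_ids(
        batch_id: str,
        get_page: Callable[[str, Optional[str]], dict]) -> dict[str, str]:
    """Map custom_id -> retrieve_id for every completed item in a batch.

    `get_page(batch_id, cursor)` is assumed to return
    {"items": [{"custom_id": ..., "retrieve_id": ...}], "cursor": ...},
    with no "cursor" (or None) on the last page.
    """
    retrieve_ids: dict[str, str] = {}
    cursor: Optional[str] = None
    while True:
        page = get_page(batch_id, cursor)
        for item in page.get("items", []):
            retrieve_ids[item["custom_id"]] = item["retrieve_id"]
        cursor = page.get("cursor")
        if cursor is None:  # last page reached
            return retrieve_ids
```

Keying by custom_id is what lets the scraped prices be written back to the right product rows.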
Step 5: Retrieve the Content for each Item
Once you have the retrieve_id for each item, you can fetch its content (HTML, Markdown, or JSON) from the retrieve endpoint.
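A minimal sketch using only the standard library. The base URL, bearer-token header, /retrieve path, and formats query parameter below are assumptions to verify against the Olostep API reference; fetch_contents takes the HTTP function as a parameter so it can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

# Assumed base URL and auth scheme; confirm both in the API reference.
API_BASE = "https://api.olostep.com/v1"


def http_get_json(path: str, params: dict, api_key: str) -> dict:
    """GET API_BASE + path with a bearer token and decode the JSON body."""
    query = urllib.parse.urlencode(params, doseq=True)
    request = urllib.request.Request(
        f"{API_BASE}{path}?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def fetch_contents(retrieve_ids: dict[str, str],
                   get_json: Callable[[str, dict], dict],
                   formats: Iterable[str] = ("json",),
                   max_workers: int = 8) -> dict[str, dict]:
    """Fetch the retrieve response for each item, keyed by custom_id.

    The "/retrieve" path and "formats" parameter are assumptions.
    """
    wanted = list(formats)

    def fetch_one(custom_id: str) -> tuple[str, dict]:
        params = {"retrieve_id": retrieve_ids[custom_id], "formats": wanted}
        return custom_id, get_json("/retrieve", params)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch_one, retrieve_ids))
```

For example, contents = fetch_contents(ids, lambda path, params: http_get_json(path, params, API_KEY)), where ids maps custom_id to retrieve_id and API_KEY is your key. A thread pool fits here because each call spends most of its time waiting on the network.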