Price Tracking with Olostep
Learn how to use Olostep to track product prices at scale on an e-commerce.
Overview
Olostep provides a web scraping API that enables real-time price tracking of millions of products on an e-commerce at regular intervals (e.g. every few hours) in a scalable and cost effective way.
This is useful for businesses that want to monitor price fluctuations, compare prices across multiple websites, or track competitor pricing strategies.
In this guide, we will see how a customer is using Olostep to set up automated price tracking for millions of Amazon products daily.
Why Use Olostep for Price Tracking?
- Scalability: Track prices for millions of products every few hours.
- Automation: Set up scheduled scraping tasks that run at predefined times/regular intervals.
- Multiple Formats: Retrieve data in JSON, html or markdown format.
- Custom Parsers: Extract only the relevant JSON information with our parsers or pass your own to the API.
How to Track Prices Using Olostep
Overview of the Process Setup
When tracking products at scale we recommend using Olostep’s Batches endpoint.
This endpoint allows you to send multiple batches of URLs (each of up to 10k) to be processed in parallel and then retrieve the results after 5-8 minutes. You can send multiple batches at the same time, monitor their progress and retrieve the results once they are complete. In this way you can process millions of URLs in 15-20 minutes.
The overall flow for price tracking using Olostep is as follows:
- Read the products from the database and save the URLs you want to track in a CSV file.
- Read the data from the CSV file and start a batch using Olostep’s batch endpoint. This is done by posting the data to the endpoint in chunks of up to 10,000 URLs at a time.
- Check the batch status every 60 seconds to monitor the progress.
- Once the batch is complete, read the content and use it in your workflow.
You can start a batch and be returned the html/markdown content of the page and then parse it yourself to extract the data you want. But we recommend starting the batch with a parser so you are returned a JSON object with only the parsed data you need. You can pass your own parser to the API or use one of the pre-built ones we have for some common websites (e.g Amazon product pages, Google search results, Linkedin profiles, etc…).
We store the data for each batch for 7 days so you can retrieve it multiple times if needed.
Step 1: Export Product Data from your Database
The first step is to retrieve product information from your database and save it in a CSV format. This file should contain product identifiers, URLs, and any additional metadata required for tracking.
Step 2: Start a Batch with Olostep
To start a batch, read the product data from the CSV and send it to the Olostep batch endpoint. This is done using an HTTP POST request with a JSON payload.
Each batch can have up to 10k URLs. For large datasets (>10,000 URLs), split into multiple batches and send them in parallel.
A batch consists of an array of items, where each item represents a product URL to be processed. Here’s the structure of a batch request
Batch Array Structure
Each item in the batch_array should follow this structure:
Parameters
Array of items to process. Maximum of 10,000 URLs per batch. Each item must have a unique custom_id
.
Two-letter country code (e.g., “IT” for Italy).
Name of the custom parser to use (e.g., “@olostep/amazon-it-product”). Contact us at info@olostep.com to get access to the pre-built parsers or to create your own.
Response
The endpoint returns a JSON object containing the batch_id, which can be used to monitor the status and then retrieve the results.
Example Usage
Step 3: Monitor Batch Status
Once a batch is started, you’ll need to monitor its status to determine when processing is complete. The API provides a status endpoint that can be polled periodically (e.g., every 60 seconds) with the batch_id
For production use, it’s recommended to implement asynchronous monitoring to handle multiple batches efficiently:
Step 4: Retrieve the IDs for Completed Items
Once the batch is marked as completed, you can fetch the list of completed items. Each item will have a retrieve_id. If you want the actual content use the retrieve endpoint by passing the retrieve_id
This will return the completed items that have each a retrieve_id
for every URL sent. You can then use the retrieve endpoint to retrieve and store the extracted data (html, markdown or JSON) for each URL.
You can get the retrieve_id
for each item in the batch using the following code:
Step 5: Retrieve the Content for each Item
Once you have the retrieve_id
for each item, you can fetch its content (HTML, Markdown, or JSON) using the retrieve endpoint:
Conclusion
By following these steps, you can set up an automated price-tracking system using Olostep. Shortly we will publish an open-source github repo with the full code for this example.