Get the content of multiple websites in one go
Start a batch scrape to extract content from up to 100k URLs in 5-7 mins
Overview
Olostep’s Batches endpoint allows you to start a batch of up to 10,000 URLs and get back the content in 5–7 minutes. You can start up to 10 batches at a time to extract content from 100,000 URLs in one go. If you need more scale, please reach out to us.
This is useful if you already have the URLs you want to process, for example to aggregate data for analysis, build a specialized search tool, or monitor multiple websites for changes.
In this guide, we’ll walk through how to start a batch with a list of URLs and retrieve the content in markdown format.
Gist with Full Code
Here’s all the code in one gist that you can copy and paste to try out batch scraping with Olostep: https://gist.github.com/olostep/e903f2e4fc28f8093b834b4df68b8031
The gist shows how to start a batch with five Google search queries, check its status, and retrieve the content for each item.
Prerequisites
Before getting started, ensure you have the following:
- A valid Olostep API key. You can get one by signing up at Olostep.
- Python installed on your system.
- The requests and hashlib libraries (install requests with pip install requests if needed).
Step 1: Create a Batch from Local URLs
If you already have a list of URLs you want to process, you can define them directly in your script. Otherwise, you can read them from a file or database.
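As a minimal sketch of this step, the snippet below builds the list of items from a local list of URLs and submits it. The endpoint URL, the Bearer authorization header, the items/custom_id/url payload shape, and the id field in the response are assumptions not stated in this guide, so check them against the gist or the API reference. The names build_items and start_batch are hypothetical helpers.

```python
import hashlib

# Assumed endpoint; this guide does not state the exact URL.
BATCHES_URL = "https://api.olostep.com/v1/batches"


def build_items(urls):
    """Pair each URL with a short, deterministic ID derived from its SHA-256 hash."""
    return [
        {
            "custom_id": hashlib.sha256(url.encode("utf-8")).hexdigest()[:16],
            "url": url,
        }
        for url in urls
    ]


def start_batch(urls, api_key, post=None):
    """Submit the URLs as one batch and return the batch ID (assumed field: "id")."""
    if post is None:
        import requests  # imported lazily so build_items works without it

        post = requests.post
    response = post(
        BATCHES_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        json={"items": build_items(urls)},  # assumed payload shape
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["id"]
```

Hashing the URL gives each item a stable identifier, so the same URL always maps to the same custom_id and results can be matched back to their source later.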
Step 2: Monitor Batch Status
Once the batch is started, you can monitor its status using the batch_id that is returned when you start the batch.
You can poll the status every few seconds (e.g. 10 seconds) until the batch is complete:
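The polling loop could look roughly like the sketch below. The status endpoint, the status field, and the "completed" value are assumptions, and fetch_status and wait_for_batch are hypothetical helper names. The loop gives up after a fixed number of checks so a stuck batch does not block the script forever.

```python
import time

# Assumed endpoint; this guide does not state the exact URL.
BATCHES_URL = "https://api.olostep.com/v1/batches"


def fetch_status(batch_id, api_key, get=None):
    """Return the current status string of a batch (assumed field: "status")."""
    if get is None:
        import requests  # imported lazily; only needed for real HTTP calls

        get = requests.get
    response = get(
        f"{BATCHES_URL}/{batch_id}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["status"]


def wait_for_batch(batch_id, get_status, interval=10, max_checks=60, sleep=time.sleep):
    """Poll get_status(batch_id) until it reports "completed" (assumed value).

    Returns True when the batch completes, or False after max_checks attempts.
    """
    for _ in range(max_checks):
        if get_status(batch_id) == "completed":
            return True
        sleep(interval)
    return False
```

With the defaults (60 checks, 10 seconds apart) the loop waits up to 10 minutes. A typical call would be wait_for_batch(batch_id, lambda b: fetch_status(b, API_KEY)), where API_KEY holds your key.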
Step 3: Retrieve Completed Items
Once the batch is marked complete, fetch the processed items.
Each item will include a retrieve_id which you can use to fetch the scraped content.
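A sketch of this step, assuming the items are exposed at an /items path under the batch and returned in an items array (both assumptions, as is the auth header). fetch_items and collect_retrieve_ids are hypothetical helper names.

```python
# Assumed endpoint; this guide does not state the exact URL.
BATCHES_URL = "https://api.olostep.com/v1/batches"


def fetch_items(batch_id, api_key, get=None):
    """Fetch the processed items of a completed batch as parsed JSON."""
    if get is None:
        import requests  # imported lazily; only needed for real HTTP calls

        get = requests.get
    response = get(
        f"{BATCHES_URL}/{batch_id}/items",  # assumed path
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def collect_retrieve_ids(items_response):
    """Return the retrieve_id of every item that has one, in response order."""
    return [
        item["retrieve_id"]
        for item in items_response.get("items", [])  # assumed field: "items"
        if item.get("retrieve_id")
    ]
```

Items without a retrieve_id are skipped rather than raising an error, so one failed URL does not stop the rest of the batch from being processed.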
Step 4: Retrieve the Content
Use the retrieve_id to get the extracted content in markdown, html or json. Here is an example to retrieve the content in markdown format:
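The sketch below illustrates one way to do this. The retrieve endpoint, the retrieve_id and formats query parameters, and the markdown_content response field are all assumptions, so confirm them against the gist or the API reference before relying on them. retrieve_markdown is a hypothetical helper name.

```python
# Assumed endpoint; this guide does not state the exact URL.
RETRIEVE_URL = "https://api.olostep.com/v1/retrieve"


def retrieve_markdown(retrieve_id, api_key, get=None):
    """Return the markdown content for one retrieve_id, or None if it is absent."""
    if get is None:
        import requests  # imported lazily; only needed for real HTTP calls

        get = requests.get
    response = get(
        RETRIEVE_URL,
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        # Assumed parameter names and values.
        params={"retrieve_id": retrieve_id, "formats": "markdown"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("markdown_content")  # assumed field name
```

Calling this once per retrieve_id yields the markdown for every page in the batch.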
Hosted Content
We also host the content for 7 days, so you can retrieve it multiple times without re-scraping. Example of a hosted URL for markdown content
Example Use Cases
1. Build Search Engines
Use Olostep to extract content from industry-specific websites (legal, medical, AI) and build a searchable database.
2. Website Monitoring
Monitor product availability, price changes, or news updates on multiple websites by scheduling daily batch scrapes.
3. Social Media Monitoring
Scrape mentions of your brand or keywords across forums or content sources and extract structured data.
4. Aggregators
Build a job board, news aggregator, or real estate listing platform by pulling data from dozens of sources.
Conclusion
With batch scraping, you can extract content from up to 100k URLs quickly and efficiently. Whether you’re building search tools, aggregators, or monitoring systems, Olostep Batches simplify the job.
Want to extract only structured data? Use Parsers to get just the fields you need. Need help? Reach out to info@olostep.com for support or have us write custom scripts for your use case.