> ## Documentation Index
> Fetch the complete documentation index at: https://docs.olostep.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Get the content of multiple websites in one go

> Start a batch scrape to extract content from up to 100k URLs in 5-7 mins

## Overview

Olostep's [Batches](https://docs.olostep.com/api-reference/batches/create) endpoint allows you to start a batch of up to 10,000 URLs and get back the content in 5–7 minutes. You can start up to 10 batches at a time to extract content from 100,000 URLs in one go. If you need more scale, please reach out to us

This is useful if you already have the URLs you want to process —for example, to aggregate data for analysis, build a specialized search tool, or monitor multiple websites for changes.

In this guide, we’ll walk through how to start a batch with a list of URLs and retrieve the content in markdown format.

## Gist with Full Code

Here's  all the code in one gist that you can copy and paste to try out batch scraping with Olostep:
[https://gist.github.com/olostep/e903f2e4fc28f8093b834b4df68b8031](https://gist.github.com/olostep/e903f2e4fc28f8093b834b4df68b8031)

In this gist we have shown how to start a batch with 5 google search queries, check the status, and retrieve the content for each item.

## Prerequisites

Before getting started, ensure you have the following:

* A valid Olostep API key. You can get one by signing up at [Olostep](https://www.olostep.com/dashboard).
* Python installed on your system.
* The `requests` and `hashlib` libraries (install `requests` with `pip install requests` if needed).

## Step 1: Create a Batch from Local URLs

If you already have a list of URLs you want to process, you can define them directly in your script. Otherwise, you can read them from a file or database.

```python filename="start_batch.py" theme={null}
import requests
import hashlib

API_KEY = "YOUR_API_KEY"

def create_hash_id(url):
    return hashlib.sha256(url.encode()).hexdigest()[:16]

def compose_items_array():
    urls = [
        "https://www.google.com/search?q=nikola+tesla&gl=us&hl=en",
        "https://www.google.com/search?q=alexander+the+great&gl=us&hl=en",
        "https://www.google.com/search?q=google+solar+eclipse&gl=us&hl=en",
        "https://www.google.com/search?q=crispr&gl=us&hl=en",
        "https://www.google.com/search?q=genghis%20khan&gl=us&hl=en"
    ]
    return [{"custom_id": create_hash_id(url), "url": url} for url in urls]

def start_batch(items):
    payload = {
        "items": items
    }
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.post(
        "https://api.olostep.com/v1/batches",
        headers=headers,
        json=payload
    )
    return response.json()["id"]

if __name__ == "__main__":
    items = compose_items_array()
    batch_id = start_batch(items)
    print("Batch started. ID:", batch_id)
```

## Step 2: Monitor Batch Status

Once the batch is started, you can monitor its status using the `batch_id` that is returned when you start the batch

```python filename="check_status.py" theme={null}
import requests

def check_batch_status(batch_id):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.get(
        f"https://api.olostep.com/v1/batches/{batch_id}",
        headers=headers
    )
    return response.json()["status"]
```

You can poll the status every few seconds (e.g. 10 seconds) until the batch is complete:

```python theme={null}
import time

def recursive_check(batch_id):
    status = check_batch_status(batch_id)
    print("Status:", status)
    if status == "completed":
        print("Batch is complete!")
    else:
        time.sleep(60)
        recursive_check(batch_id)
```

## Step 3: Retrieve Completed Items

Once the batch is marked complete, fetch the processed items.

```python theme={null}
import requests

def get_completed_items(batch_id):
    headers = {"Authorization": f"Bearer {API_KEY}"}
    response = requests.get(
        f"https://api.olostep.com/v1/batches/{batch_id}/items",
        headers=headers
    )
    return response.json()["items"]
```

Each item will include a `retrieve_id` which you can use to fetch the scraped content.

```python theme={null}
items = get_completed_items(batch_id)
for item in items:
    print(f"URL: {item['url']}\nCustom ID: {item['custom_id']}\nRetrieve ID: {item['retrieve_id']}\n---")
```

## Step 4: Retrieve the Content

Use the `retrieve_id` to get the extracted content in markdown, html or json. Here is an example to retrieve the content in markdown format:

```python filename="retrieve_content.py" theme={null}
def retrieve_content(retrieve_id):
    url = "https://api.olostep.com/v1/retrieve"
    headers = {"Authorization": f"Bearer {API_KEY}"}
    params = {"retrieve_id": retrieve_id}

    response = requests.get(url, headers=headers, params=params)
    return response.json()

# Example usage:
items = get_completed_items(batch_id)
for item in items:
    content = retrieve_content(item['retrieve_id'])
    print(content)
```

## Hosted Content

We also host the content for 7 days, so you can retrieve it multiple times without re-scraping.
Example of a hosted url for [markdown content](https://olostep-storage.s3.us-east-1.amazonaws.com/markDown_p64bro0q3n_924d0fa125efcc5c.txt)

## Example Use Cases

### 1. Build Search Engines

Use Olostep to extract content from industry-specific websites (legal, medical, AI) and build a searchable database.

### 2. Website Monitoring

Monitor product availability, price changes, or news updates on multiple websites by scheduling daily batch scrapes.

### 3. Social Media Monitoring

Scrape mentions of your brand or keywords across forums or content sources and extract structured data.

### 4. Aggregators

Build a job board, news aggregator, or real estate listing platform by pulling data from dozens of sources.

## Conclusion

With batch scraping, you can extract content from up to 100k URLs quickly and efficiently. Whether you're building search tools, aggregators, or monitoring systems, Olostep Batches simplify the job.

Want to extract only structured data? Use [Parsers](https://docs.olostep.com/features/structured-content/parsers) to get just the fields you need. Need help? Reach out to `info@olostep.com` for support or have us write custom scripts for your use case.
