Olostep’s API is designed around objects. Understanding this design helps you build more effective integrations. This design is inspired by Stripe’s API philosophy.

Everything is an Object

Every resource in Olostep is an object with a unique identifier. Whether you create it via the API, an SDK, or the dashboard, you get back an object that you can reference, update, and query.
| Resource | Object ID format | Example |
|---|---|---|
| Scrape | scrape_* | scrape_abc123 |
| Batch | batch_* | batch_xyz789 |
| Crawl | crawl_* | crawl_def456 |
| Map | map_* | map_ghi012 |
| Answer | answer_* | answer_jkl345 |
| File | file_* | file_mno678 |
| Schedule | schedule_* | schedule_pqr901 |
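Because every ID carries a type prefix, your code can tell what kind of object it is holding without an extra lookup. The sketch below is a hypothetical helper, not part of any Olostep SDK; it assumes only the prefixes listed in the table.

```python
# Hypothetical helper: map an Olostep object ID to its resource type by prefix.
# The prefixes come from the ID format table; the function is illustrative only.
PREFIXES = {
    "scrape_": "scrape",
    "batch_": "batch",
    "crawl_": "crawl",
    "map_": "map",
    "answer_": "answer",
    "file_": "file",
    "schedule_": "schedule",
}


def resource_type(object_id: str) -> str:
    """Return the resource type encoded in an object ID's prefix."""
    for prefix, kind in PREFIXES.items():
        if object_id.startswith(prefix):
            return kind
    raise ValueError(f"Unrecognized object ID: {object_id!r}")


print(resource_type("batch_xyz789"))     # batch
print(resource_type("schedule_pqr901"))  # schedule
```

Raising on an unknown prefix, rather than returning a default, surfaces IDs from other systems that were stored in the wrong place.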

Objects Can Have Lifecycles

Some Olostep objects track state through a status field. This state machine pattern lets you know exactly where each resource is in its lifecycle.

Batches

Batches have two levels of status: the batch itself and its individual items.

Batch status:

in_progress → completed

| Status | Description |
|---|---|
| in_progress | URLs are being scraped |
| completed | Processing finished |
Batch-level failures are extremely rare. Even if some URLs fail, the batch itself reaches completed status. Only a catastrophic infrastructure failure (for example, an LLM service outage during enrichment) can cause the batch itself to fail, and this affects fewer than 0.01% of batches.
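If you poll instead of using webhooks, the loop can key off the status field alone. This is a minimal sketch, assuming you supply fetch_status, a hypothetical callable that requests the batch and returns its status string. This page does not name the status a failed batch reports, so the loop stops on any value other than in_progress instead of waiting only for completed.

```python
import time
from typing import Callable


def wait_for_batch(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll until the batch leaves in_progress, then return its final status.

    fetch_status is a hypothetical callable you provide; this function makes
    no network calls itself.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        # Treat anything other than in_progress as terminal, since the
        # failure status name is not documented here.
        if status != "in_progress":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("batch still in_progress after timeout")
        time.sleep(interval)


# Usage with a fake status source standing in for real API calls.
statuses = iter(["in_progress", "in_progress", "completed"])
print(wait_for_batch(lambda: next(statuses), interval=0))  # completed
```

A completed return value only means the batch finished; individual items can still have failed.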
Item status: each URL in a batch is tracked as an individual item with its own status:

| Status | Description |
|---|---|
| success | URL scraped successfully |
| failed | URL could not be scraped |
Items can fail due to:
  • URL is blocked or returns an error
  • Parser output missing
  • Network/fetch errors
Failed items include an error object whose code and message fields explain the failure. The batch still completes, so check each item’s status when processing results.
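Processing results therefore means partitioning items by status before using them. The sketch below assumes each item is a dict with url, status, retrieve_id (on success), and error (on failure) keys; the exact response shape is an assumption, and the error values shown are placeholders, not real Olostep error codes.

```python
def split_items(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Separate successful batch items from failed ones by their status field."""
    succeeded = [item for item in items if item.get("status") == "success"]
    failed = [item for item in items if item.get("status") == "failed"]
    return succeeded, failed


# Assumed item shape; the error code and message are placeholders.
items = [
    {"url": "https://example.com", "status": "success", "retrieve_id": "6h89o8u1kt"},
    {
        "url": "https://example.org/blocked",
        "status": "failed",
        "error": {"code": "example_code", "message": "example message"},
    },
]

succeeded, failed = split_items(items)
for item in failed:
    err = item.get("error", {})
    print(f"{item['url']}: {err.get('code')} {err.get('message')}")
```

Successful items keep their retrieve_id, so their content can be fetched later without re-scraping.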

Crawls

in_progress → completed
| Status | Description |
|---|---|
| in_progress | Actively discovering and processing URLs |
| completed | Crawling finished |
Crawls always reach completed status, even if they find 0 URLs (for example, because robots.txt blocks crawling or the start URL is invalid). Check the pages_count field to verify that the crawl actually returned results.
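Since completed alone does not mean a crawl found anything, a check needs both fields. A minimal sketch, assuming the crawl object is available as a dict with status and pages_count keys; the outcome labels are hypothetical names for your own code.

```python
def crawl_outcome(crawl: dict) -> str:
    """Classify a crawl object as 'pending', 'empty', or 'ok'.

    A crawl that finds no URLs still reports completed, so pages_count is
    what distinguishes an empty crawl from a useful one.
    """
    if crawl.get("status") != "completed":
        return "pending"
    if crawl.get("pages_count", 0) == 0:
        return "empty"
    return "ok"


print(crawl_outcome({"id": "crawl_def456", "status": "completed", "pages_count": 0}))  # empty
```

An empty result is a cue to inspect the start URL and the site's robots.txt rather than to retry blindly.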

Retrieve Pattern

Many objects produce content that can be retrieved later. The retrieve_id pattern lets you fetch content without re-processing.
# Get content using retrieve_id
curl "https://api.olostep.com/v1/retrieve?retrieve_id=6h89o8u1kt" \
  -H "Authorization: Bearer <your_token>"
This pattern is used by:
  • Batch items — Each processed URL gets a retrieve_id
  • Crawl pages — Each crawled page gets a retrieve_id
The /v1/retrieve endpoint accepts a formats parameter that specifies which content types to return (html, markdown, json, text).
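The curl call can be reproduced with the Python standard library. This sketch only builds the request; it assumes formats is sent as a repeated query parameter (formats=markdown&formats=text), which this page does not confirm, so check the API reference for the exact encoding.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.olostep.com/v1"


def build_retrieve_request(retrieve_id: str, token: str, formats: list[str]) -> Request:
    """Build (but do not send) a GET request to /v1/retrieve.

    Assumption: each entry in formats becomes its own formats= query pair.
    """
    query = urlencode({"retrieve_id": retrieve_id, "formats": formats}, doseq=True)
    return Request(
        f"{API_BASE}/retrieve?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


req = build_retrieve_request("6h89o8u1kt", "<your_token>", ["markdown", "text"])
print(req.full_url)
# https://api.olostep.com/v1/retrieve?retrieve_id=6h89o8u1kt&formats=markdown&formats=text
```

Passing the request to urllib.request.urlopen sends it; keeping construction separate makes the URL easy to inspect and test.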

Webhooks: Event-Driven Updates

Instead of polling for status changes, configure webhooks to receive events when objects change state.
{
  "event": "batch.completed",
  "data": {
    "id": "batch_xyz789",
    "status": "completed",
    "items_total": 100,
    "items_completed": 100
  }
}
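On the receiving side, a handler only needs to parse the body and branch on the event field. This is a minimal sketch, assuming the payload shape shown above; batch.completed is the only event name this page confirms, and the function does not authenticate the sender.

```python
import json
from typing import Optional


def handle_webhook(body: bytes) -> Optional[str]:
    """Return the batch ID for batch.completed events, or None otherwise.

    Unrecognized event types are ignored rather than treated as errors, so
    new event types do not break the handler.
    """
    event = json.loads(body)
    if event.get("event") != "batch.completed":
        return None
    return event["data"]["id"]


payload = (
    b'{"event": "batch.completed", "data": {"id": "batch_xyz789",'
    b' "status": "completed", "items_total": 100, "items_completed": 100}}'
)
print(handle_webhook(payload))  # batch_xyz789
```

With the batch ID in hand, you can fetch the batch's items and check each item's status, since a completed batch may still contain failed items.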

Metadata: Your Data Alongside Ours

Attach custom key-value pairs to objects using metadata. This lets you link Olostep resources to your internal systems.
{
  "items": [{"url": "https://example.com"}],
  "metadata": {
    "order_id": "12345",
    "customer": "acme-corp"
  }
}
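A small builder keeps the metadata keys consistent across requests. This sketch assumes the body above is sent when creating a batch (the items field suggests so, but this page does not name the endpoint) and that objects echo their metadata back; both helper names are hypothetical.

```python
import json
from typing import Optional


def build_batch_body(urls: list[str], metadata: dict[str, str]) -> str:
    """Serialize a request body that tags the batch with your own identifiers."""
    return json.dumps({"items": [{"url": u} for u in urls], "metadata": metadata})


def order_id_for(obj: dict) -> Optional[str]:
    """Read your internal order ID back from an object's metadata, if present."""
    return obj.get("metadata", {}).get("order_id")


body = build_batch_body(["https://example.com"], {"order_id": "12345", "customer": "acme-corp"})
print(order_id_for(json.loads(body)))  # 12345
```

Reading metadata defensively with .get means objects created without metadata simply yield None.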

Summary

| Concept | Description |
|---|---|
| Objects | Every resource has a unique ID and is queryable |
| Lifecycles | Track progress via the status field |
| Retrieve | Fetch content later with retrieve_id |
| Webhooks | Get notified when state changes |
| Metadata | Attach your own data to any object |