Crawl Pages

Retrieve the list of pages found by a crawl, optionally with their content.

curl --request GET \
  --url https://api.olostep.com/v1/crawls/{crawl_id}/pages \
  --header 'Authorization: Bearer <token>'

{
  "crawl_id": "<string>",
  "object": "<string>",
  "status": "<string>",
  "search_query": "<string>",
  "pages_count": 123,
  "pages": [
    {
      "id": "<string>",
      "retrieve_id": "<string>",
      "url": "<string>",
      "is_external": true,
      "html_content": "<string>",
      "markdown_content": "<string>"
    }
  ],
  "metadata": {
    "external_urls": [
      "<string>"
    ],
    "failed_urls": [
      "<string>"
    ]
  },
  "cursor": 123
}

GET /v1/crawls/{crawl_id}/pages

Authorizations

Authorization (string, header, required)

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

crawl_id (string, required)

The ID of the crawl whose pages should be listed.

Query Parameters

cursor (integer)

Optional. Index to start fetching results from, used for pagination. Start with 0, then pass the response['cursor'] value from the previous request until all URLs are fetched.

limit (integer)

Optional. Maximum number of results to return per request. 10-50 results at a time is recommended; use cursor to paginate through the rest. A single request can return at most 10MB of content.
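The cursor and limit parameters combine into a simple pagination loop. The following is a minimal Python sketch, not an official client: the fetch callable, the iter_crawl_pages name, and the max_requests guard are hypothetical, and the stop condition (a missing cursor, an unchanged cursor, or an empty pages list) is an assumption, since this reference does not state how the last page is signalled.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]
# Hypothetical fetcher: takes (cursor, limit) and returns the decoded JSON body.
FetchFn = Callable[[int, int], Dict[str, Any]]


def iter_crawl_pages(fetch: FetchFn, limit: int = 25,
                     max_requests: int = 1000) -> Iterator[Page]:
    """Yield every page of a crawl by following the returned cursor."""
    cursor: Optional[int] = 0  # the reference says to start with 0
    for _ in range(max_requests):  # guard against an endless loop
        response = fetch(cursor, limit)
        pages = response.get("pages") or []
        yield from pages

        next_cursor = response.get("cursor")
        # Assumed stop condition: no cursor, no progress, or no pages returned.
        if next_cursor is None or next_cursor == cursor or not pages:
            return
        cursor = next_cursor
```

In practice fetch would wrap an authenticated GET request to /v1/crawls/{crawl_id}/pages with the given cursor and limit as query parameters.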

search_query (string)

Optional. Search query used to sort the results by relevance. If omitted, the crawl's original search_query is used when one was provided.
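As an illustration of how the query parameters fit together, here is a small Python sketch that builds the request URL. The pages_url helper is hypothetical; only the base URL and the cursor, limit, and search_query parameter names come from this page.

```python
from typing import Dict, Optional, Union
from urllib.parse import quote, urlencode

API_BASE = "https://api.olostep.com/v1"


def pages_url(crawl_id: str,
              cursor: Optional[int] = None,
              limit: Optional[int] = None,
              search_query: Optional[str] = None) -> str:
    """Build the pages URL, including only the parameters that were set."""
    params: Dict[str, Union[int, str]] = {}
    if cursor is not None:
        params["cursor"] = cursor
    if limit is not None:
        params["limit"] = limit
    if search_query is not None:
        params["search_query"] = search_query

    base = f"{API_BASE}/crawls/{quote(crawl_id, safe='')}/pages"
    # urlencode handles escaping, e.g. spaces in the search query.
    return f"{base}?{urlencode(params)}" if params else base
```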

formats (enum<string>[])

Deprecated: use the /retrieve endpoint with each page's retrieve_id instead.

Array of formats to fetch (e.g., ["html", "markdown"]).
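Because formats is deprecated in favor of the /retrieve endpoint, a client would typically gather the retrieve_id of each page first. A minimal Python sketch of that step follows; the collect_retrieve_ids helper is hypothetical, and the call to /retrieve itself is left out because its parameters are not described on this page.

```python
from typing import Any, Dict, List


def collect_retrieve_ids(response: Dict[str, Any]) -> List[str]:
    """Return the retrieve_id of every page that has one, in response order."""
    return [
        page["retrieve_id"]
        for page in response.get("pages") or []
        if page.get("retrieve_id")  # skip pages with a missing or empty ID
    ]
```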

Response

200 (application/json)

Successful response with the list of URLs.

The response is of type object.
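To make the response shape concrete, the sketch below parses the JSON body into typed Python objects. The field names are taken from the sample response above; the class and function names are hypothetical, and treating the content fields and cursor as optional is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrawledPage:
    id: str
    retrieve_id: str
    url: str
    is_external: bool
    html_content: Optional[str] = None      # assumed absent unless requested
    markdown_content: Optional[str] = None  # assumed absent unless requested


@dataclass
class CrawlPagesResponse:
    crawl_id: str
    status: str
    pages_count: int
    pages: List[CrawledPage]
    external_urls: List[str]
    failed_urls: List[str]
    cursor: Optional[int]


def parse_pages_response(raw: str) -> CrawlPagesResponse:
    """Decode a pages response body into dataclasses."""
    data = json.loads(raw)
    metadata = data.get("metadata") or {}
    pages = [
        CrawledPage(
            id=p["id"],
            retrieve_id=p["retrieve_id"],
            url=p["url"],
            is_external=bool(p.get("is_external", False)),
            html_content=p.get("html_content"),
            markdown_content=p.get("markdown_content"),
        )
        for p in data.get("pages") or []
    ]
    return CrawlPagesResponse(
        crawl_id=data["crawl_id"],
        status=data["status"],
        pages_count=data["pages_count"],
        pages=pages,
        external_urls=list(metadata.get("external_urls") or []),
        failed_urls=list(metadata.get("failed_urls") or []),
        cursor=data.get("cursor"),
    )
```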
