GET /v1/crawls/{crawl_id}/pages
Retrieve the list of crawled pages, optionally with their content
curl --request GET \
  --url https://api.olostep.com/v1/crawls/{crawl_id}/pages \
  --header 'Authorization: Bearer <token>'
{
  "crawl_id": "<string>",
  "object": "<string>",
  "status": "<string>",
  "search_query": "<string>",
  "pages_count": 123,
  "pages": [
    {
      "id": "<string>",
      "retrieve_id": "<string>",
      "url": "<string>",
      "is_external": true,
      "html_content": "<string>",
      "markdown_content": "<string>"
    }
  ],
  "metadata": {
    "external_urls": [
      "<string>"
    ],
    "failed_urls": [
      "<string>"
    ]
  },
  "cursor": 123
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path Parameters

crawl_id
string
required

The ID of the crawl to retrieve the list of URLs for.

Query Parameters

cursor
integer

Optional integer index to start fetching content from. Use it to paginate until all URLs are fetched: start with 0, then pass the response['cursor'] value from the previous request.

limit
integer

Optional integer that limits the number of results returned. 10-50 results per request is recommended; use cursor to paginate through the rest. A single request can fetch at most 10MB of content.

search_query
string

An optional search query used to sort the results by relevance. Defaults to the crawl's original search_query, if one was provided.

formats
enum<string>[]

Deprecated: use the /retrieve endpoint with retrieve_id instead.

Array of formats to fetch (e.g., ["html", "markdown"]).
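
The cursor and limit parameters described above can be combined into a pagination loop. The following is a minimal Python sketch, not an official client. The helper names fetch_page and iter_crawl_pages are hypothetical, and the stop condition is an assumption: this reference does not say how the last batch is signalled, so the loop ends when a batch is empty or the cursor is missing or does not advance.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.olostep.com/v1"


def fetch_page(crawl_id, token, cursor, limit):
    """GET one batch of crawled pages and return the decoded JSON body."""
    query = urllib.parse.urlencode({"cursor": cursor, "limit": limit})
    crawl = urllib.parse.quote(crawl_id, safe="")
    url = f"{BASE_URL}/crawls/{crawl}/pages?{query}"
    request = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_crawl_pages(crawl_id, token, limit=25, fetch=fetch_page):
    """Yield every page of a crawl, following `cursor` between requests."""
    cursor = 0  # the parameter description says to start with 0
    while True:
        body = fetch(crawl_id, token, cursor, limit)
        pages = body.get("pages") or []
        yield from pages
        next_cursor = body.get("cursor")
        # ASSUMPTION: the end of the list is not documented here, so stop on
        # an empty batch, a missing cursor, or a cursor that did not advance.
        if not pages or next_cursor is None or next_cursor == cursor:
            break
        cursor = next_cursor
```

A call such as `for page in iter_crawl_pages("<crawl_id>", "<token>"): print(page["url"])` would walk the whole crawl. The `fetch` argument exists only so the HTTP call can be swapped out, for example in tests.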

Response

200
application/json

Successful response with the list of URLs.

The response is of type object.
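
As an illustration of how the response object might be consumed, the sketch below splits one decoded response into internal URLs, external URLs, retrieve IDs, and failed URLs. It relies only on the fields shown in the example response; summarize_pages_response is a hypothetical helper name, and treating missing fields as empty is an assumption.

```python
def summarize_pages_response(body):
    """Group the URLs and IDs found in one decoded /pages response."""
    pages = body.get("pages") or []
    # ASSUMPTION: missing "pages" or "metadata" fields are treated as empty.
    metadata = body.get("metadata") or {}
    return {
        "internal_urls": [p["url"] for p in pages if not p.get("is_external")],
        "external_urls": [p["url"] for p in pages if p.get("is_external")],
        # retrieve_id is the value the deprecation note on `formats`
        # says to use with the /retrieve endpoint.
        "retrieve_ids": [p["retrieve_id"] for p in pages if p.get("retrieve_id")],
        "failed_urls": list(metadata.get("failed_urls") or []),
    }
```

Collecting retrieve_ids separately keeps the page listing lightweight, with content fetched later only for the pages that are actually needed.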