GET /v1/retrieve
Retrieve page content
curl --request GET \
  --url 'https://api.olostep.com/v1/retrieve?retrieve_id=<retrieve_id>' \
  --header 'Authorization: Bearer <token>'
{
  "html_content": "<string>",
  "markdown_content": "<string>",
  "json_content": "<string>",
  "html_hosted_url": "<string>",
  "markdown_hosted_url": "<string>",
  "json_hosted_url": "<string>",
  "size_exceeded": true
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Query Parameters

retrieve_id
string
required

The ID of the page content to retrieve. It is returned by the /v1/crawls/{crawl_id}/pages, /v1/scrapes/{scrape_id}, and /v1/batches/{batch_id}/items endpoints.

formats
enum<string>[]

Optional array that restricts the response to specific formats in production. If omitted, all formats are returned.
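A minimal Python sketch of assembling this request with only the standard library. It builds, but does not send, a GET request carrying the required retrieve_id and the Bearer header. The helper name build_retrieve_request is hypothetical. This page does not say how the formats array is serialized in the query string or list its enum values, so the sketch assumes repeated formats= keys.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the curl example on this page.
BASE_URL = "https://api.olostep.com/v1/retrieve"


def build_retrieve_request(token, retrieve_id, formats=None):
    """Build (but do not send) a GET request for /v1/retrieve.

    Hypothetical helper. ASSUMPTION: `formats` is encoded as repeated
    query keys (formats=a&formats=b); the docs do not specify the encoding.
    """
    params = {"retrieve_id": retrieve_id}  # required query parameter
    if formats:
        params["formats"] = list(formats)
    # doseq=True expands the list into one key=value pair per element.
    url = f"{BASE_URL}?{urlencode(params, doseq=True)}"
    return Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # Bearer auth header
        method="GET",
    )
```

Passing the returned object to urllib.request.urlopen would send it. Any format value you pass, such as "markdown", is itself an assumed enum value until checked against the API.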

Response

Successful response with page content.

html_content
string

HTML content of the page, if requested and available.

markdown_content
string

Markdown content of the page, if requested and available.

json_content
string

JSON content of the page returned from parsers, if requested and available.
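Because json_content is typed as a string, a client likely has to decode it before use. The following is a minimal sketch, assuming the string holds serialized JSON. parse_json_content is a hypothetical helper name.

```python
import json


def parse_json_content(response):
    """Decode json_content into Python objects, or return None if absent.

    ASSUMPTION: json_content is a serialized JSON document. It may be
    missing or null when the format was not requested or is unavailable.
    """
    raw = response.get("json_content")
    if not raw:
        return None
    return json.loads(raw)
```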

html_hosted_url
string

S3 bucket URL of the HTML content. Expires in 7 days.

markdown_hosted_url
string

S3 bucket URL of the Markdown content. Expires in 7 days.

json_hosted_url
string

S3 bucket URL of the JSON content. Expires in 7 days.

size_exceeded
boolean

Whether the size of the content objects exceeds the 6 MB limit. If true, use the hosted S3 URLs to fetch the content.
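One way to honor size_exceeded on the client side is sketched below in Python. The helper locate_content is hypothetical. When size_exceeded is false it returns the inline field, and otherwise it falls back to the matching *_hosted_url. This sketch also treats an empty inline field the same way as an exceeded size. That is a design choice of the sketch, not documented behavior.

```python
def locate_content(response, fmt):
    """Return ("inline", content) or ("hosted", url) for one format.

    Hypothetical helper. `fmt` is "html", "markdown" or "json", matching
    the field-name prefixes in the response schema.
    """
    inline = response.get(f"{fmt}_content")
    if not response.get("size_exceeded") and inline:
        return ("inline", inline)
    hosted = response.get(f"{fmt}_hosted_url")
    if hosted:
        # Per the docs, hosted S3 URLs expire in 7 days.
        return ("hosted", hosted)
    raise ValueError(f"no {fmt} content or hosted URL in response")
```

A "hosted" result still has to be downloaded, for example with urllib.request.urlopen, before the URL expires.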
