Start agent

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
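As a minimal sketch of the header format described above, the helper below builds the Authorization header from a token. The function name `bearer_header` and the placeholder token are illustrative assumptions, not part of the API.

```python
def bearer_header(token: str) -> dict[str, str]:
    """Build the Authorization header in the form 'Bearer <token>'.

    `bearer_header` is a hypothetical helper name used for illustration.
    """
    if not token or token != token.strip():
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {token}"}


# "YOUR_TOKEN" is a placeholder; substitute your own auth token.
headers = bearer_header("YOUR_TOKEN")
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client you use.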

Query Parameters

url

string

required

The URL to start scraping from.

timeout

integer

default:40

Timeout in seconds for the scraping process, with a maximum of 620 seconds.

waitBeforeScraping

integer

default:3

Time to wait in seconds before starting the scraping, up to 500 seconds.
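The limits on timeout and waitBeforeScraping can be checked on the client before a request is sent. The sketch below assumes the documented maximums (620 and 500 seconds) and defaults (40 and 3). The lower bounds and the helper name `timing_params` are assumptions made for illustration, since the reference does not state minimum values.

```python
MAX_TIMEOUT_S = 620  # documented maximum for `timeout`
MAX_WAIT_S = 500     # documented maximum for `waitBeforeScraping`


def timing_params(timeout: int = 40, wait_before_scraping: int = 3) -> dict[str, int]:
    """Validate the two timing parameters and return them under their API names.

    Assumption: `timeout` must be positive and `waitBeforeScraping` non-negative;
    the reference only documents the upper limits.
    """
    if not 0 < timeout <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout must be between 1 and {MAX_TIMEOUT_S} seconds")
    if not 0 <= wait_before_scraping <= MAX_WAIT_S:
        raise ValueError(f"waitBeforeScraping must be between 0 and {MAX_WAIT_S} seconds")
    return {"timeout": timeout, "waitBeforeScraping": wait_before_scraping}
```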

saveHtml

boolean

default:true

Option to save the scraped content as HTML.

saveMarkdown

boolean

default:true

Option to save the scraped content as Markdown.

removeCSSselectors

enum<string>

default:default

Controls which CSS selectors are removed from the content. Pass default, none, or a JSON stringified array of the specific selectors you want to remove. When set to default, the following selectors are removed: nav, footer, script, style, noscript, svg, [role=alert], [role=banner], [role=dialog], [role=alertdialog], [role=region][aria-label*=skip i], [aria-modal=true].

Available options:

default,

none,

JSON stringified array of CSS selectors
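To illustrate the three accepted forms of removeCSSselectors, the sketch below passes the keywords through unchanged and serializes a custom selector list with the standard `json` module. The helper name `encode_remove_css_selectors` is hypothetical, and the compact separator style is a choice made here, not a documented requirement.

```python
import json
from typing import Sequence, Union


def encode_remove_css_selectors(value: Union[str, Sequence[str]]) -> str:
    """Return the string to send as the `removeCSSselectors` parameter.

    Accepts the keywords "default" or "none", or a sequence of CSS selectors
    that is serialized to a JSON array string.
    """
    if isinstance(value, str):
        if value not in ("default", "none"):
            raise ValueError('expected "default", "none", or a list of selectors')
        return value
    # Compact separators keep the query string short (a choice, not a requirement).
    return json.dumps(list(value), separators=(",", ":"))


custom = encode_remove_css_selectors(["nav", "[role=dialog]"])
```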

htmlTransformer

enum<string>

default:none

Specifies the HTML transformer to apply, if any. With postlightParser, Postlight's Mercury Parser library is used to remove ads and other unwanted content from the scraped content.

Available options:

none,

postlightParser

removeImages

boolean

default:true

Option to remove images from the scraped content.

expandMarkdown

boolean

default:false

If true, the Markdown content is returned in the markdown_content field.

expandHtml

boolean

default:false

If true, the HTML content is returned in the html_content field.

actions

(Wait · object | Click · object | Fill Input · object | Scroll · object)[]

Actions to perform on the page before its content is retrieved.
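The scalar query parameters above can be assembled into a query string as in the sketch below. The base URL is a placeholder, because this reference does not state the endpoint path or HTTP method. Sending booleans as the lowercase strings true and false is also an assumption. The actions parameter is left out because its object schema is not described here.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the reference does not give the real URL.
BASE_URL = "https://api.example.com/agent"


def build_query(url: str, **options) -> str:
    """URL-encode `url` plus any optional scalar parameters.

    Booleans are written as "true"/"false" (an assumed wire format).
    """
    params = {"url": url}
    for name, value in options.items():
        if isinstance(value, bool):
            params[name] = "true" if value else "false"
        else:
            params[name] = str(value)
    return urlencode(params)


query = build_query(
    "https://example.com",
    timeout=60,
    saveHtml=False,
    htmlTransformer="postlightParser",
)
request_url = f"{BASE_URL}?{query}"
```

Keeping the encoding in one function makes it easy to adjust if the service expects a different boolean or array format.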

Response

200

Successful response with the requested data.
