Start agent
This endpoint starts the scraping agent and, optionally, returns the scraped content when the expand parameters are set.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
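A minimal sketch of building that header in Python. The helper name build_auth_headers is hypothetical and not part of the API; the use of the standard Authorization header name is an assumption based on the usual Bearer scheme.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return an Authorization header of the form 'Bearer <token>'."""
    token = token.strip()
    if not token:
        # Fail early instead of sending an empty credential.
        raise ValueError("auth token must not be empty")
    # Assumption: the standard 'Authorization' header carries the Bearer token.
    return {"Authorization": f"Bearer {token}"}


headers = build_auth_headers("my-secret-token")
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client you use.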
Query Parameters
The URL to start scraping from.
Timeout in seconds for the scraping process, with a maximum of 620 seconds.
Time to wait in seconds before starting the scraping, up to 500 seconds.
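The two limits above can be checked on the client before a request is sent. The following is only a sketch: the function name check_timing is hypothetical, and treating negative values as invalid is an assumption, since the text states only the upper bounds.

```python
MAX_TIMEOUT_SECONDS = 620  # maximum timeout stated in the parameter description
MAX_WAIT_SECONDS = 500     # maximum wait stated in the parameter description


def check_timing(timeout_seconds: int, wait_seconds: int) -> tuple[int, int]:
    """Validate the timeout and wait values against the documented maximums."""
    # Assumption: negative durations are rejected; only the upper bounds are documented.
    if not 0 <= timeout_seconds <= MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout must be between 0 and {MAX_TIMEOUT_SECONDS} seconds")
    if not 0 <= wait_seconds <= MAX_WAIT_SECONDS:
        raise ValueError(f"wait must be between 0 and {MAX_WAIT_SECONDS} seconds")
    return timeout_seconds, wait_seconds


timeout_seconds, wait_seconds = check_timing(120, 5)
```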
Option to save the scraped content as HTML.
Option to save the scraped content as Markdown.
Option to remove certain CSS selectors from the content. Optionally, you can also pass a JSON stringified array of specific selectors you want to remove. The CSS selectors removed when this option is set to default are ['nav', 'footer', 'script', 'style', 'noscript', 'svg', '[role=alert]', '[role=banner]', '[role=dialog]', '[role=alertdialog]', '[role=region][aria-label*=skip i]', '[aria-modal=true]'].
Allowed values: default, none, JSON stringified array of CSS selectors
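A "JSON stringified array" here means a JSON array serialized to a single string and passed as the parameter value. A short sketch of producing one follows. The query parameter name remove_css_selectors is a placeholder, because this page does not show the real name, and the selector list is only an example.

```python
import json
from urllib.parse import urlencode


def selectors_param(selectors: list[str]) -> str:
    """Serialize a list of CSS selectors to a compact JSON array string."""
    return json.dumps(list(selectors), separators=(",", ":"))


value = selectors_param(["nav", ".cookie-banner", "[role=dialog]"])

# 'remove_css_selectors' is a hypothetical parameter name used for illustration only.
query = urlencode({"remove_css_selectors": value})
```

urlencode percent-encodes the brackets and quotes, so the string survives transport as a query parameter.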
Specify the HTML transformer to use, if any. Postlight's Mercury Parser library is used to remove ads and other unwanted content from the scraped content.
Allowed values: none, postlightParser
Option to remove images from the scraped content.
If true, the markdown content is returned in the markdown_content field.
If true, the HTML content is returned in the html_content field.
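As an illustration of reading those two fields, the sketch below pulls markdown_content and html_content out of a decoded JSON response. The field names come from the descriptions above. Their position at the top level of the response body is an assumption, and extract_content is a hypothetical helper.

```python
from typing import Any, Optional


def extract_content(payload: dict[str, Any]) -> dict[str, Optional[str]]:
    """Return the expanded content fields, or None for any that are absent."""
    # Assumption: both fields sit at the top level of the decoded response.
    return {
        "markdown": payload.get("markdown_content"),
        "html": payload.get("html_content"),
    }


sample = {"markdown_content": "# Title", "other": 1}  # made-up response body
content = extract_content(sample)
```

Using get rather than indexing keeps the helper safe when an expand parameter was not set and the corresponding field is missing.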
Actions to perform on the page before getting the content.