Create Scrape
Scrape a URL with the provided configuration and return its content.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
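A minimal sketch of building that header in Python. The helper name `bearer_headers` is hypothetical; the only thing taken from the description above is the `Bearer <token>` form, sent in the standard HTTP `Authorization` header.

```python
def bearer_headers(token: str) -> dict[str, str]:
    """Return request headers carrying the auth token as a Bearer credential."""
    cleaned = token.strip()
    if not cleaned:
        # An empty token would produce a malformed "Bearer " header.
        raise ValueError("auth token must not be empty")
    return {"Authorization": f"Bearer {cleaned}"}
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client you use.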
Body
The URL to start scraping from.
Actions to perform on the page before getting the content.
Country to load the request from, given as an ISO 3166-1 alpha-2 code such as US (United States) or IN (India).
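As a rough illustration, the country value can be checked client-side before sending the request. This sketch only verifies the two-letter shape of an alpha-2 code; it does not confirm that the code is actually assigned to a country, and the function name `normalize_country_code` is an assumption made for this example.

```python
import re

# Two ASCII letters: the shape of an ISO 3166-1 alpha-2 code.
_ALPHA2_SHAPE = re.compile(r"[A-Z]{2}")


def normalize_country_code(code: str) -> str:
    """Upper-case and shape-check a country code such as 'us' or 'IN'."""
    candidate = code.strip().upper()
    if not _ALPHA2_SHAPE.fullmatch(candidate):
        raise ValueError(f"not a two-letter country code: {code!r}")
    return candidate
```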
Formats in which you want the content. Options: html, markdown, parser_extract, llm_extract, raw_pdf.
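A small sketch of validating a requested list of formats against the five values listed above before building a request. The names `ALLOWED_FORMATS` and `validate_formats` are hypothetical, and whether the API itself accepts duplicates is not stated in this reference; the sketch simply drops them.

```python
ALLOWED_FORMATS = ("html", "markdown", "parser_extract", "llm_extract", "raw_pdf")


def validate_formats(formats: list[str]) -> list[str]:
    """Reject unknown format names and drop duplicates, keeping first-seen order."""
    unknown = [name for name in formats if name not in ALLOWED_FORMATS]
    if unknown:
        raise ValueError(f"unsupported formats: {unknown}")
    # dict.fromkeys preserves insertion order, so this de-duplicates stably.
    return list(dict.fromkeys(formats))
```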
Option to return all the links present on the scraped page.
User-defined metadata. Not supported yet.
Configuration for parser extraction.
List of class names to remove from the content.
Option to remove certain CSS selectors from the content. You can also pass a JSON-stringified array of the specific selectors you want to remove. When this option is set to default, the selectors removed are ['nav', 'footer', 'script', 'style', 'noscript', 'svg', '[role=alert]', '[role=banner]', '[role=dialog]', '[role=alertdialog]', '[role=region][aria-label*=skip i]', '[aria-modal=true]']. Options: default, none, array.
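To illustrate what a JSON-stringified array of selectors looks like, here is a minimal sketch using Python's standard `json` module. The helper name `selectors_to_option` is hypothetical, and this reference does not say which request field carries the resulting string.

```python
import json


def selectors_to_option(selectors: list[str]) -> str:
    """Serialize a list of CSS selectors into a JSON-stringified array."""
    if not selectors:
        raise ValueError("provide at least one CSS selector")
    # json.dumps produces a JSON array literal held in a Python string.
    return json.dumps(selectors)
```

For example, `selectors_to_option(["nav", "[role=dialog]"])` yields the string `["nav", "[role=dialog]"]`, which decodes back to the original list.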
Option to remove images from the scraped content. Defaults to false.
Configuration for screen size. Preset dimensions are available through screen_type: desktop (1920x1080), mobile (414x896), or default (768x1024).
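The preset dimensions above can be kept in a small lookup table on the client side, for instance to log or display the viewport a request will use. This is only a sketch: `SCREEN_PRESETS` and `screen_dimensions` are hypothetical names, while the three `screen_type` values and their sizes are copied from the description.

```python
# (width, height) in pixels for each documented screen_type preset.
SCREEN_PRESETS = {
    "desktop": (1920, 1080),
    "mobile": (414, 896),
    "default": (768, 1024),
}


def screen_dimensions(screen_type: str = "default") -> tuple[int, int]:
    """Look up the (width, height) of a documented screen_type preset."""
    try:
        return SCREEN_PRESETS[screen_type]
    except KeyError:
        raise ValueError(f"unknown screen_type: {screen_type!r}") from None
```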
Specify the HTML transformer to use, if any. Postlight's Mercury Parser library is used to remove ads and other unwanted content from the scraped content. Options: postlight, none.
Time to wait, in milliseconds, before scraping starts.