POST /v1/scrapes

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
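
As a minimal sketch of the header format described above, the helper below builds the headers for a JSON request. The function name build_auth_headers is hypothetical and not part of the API; only the "Bearer <token>" form comes from this reference.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return request headers carrying a bearer token (hypothetical helper)."""
    # Reject empty tokens or tokens with stray whitespace, which would
    # produce a malformed Authorization header.
    if not token or token.strip() != token:
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


headers = build_auth_headers("YOUR_TOKEN")  # placeholder token
```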

Body

application/json
url_to_scrape
string
required

The URL to start scraping from.

actions
object[]

Actions to perform on the page before getting the content.

country
string

Country to load the request from, given as an ISO 3166-1 alpha-2 code such as US (United States) or IN (India).

formats
enum<string>[]

Formats in which you want the content.

Available options:
html,
markdown,
parser_extract,
llm_extract,
raw_pdf

With this option, you can get all the links present on the page you scrape.

llm_extract
object
metadata
object

User-defined metadata. Not yet supported.

parser_extract
object

Configuration for parser extraction.

remove_class_names
string[]

List of class names to remove from the content.

remove_css_selectors
enum<string>
default: default

Option to remove certain CSS selectors from the content. You can also pass a JSON-stringified array of the specific selectors to remove. When this option is set to default, the following selectors are removed: nav, footer, script, style, noscript, svg, [role=alert], [role=banner], [role=dialog], [role=alertdialog], [role=region][aria-label*=skip i], [aria-modal=true].

Available options:
default,
none,
array
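
The JSON-stringified array form mentioned above can be produced with the standard json module. This is a sketch only: the helper name selectors_to_param and the example selectors are illustrative assumptions, not values defined by the API.

```python
import json


def selectors_to_param(selectors: list[str]) -> str:
    """Serialize CSS selectors into a JSON-stringified array (hypothetical helper)."""
    return json.dumps(selectors)


# Example selectors chosen for illustration only.
custom = selectors_to_param(["nav", "footer", "[role=banner]", ".cookie-banner"])

# The string would then be sent as the remove_css_selectors value.
body_fragment = {"remove_css_selectors": custom}
```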
remove_images
boolean
default: false

Option to remove images from the scraped content. Defaults to false.

screen_size
object

Configuration for screen size. Preset dimensions are available through screen_type: desktop (1920x1080), mobile (414x896), or default (768x1024).

transformer
enum<string>

Specify the HTML transformer to use, if any. Postlight's Mercury Parser library is used to remove ads and other unwanted content from the scraped content.

Available options:
postlight,
none
wait_before_scraping
integer

Time to wait, in milliseconds, before scraping starts.
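
To show how the body fields fit together, here is a hedged sketch that assembles a request with Python's standard library. The host in API_BASE is a placeholder assumption (this reference does not state the base URL), and build_scrape_request is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder host; substitute the real base URL


def build_scrape_request(token: str, url_to_scrape: str, **options) -> urllib.request.Request:
    """Build a POST /v1/scrapes request (hypothetical helper)."""
    # url_to_scrape is the only required body field; the rest are optional.
    payload = {"url_to_scrape": url_to_scrape, **options}
    return urllib.request.Request(
        url=f"{API_BASE}/v1/scrapes",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_scrape_request(
    "YOUR_TOKEN",  # placeholder token
    "https://example.com/article",
    formats=["markdown", "html"],
    country="US",
    remove_images=True,
    wait_before_scraping=2000,  # milliseconds
)
```

Sending the request would then be a matter of passing req to urllib.request.urlopen and decoding the JSON response.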

Response

200 - application/json
created
number

Creation time as an epoch timestamp.

id
string

Scrape ID

metadata
object

User-defined metadata.

object
string

The kind of object. "scrape" for this endpoint.

result
object
url_to_scrape
string

The URL that was scraped.
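
A minimal sketch of reading the 200 response, assuming all listed fields appear at the top level of the JSON object (the nesting is not spelled out in this reference). The ScrapeResponse class, the parse_scrape_response helper, and the sample values are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ScrapeResponse:
    id: str
    created: float
    kind: str  # value of the "object" field
    url_to_scrape: str
    metadata: dict[str, Any] = field(default_factory=dict)
    result: dict[str, Any] = field(default_factory=dict)


def parse_scrape_response(raw: str) -> ScrapeResponse:
    """Parse a response body into a ScrapeResponse (hypothetical helper)."""
    data = json.loads(raw)
    # The reference states that "object" is "scrape" for this endpoint.
    if data.get("object") != "scrape":
        raise ValueError(f"unexpected object kind: {data.get('object')!r}")
    return ScrapeResponse(
        id=data["id"],
        created=data["created"],
        kind=data["object"],
        url_to_scrape=data["url_to_scrape"],
        metadata=data.get("metadata") or {},
        result=data.get("result") or {},
    )


# Made-up sample body for illustration; real responses will differ.
sample = json.dumps({
    "id": "scrape_123",
    "created": 1700000000,
    "object": "scrape",
    "metadata": None,
    "result": {},
    "url_to_scrape": "https://example.com/article",
})
parsed = parse_scrape_response(sample)
```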