POST /v1/crawls

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
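
As a minimal sketch, the header could be assembled in Python as follows. The helper name auth_headers and the token value are illustrative and not part of the API.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header in the 'Bearer <token>' form."""
    if not token:
        raise ValueError("an auth token is required")
    return {"Authorization": f"Bearer {token}"}


# Example with a placeholder token (not a real credential).
headers = auth_headers("YOUR_TOKEN")
```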

Body

application/json

include_urls
string[]
required

URL paths to include, given as glob patterns. For example:

  • '/**' for the entire website
  • '/blog/**' for only blog pages.

max_pages
number
required

Maximum number of pages to crawl. Setting a limit is recommended for most use cases, such as crawling an entire website.

start_url
string
required

The starting point of the crawl.

exclude_urls
string[]

URL paths to exclude, given as glob patterns. For example: /careers/**. Excluded URLs take precedence over included URLs.
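
The precedence rule can be previewed locally with a rough sketch like the one below. It assumes fnmatch-style matching from Python's standard library, where '*' also matches '/'; the service's actual glob semantics are not specified here and may differ. The function name is_crawlable is hypothetical.

```python
from fnmatch import fnmatchcase


def is_crawlable(path: str, include_urls: list[str], exclude_urls: list[str]) -> bool:
    """Approximate the documented rule: exclusions win over inclusions."""
    # Assumption: fnmatch-style patterns, where '*' also matches '/'.
    if any(fnmatchcase(path, pattern) for pattern in exclude_urls):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include_urls)


# '/careers/**' is excluded even though '/**' includes the whole site.
blog_ok = is_crawlable("/blog/post-1", ["/**"], ["/careers/**"])
careers_ok = is_crawlable("/careers/engineer", ["/**"], ["/careers/**"])
```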

include_external
boolean

Crawl first-degree external links.

max_depth
number

Maximum depth of the crawl. Useful for following links only up to the nth degree.

search_query
string

An optional search query to find specific links and also sort the results by relevance.

top_n
number

An optional number that limits the crawl to the top N most relevant links on each page, ranked according to the search query.
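
Putting the body fields together, a request might be prepared as in the sketch below. The host in BASE_URL is a placeholder, since the API's base URL is not given here, and build_crawl_request is a hypothetical helper. The request is only constructed, not sent.

```python
import json
from urllib.request import Request

# Assumption: placeholder host; substitute the real API base URL.
BASE_URL = "https://api.example.com"

# Optional body fields listed above.
OPTIONAL_FIELDS = {"exclude_urls", "include_external", "max_depth", "search_query", "top_n"}


def build_crawl_request(token, start_url, include_urls, max_pages, **optional):
    """Prepare (but do not send) a POST /v1/crawls request."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # The three required fields, followed by any optional ones supplied.
    body = {
        "start_url": start_url,
        "include_urls": include_urls,
        "max_pages": max_pages,
        **optional,
    }
    return Request(
        f"{BASE_URL}/v1/crawls",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_crawl_request(
    "YOUR_TOKEN",
    start_url="https://example.com",
    include_urls=["/blog/**"],
    max_pages=50,
    exclude_urls=["/careers/**"],
    max_depth=2,
)
```

The prepared request can then be passed to urllib.request.urlopen, or the same body and headers can be reused with any other HTTP client.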

Response

200 - application/json

created
number

Creation time as an epoch timestamp.

current_depth
number

The current depth of the crawl process.

exclude_urls
string[]

id
string

Crawl ID

include_external
boolean

include_urls
string[]

max_depth
number

max_pages
number

object
string

The kind of object. "crawl" for this endpoint.

pages_count
number

Number of pages crawled.

search_query
string

start_date
string

Creation time as a date string.

start_url
string

status
string

Status of the crawl: in_progress or completed.

top_n
number
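
A client might read the response along the lines of the sketch below. The CrawlStatus class and parse_crawl helper are hypothetical, and the sample payload is made up purely for illustration; only the field names come from the response schema above.

```python
import json
from dataclasses import dataclass


@dataclass
class CrawlStatus:
    id: str
    status: str
    pages_count: int

    @property
    def finished(self) -> bool:
        # The documented status values are "in_progress" and "completed".
        return self.status == "completed"


def parse_crawl(body: str) -> CrawlStatus:
    """Parse a crawl response body and keep the fields needed to track progress."""
    data = json.loads(body)
    if data.get("object") != "crawl":
        raise ValueError(f"unexpected object kind: {data.get('object')!r}")
    return CrawlStatus(
        id=data["id"],
        status=data["status"],
        # Assumption: treat a missing count as zero pages crawled.
        pages_count=int(data.get("pages_count", 0)),
    )


# Illustrative payload; every value here is made up.
sample = '{"object": "crawl", "id": "crawl_123", "status": "in_progress", "pages_count": 3}'
crawl = parse_crawl(sample)
```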