Start Crawl
Starts a new crawl. You receive an id to track its progress. The operation may take 1-10 minutes, depending on the site and on the depth and pages parameters.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
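As a minimal sketch of the header described above, the helper below builds the bearer header in Python. The function name `build_auth_headers` is hypothetical, and the header name `Authorization` is assumed from the standard bearer-token convention rather than stated on this page.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the bearer token.

    Assumption: the header is the standard `Authorization` header;
    this page only specifies the `Bearer <token>` value format.
    """
    if not token or token.strip() != token:
        raise ValueError("token must be non-empty with no surrounding whitespace")
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.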
Body
URL path names, as glob patterns, to include. For example: '/**' for the entire website, or '/blog/**' for only blogs.
Maximum number of pages to crawl. Recommended for most use cases, such as crawling an entire website.
The starting point of the crawl.
URL path names, as glob patterns, to exclude. For example: '/careers/**'. Excluded URLs will supersede included URLs.
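To illustrate how include and exclude patterns could interact, here is a local sketch in which exclusion is checked first, so an excluded path is rejected even when it also matches an include pattern. The matching rules are an assumption (`**` spans path segments, `*` stays within one segment). The service's actual matcher may differ, and the names `glob_to_regex` and `is_crawlable` are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple glob into a compiled regex.

    Assumed semantics: `**` matches any characters including `/`,
    `*` matches any characters except `/`, everything else is literal.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_crawlable(path: str, include: list[str], exclude: list[str]) -> bool:
    """Return True if `path` matches an include pattern and no exclude pattern."""
    # Exclusions are evaluated first because excluded URLs supersede included ones.
    if any(glob_to_regex(p).fullmatch(path) for p in exclude):
        return False
    return any(glob_to_regex(p).fullmatch(path) for p in include)
```

For example, with include `['/**']` and exclude `['/careers/**']`, the path `/blog/post-1` passes while `/careers/engineer` is rejected.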
Crawl first-degree external links.
Maximum depth of the crawl. Useful for extracting links only up to the n-th degree.
An optional search query to find specific links and also sort the results by relevance.
An optional number that limits the crawl to the top N most relevant links on each page, ranked by the search query.
Response
Creation time as an epoch timestamp.
The current depth of the crawl process.
Crawl ID
The kind of object. "crawl" for this endpoint.
Count of pages crawled
Creation time as a date.
The crawl status: in_progress or completed.
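Because a crawl may take several minutes, a client will typically poll until the status becomes completed. The sketch below assumes the response is a JSON object with a `status` field (a hypothetical name; only the values in_progress and completed come from this page). It takes a caller-supplied `fetch_status` function, since the endpoint for retrieving a crawl by id is not described here.

```python
import time
from typing import Callable


def wait_for_crawl(
    fetch_status: Callable[[], dict],
    *,
    interval_s: float = 15.0,
    timeout_s: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the crawl reports "completed" or the timeout elapses.

    `fetch_status` is a hypothetical callable that returns the latest
    crawl object as a dict; the "status" key is an assumed field name.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        crawl = fetch_status()
        if crawl.get("status") == "completed":
            return crawl
        if time.monotonic() >= deadline:
            raise TimeoutError("crawl did not complete before the timeout")
        sleep(interval_s)
```

The default timeout of 15 minutes is an arbitrary margin above the 1-10 minute range mentioned at the top of this page; adjust it for large sites.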