POST /v1/crawls
curl --request POST \
  --url https://api.olostep.com/v1/crawls \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "start_url": "<string>",
  "include_urls": [
    "<string>"
  ],
  "exclude_urls": [
    "<string>"
  ],
  "max_pages": 123,
  "max_depth": 123,
  "include_external": true,
  "search_query": "<string>",
  "top_n": 123,
  "webhook_url": "<string>"
}'
{
  "id": "<string>",
  "object": "<string>",
  "status": "<string>",
  "created": 123,
  "start_date": "<string>",
  "start_url": "<string>",
  "max_pages": 123,
  "max_depth": 123,
  "exclude_urls": [
    "<string>"
  ],
  "include_urls": [
    "<string>"
  ],
  "include_external": true,
  "search_query": "<string>",
  "top_n": 123,
  "current_depth": 123,
  "pages_count": 123,
  "webhook_url": "<string>"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
start_url
string
required

The starting point of the crawl.

max_pages
number
required

Maximum number of pages to crawl. Recommended for most use cases like crawling an entire website.

include_urls
string[]

URL path patterns to include in the crawl using glob syntax.

Defaults to /** which includes all URLs. Use patterns like /blog/** to crawl specific sections (e.g., only blog pages), /products/*.html for product pages, or multiple patterns for different sections. Supports standard glob features like * (any characters) and ** (recursive matching).

exclude_urls
string[]

URL path patterns to exclude, using glob syntax. For example: /careers/**. Exclusions take precedence over include_urls, so a URL matching both is not crawled.
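The interaction between include_urls and exclude_urls can be sketched locally as follows. This is only an illustration of the precedence rule, not the service's actual matcher: should_crawl is a hypothetical helper, and it uses Python's fnmatch, whose * also matches across / separators, so edge cases may differ from the API's glob implementation.

```python
from fnmatch import fnmatchcase


def should_crawl(path, include_urls=("/**",), exclude_urls=()):
    """Approximate the include/exclude rule for a URL path.

    Assumption: fnmatch-style matching stands in for the API's glob
    engine. Exclusions are checked first because they take precedence.
    """
    if any(fnmatchcase(path, pattern) for pattern in exclude_urls):
        return False
    return any(fnmatchcase(path, pattern) for pattern in include_urls)
```

For example, should_crawl("/careers/engineer", ["/**"], ["/careers/**"]) is rejected even though the default include pattern matches it.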

max_depth
number

Maximum link depth of the crawl. Useful for limiting the crawl to pages within n links of the start URL.

include_external
boolean

Crawl first-degree external links.

search_query
string

An optional search query used to find specific links and to sort the results by relevance.

top_n
number

An optional number that limits the crawl to the top N most relevant links on each page, ranked against search_query.

webhook_url
string

An optional endpoint that receives a POST request when the crawl completes. The request body is the same as the response of the v1/crawls/{crawl_id} endpoint.
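As a rough Python equivalent of the curl request, the body fields above can be assembled and sent with the standard library. This is a minimal sketch: build_crawl_request and start_crawl are hypothetical helper names, the 30-second timeout is an arbitrary choice, and error handling is omitted.

```python
import json
import urllib.request

CRAWLS_URL = "https://api.olostep.com/v1/crawls"


def build_crawl_request(token, start_url, max_pages, **optional):
    """Build (but do not send) the POST request for a new crawl.

    start_url and max_pages are the required body fields; any other
    documented field (max_depth, include_urls, webhook_url, ...) can be
    passed as a keyword argument.
    """
    payload = {"start_url": start_url, "max_pages": max_pages, **optional}
    return urllib.request.Request(
        CRAWLS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_crawl(token, start_url, max_pages, **optional):
    """Send the request and return the decoded JSON response."""
    request = build_crawl_request(token, start_url, max_pages, **optional)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

A call such as start_crawl("<token>", "https://example.com", 100, include_urls=["/blog/**"]) would then return the crawl object described under Response.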

Response

200
application/json
Crawl started successfully.
id
string

Crawl ID

object
string

The kind of object. "crawl" for this endpoint.

status
string

Either in_progress or completed.

created
number

Creation time as an epoch timestamp.

start_date
string

Creation time as a date string.

start_url
string
max_pages
number
max_depth
number
exclude_urls
string[]
include_urls
string[]
include_external
boolean
search_query
string
top_n
number
current_depth
number

The current depth of the crawl process.

pages_count
number

Number of pages crawled.

webhook_url
string
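Because the webhook body has the same shape as this response object, one small parser can serve both. The sketch below assumes only the fields documented above; parse_crawl and is_complete are hypothetical helper names, and rejecting unknown status values is a defensive choice rather than documented behavior.

```python
import json

# The two status values listed in the response schema.
KNOWN_STATUSES = {"in_progress", "completed"}


def parse_crawl(raw):
    """Decode a crawl object from a JSON response or webhook body."""
    crawl = json.loads(raw)
    if crawl.get("object") != "crawl":
        raise ValueError(f"unexpected object type: {crawl.get('object')!r}")
    if crawl.get("status") not in KNOWN_STATUSES:
        raise ValueError(f"unexpected status: {crawl.get('status')!r}")
    return crawl


def is_complete(crawl):
    """Return True once the crawl reports the completed status."""
    return crawl["status"] == "completed"
```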