GET /v1/crawls/{crawl_id}/pages
Retrieve the list of crawled pages, optionally with their content.
curl --request GET \
  --url https://api.olostep.com/v1/crawls/{crawl_id}/pages \
  --header 'Authorization: Bearer <token>'
{
  "crawl_id": "<string>",
  "object": "<string>",
  "status": "<string>",
  "search_query": "<string>",
  "pages_count": 123,
  "pages": [
    {
      "id": "<string>",
      "retrieve_id": "<string>",
      "url": "<string>",
      "is_external": true,
      "html_content": "<string>",
      "markdown_content": "<string>"
    }
  ],
  "metadata": {
    "external_urls": [
      "<string>"
    ],
    "failed_urls": [
      "<string>"
    ]
  },
  "cursor": 123
}
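The sketch below parses a response body shaped like the example above and separates internal from external page URLs. It is a minimal illustration, not part of the API: the function name summarize_pages is hypothetical, it reads only fields shown in the example, and it assumes a page without is_external is internal.

```python
import json
from typing import Any


def summarize_pages(body: str) -> dict[str, Any]:
    """Summarize a GET /v1/crawls/{crawl_id}/pages JSON response body."""
    data = json.loads(body)
    pages = data.get("pages", [])
    return {
        "status": data.get("status"),
        # Assumption: a missing is_external flag means the page is internal.
        "internal_urls": [p["url"] for p in pages if not p.get("is_external", False)],
        "external_urls": [p["url"] for p in pages if p.get("is_external", False)],
        "failed_urls": data.get("metadata", {}).get("failed_urls", []),
        # Value to send as the cursor query parameter on the next request.
        "next_cursor": data.get("cursor"),
    }
```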

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Path parameters

crawl_id
string
required

The ID of the crawl to retrieve the list of URLs for.
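A minimal Python sketch, using only the standard library, that combines the crawl_id path parameter and the Bearer header into the same request as the curl example. The helper name build_pages_request is hypothetical, and it only constructs the request; sending it with urllib.request.urlopen needs a real crawl ID and token.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.olostep.com/v1"


def build_pages_request(crawl_id: str, token: str, params: Optional[dict] = None) -> Request:
    """Build (but do not send) a GET /v1/crawls/{crawl_id}/pages request.

    params holds optional query parameters such as cursor and limit.
    """
    # Percent-encode the ID so it is safe to place in the URL path.
    url = f"{BASE_URL}/crawls/{quote(crawl_id, safe='')}/pages"
    if params:
        # Drop unset parameters so they are not sent as the string "None".
        query = {k: v for k, v in params.items() if v is not None}
        if query:
            url += "?" + urlencode(query)
    return Request(url, method="GET", headers={"Authorization": f"Bearer {token}"})
```

For example, build_pages_request("my_crawl_id", "MY_TOKEN", {"cursor": 0, "limit": 20}) targets .../crawls/my_crawl_id/pages?cursor=0&limit=20.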

Query parameters

cursor
integer

Optional integer index from which to start fetching content. Use it to paginate until all URLs have been fetched: start with 0, then pass the response['cursor'] value from the previous request.
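A minimal pagination loop following the description above: start at cursor 0 and keep passing the cursor returned by the previous response. The fetch callable is a hypothetical stand-in for whatever performs the HTTP request with the cursor and limit query parameters; injecting it keeps the loop testable without network access. The stop condition is an assumption, since this page does not say how the last batch is signalled: the loop ends when a response has no cursor, returns no pages, or repeats the cursor it was given.

```python
from typing import Any, Callable, Iterator

# Hypothetical signature: fetch(cursor, limit) -> parsed JSON response dict.
Fetch = Callable[[int, int], dict[str, Any]]


def iter_pages(fetch: Fetch, limit: int = 20) -> Iterator[dict[str, Any]]:
    """Yield every page object across all cursor-paginated responses."""
    cursor = 0  # Pagination starts at index 0.
    while True:
        response = fetch(cursor, limit)
        pages = response.get("pages", [])
        yield from pages
        next_cursor = response.get("cursor")
        # Assumed end-of-results signals: no pages, no cursor, or no progress.
        if not pages or next_cursor is None or next_cursor == cursor:
            break
        cursor = next_cursor
```

The "no progress" check is a defensive choice: it prevents an infinite loop if the server keeps returning the same cursor.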

limit
integer

Optional integer limiting the number of results returned. 10-50 results per request is recommended; use cursor to page through the rest. A single request can fetch at most 10 MB of content.

search_query
string

Optional search query used to sort the results by relevance. If omitted, the crawl's original search_query is used, if one was provided.

formats
enum<string>[]

Deprecated: use the /retrieve endpoint with retrieve_id instead.

Array of formats to fetch (e.g., ["html", "markdown"]).

Available options: html, markdown
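Because formats is deprecated in favor of the /retrieve endpoint, one approach is to list pages without content and collect each page's retrieve_id for later /retrieve calls. This sketch only gathers the IDs; it deliberately does not build /retrieve requests, whose parameters are not described on this page. The function name retrieve_ids_by_url is hypothetical.

```python
from typing import Any


def retrieve_ids_by_url(pages: list[dict[str, Any]], include_external: bool = False) -> dict[str, str]:
    """Map each crawled page URL to its retrieve_id, for use with /retrieve."""
    return {
        page["url"]: page["retrieve_id"]
        for page in pages
        # Skip pages without a retrieve_id; skip external pages unless requested.
        if page.get("retrieve_id") and (include_external or not page.get("is_external", False))
    }
```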

Response

Successful response with the list of URLs.

crawl_id
string

The crawl ID.

object
string

The kind of object. "crawl" for this endpoint.

status
string

Available options: in_progress, completed

search_query
string
pages_count
number
pages
object[]
metadata
object
cursor
integer

Pass this value as the cursor query parameter of the next request to fetch the next items.