# Crawl Pages

> Fetches the list of pages for a specific crawl.
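
As an illustration, here is a minimal Python sketch of a single request that uses only the standard library. The helper names `build_pages_request` and `fetch_pages` are hypothetical. The default `limit` of 20 is an arbitrary choice inside the recommended 10 to 50 range, and the crawl ID and API key are placeholders. It builds the URL from the `servers` entry and path in the spec below and sends the bearer token described by its `Authorization` scheme.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Base URL taken from the `servers` entry in the OpenAPI spec.
BASE_URL = "https://api.olostep.com"


def build_pages_request(
    crawl_id: str,
    api_key: str,
    cursor: int = 0,
    limit: int = 20,
    search_query: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a GET /v1/crawls/{crawl_id}/pages request.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    params: dict = {"cursor": cursor, "limit": limit}
    if search_query is not None:
        params["search_query"] = search_query
    # Percent-encode the crawl ID so it is safe to use as a path segment.
    path = f"/v1/crawls/{urllib.parse.quote(crawl_id, safe='')}/pages"
    url = f"{BASE_URL}{path}?{urllib.parse.urlencode(params)}"
    # Bearer authentication, per the `Authorization` security scheme.
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}, method="GET"
    )


def fetch_pages(crawl_id: str, api_key: str, **kwargs) -> dict:
    """Send the request and return the parsed JSON response body."""
    req = build_pages_request(crawl_id, api_key, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `fetch_pages("YOUR_CRAWL_ID", "YOUR_API_KEY", limit=20)` returns the response object. Its `pages` list holds the results, and its `cursor` value is what you pass in the next request.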


## OpenAPI

````yaml openapi/crawls.json GET /v1/crawls/{crawl_id}/pages
openapi: 3.0.3
info:
  title: Crawl API
  version: 1.0.0
servers:
  - url: https://api.olostep.com
security: []
paths:
  /v1/crawls/{crawl_id}/pages:
    get:
      summary: Retrieve the list of crawled pages, optionally with content
      description: >-
        Fetches the list of pages (and, optionally, their content) that have
        been processed for a specific crawl ID.
      parameters:
        - name: crawl_id
          in: path
          required: true
          schema:
            type: string
          description: The ID of the crawl whose pages to retrieve.
        - name: cursor
          in: query
          required: false
          schema:
            type: integer
          description: >-
            Optional index from which to start fetching results. Use it to
            paginate until all URLs have been fetched: start with 0, then pass
            the `cursor` value returned by the previous response.
        - name: limit
          in: query
          required: false
          schema:
            type: integer
          description: >-
            Optional maximum number of results to return per request. 10 to 50
            results per request is recommended; use `cursor` to paginate. A
            single request can return at most 10 MB of content.
        - name: search_query
          in: query
          required: false
          schema:
            type: string
          description: >-
            Optional search query used to sort the results by relevance.
            Defaults to the crawl's original `search_query`, if one was provided.
        - name: formats
          in: query
          required: false
          schema:
            type: array
            items:
              type: string
              enum:
                - html
                - markdown
          description: |-
            **Deprecated:** Use the `/retrieve` endpoint with `retrieve_id` instead.

            Array of formats to fetch (e.g., ["html", "markdown"]).
      responses:
        '200':
          description: Successful response containing the list of crawled pages.
          content:
            application/json:
              schema:
                type: object
                properties:
                  crawl_id:
                    type: string
                    description: Crawl ID
                  object:
                    type: string
                    description: The object type; "crawl" for this endpoint.
                  status:
                    type: string
                    description: 'Crawl status: `in_progress` or `completed`.'
                  search_query:
                    type: string
                  pages_count:
                    type: number
                  pages:
                    type: array
                    items:
                      type: object
                      properties:
                        id:
                          type: string
                        retrieve_id:
                          type: string
                          description: ID to pass to the `/retrieve` endpoint to fetch this page's content.
                        url:
                          type: string
                        is_external:
                          type: boolean
                        html_content:
                          type: string
                          description: >-
                            Deprecated: use the `/retrieve` endpoint with
                            `retrieve_id` instead.
                        markdown_content:
                          type: string
                          description: >-
                            Deprecated: use the `/retrieve` endpoint with
                            `retrieve_id` instead.
                  metadata:
                    type: object
                    properties:
                      external_urls:
                        type: array
                        description: External URLs found during the crawl.
                        items:
                          type: string
                      failed_urls:
                        type: array
                        description: URLs that were found but could not be scraped.
                        items:
                          type: string
                  cursor:
                    type: integer
                    description: >-
                      Pass this value as the `cursor` query parameter in the next
                      request to fetch the next batch of results.
        '400':
          description: Bad request due to incorrect or missing parameters.
        '404':
          description: Crawl not found for the provided ID.
        '500':
          description: Internal server error.
      security:
        - Authorization: []
components:
  securitySchemes:
    Authorization:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication header of the form Bearer <token>, where <token>
        is your auth token.

````
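The `cursor` and `limit` parameters imply a pagination loop. The following is a minimal Python sketch of one, and it relies on assumptions the spec above does not state. First, pagination is treated as finished when a response has no `cursor` (missing or null), returns no pages, or repeats the cursor it was sent. Second, `fetch_page` is a hypothetical callable that you supply: it sends the authenticated `GET /v1/crawls/{crawl_id}/pages` request for a given cursor and returns the parsed JSON body. While a crawl's `status` is still `in_progress`, more pages may appear later, so a single pass may not be exhaustive.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical callable: given a cursor, performs
# GET /v1/crawls/{crawl_id}/pages?cursor=<cursor>&limit=<n>
# and returns the parsed JSON response body.
FetchPage = Callable[[int], Dict[str, Any]]


def iter_crawl_pages(fetch_page: FetchPage, start_cursor: int = 0) -> Iterator[Dict[str, Any]]:
    """Yield every page object, following `cursor` from one response to the next.

    Assumption: pagination ends when a response has no `cursor` (missing or
    null), returns no pages, or returns the same cursor it was given.
    """
    cursor: Optional[int] = start_cursor
    while cursor is not None:
        body = fetch_page(cursor)
        pages: List[Dict[str, Any]] = body.get("pages") or []
        yield from pages
        next_cursor = body.get("cursor")
        # Stop on an empty batch or a cursor that did not advance,
        # so a misbehaving response cannot cause an infinite loop.
        if not pages or next_cursor == cursor:
            break
        cursor = next_cursor


def collect_retrieve_ids(fetch_page: FetchPage) -> List[str]:
    """Gather each page's `retrieve_id` for later use with the `/retrieve` endpoint."""
    return [page["retrieve_id"] for page in iter_crawl_pages(fetch_page)]
```

Passing the fetch step in as a function keeps the loop independent of the HTTP client. In practice, `fetch_page` could wrap `urllib.request.urlopen` and `json.load`, and in tests it can be a stub that returns canned responses.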