# Retrieve Content

> Retrieve the content of URLs processed by batches and crawls.



## OpenAPI

````yaml openapi/utility.json GET /v1/retrieve
openapi: 3.0.3
info:
  title: Utility API
  version: 1.0.0
servers:
  - url: https://api.olostep.com
security: []
paths:
  /v1/retrieve:
    get:
      summary: Retrieve page content
      description: Fetches the content of a page processed by a crawl, scrape, or batch, using its `retrieve_id`.
      parameters:
        - name: retrieve_id
          in: query
          required: true
          schema:
            type: string
          description: >-
            The ID of the page content to retrieve. It is available in the
            responses of the `/v1/crawls/{crawl_id}/pages`,
            `/v1/scrapes/{scrape_id}`, and `/v1/batches/{batch_id}/items`
            endpoints.
        - name: formats
          in: query
          required: false
          schema:
            type: array
            items:
              type: string
              enum:
                - html
                - markdown
                - json
          description: >-
            Optional array of formats to retrieve. If not provided, all
            formats are returned.
      responses:
        '200':
          description: Successful response with page content.
          content:
            application/json:
              schema:
                type: object
                properties:
                  html_content:
                    type: string
                    description: HTML content of the page, if requested and available.
                  markdown_content:
                    type: string
                    description: Markdown content of the page, if requested and available.
                  json_content:
                    type: string
                    description: >-
                      JSON content of the page returned from parsers, if
                      requested and available.
                  html_hosted_url:
                    type: string
                    description: S3 bucket URL of the HTML content. Expires in 7 days.
                  markdown_hosted_url:
                    type: string
                    description: S3 bucket URL of the Markdown content. Expires in 7 days.
                  json_hosted_url:
                    type: string
                    description: S3 bucket URL of the JSON content. Expires in 7 days.
                  size_exceeded:
                    type: boolean
                    description: >-
                      Whether the size of the content objects exceeds the 6 MB
                      limit. If true, use the hosted S3 URLs to get the content.
        '400':
          description: Bad request due to incorrect or missing parameters.
        '404':
          description: Content not found for the provided `retrieve_id`.
        '500':
          description: Internal server error.
      security:
        - Authorization: []
components:
  securitySchemes:
    Authorization:
      type: http
      scheme: bearer
      description: >-
        Bearer authentication header of the form Bearer <token>, where <token>
        is your auth token.

````
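
A minimal sketch of calling this endpoint from Python with only the standard library may help make the spec concrete. It is not an official client: the function names `build_retrieve_request`, `select_content`, and `retrieve` are hypothetical, and the `retrieve_id` and token are placeholders. Because the spec does not set `style` or `explode` on `formats`, the sketch assumes the OpenAPI 3.0 default serialization for query arrays (one `formats=` pair per value); adjust it if the server expects something else.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.olostep.com"  # server URL from the spec above


def build_retrieve_request(retrieve_id, token, formats=None):
    """Build (but do not send) a GET /v1/retrieve request."""
    params = [("retrieve_id", retrieve_id)]
    # Assumption: array query parameters use the OpenAPI 3.0 default
    # serialization (style=form, explode=true): one pair per value.
    for fmt in formats or []:
        params.append(("formats", fmt))
    url = f"{API_BASE}/v1/retrieve?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def select_content(payload, fmt):
    """Return ("inline", text) or ("hosted", url) for one format, else None."""
    inline = payload.get(f"{fmt}_content")
    hosted = payload.get(f"{fmt}_hosted_url")
    # When size_exceeded is true, the spec says to use the hosted S3 URLs.
    if payload.get("size_exceeded") or inline is None:
        return ("hosted", hosted) if hosted else None
    return ("inline", inline)


def retrieve(retrieve_id, token, formats=None):
    """Send the request and decode the JSON body (needs network access)."""
    request = build_retrieve_request(retrieve_id, token, formats)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `select_content(retrieve("<retrieve_id>", "<token>", ["markdown"]), "markdown")` would then yield either the inline Markdown or the hosted URL to download it from. Since the hosted URLs expire after 7 days, it is probably safer to fetch them promptly than to store them for later.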