

The Olostep /v1/searches endpoint lets you search the web with a natural language query and get back a deduplicated list of relevant links with titles and descriptions.
  • Send a query in plain English
  • Get back structured links from across the web
  • Optionally scrape every returned URL in one round-trip and embed markdown_content / html_content directly into the response
  • Filter by domain, control the result count, and bound the scraping wallclock
The query is matched semantically across the web. For API details, see the Search Endpoint API Reference.

Installation

pip install olostep

Basic usage

Send a natural language query and receive a list of relevant links.
from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.create("Best Answer Engine Optimization startups")

print(search.id, len(search.links))

Request parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | yes | | The search query in natural language. |
| limit | integer | no | 12 | Maximum number of links to return after deduplication. Must be between 1 and 25. |
| include_domains | string[] | no | [] | Restrict results to these domains. Bare hosts only — leading http(s):// and trailing slashes are stripped automatically. |
| exclude_domains | string[] | no | [] | Exclude results from these domains. Bare hosts only — leading http(s):// and trailing slashes are stripped automatically. |
| scrape_options | object | no | | When provided, every returned link is also scraped and its content embedded in the response. See scrape_options below. |
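The bounds above (limit between 1 and 25, timeout between 1 and 60) can be checked client-side before a request is sent. A minimal sketch — `build_search_payload` is a hypothetical helper for illustration, not part of the SDK:

```python
def build_search_payload(query, limit=12, include_domains=None,
                         exclude_domains=None, scrape_options=None):
    """Assemble a /v1/searches request body, enforcing the documented bounds."""
    if not query or not isinstance(query, str):
        raise ValueError("query must be a non-empty string")
    if not 1 <= limit <= 25:
        raise ValueError("limit must be between 1 and 25")
    payload = {"query": query, "limit": limit}
    if include_domains:
        payload["include_domains"] = include_domains
    if exclude_domains:
        payload["exclude_domains"] = exclude_domains
    if scrape_options is not None:
        timeout = scrape_options.get("timeout", 25)
        if not 1 <= timeout <= 60:
            raise ValueError("scrape_options.timeout must be between 1 and 60")
        payload["scrape_options"] = scrape_options
    return payload

payload = build_search_payload("OpenAI Sora shutdown analysis", limit=5)
print(payload)  # {'query': 'OpenAI Sora shutdown analysis', 'limit': 5}
```

Failing fast on out-of-range values avoids burning a request on a 4xx response.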

Limiting the number of results

{
  "query": "What's going on with OpenAI's Sora shutting down?",
  "limit": 5
}

Filtering by domain

include_domains narrows results to an allowlist; exclude_domains filters out unwanted sources. The two can be combined.
{
  "query": "OpenAI Sora shutdown analysis",
  "include_domains": ["nytimes.com", "wsj.com", "bbc.com"],
  "exclude_domains": ["pinterest.com"]
}
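The automatic cleanup of domain values can be mirrored client-side to keep request bodies tidy. A rough sketch of the documented stripping — `normalize_domain` is a hypothetical helper, and the exact server-side behavior may differ:

```python
def normalize_domain(value):
    """Approximate the documented cleanup: drop a leading http(s)://
    scheme and any trailing slashes, leaving a bare host."""
    for prefix in ("https://", "http://"):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.rstrip("/")

print(normalize_domain("https://nytimes.com/"))  # nytimes.com
print(normalize_domain("bbc.com"))               # bbc.com
```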

scrape_options

Pass scrape_options to scrape every returned URL in parallel and embed the rendered content directly on each link. This saves a round-trip per result vs. calling /v1/searches and /v1/scrapes separately.
{
  "query": "What's going on with OpenAI's Sora shutting down?",
  "limit": 10,
  "scrape_options": {
    "formats": ["markdown"],
    "remove_css_selectors": "default",
    "timeout": 25
  }
}
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| formats | string[] | ["markdown"] | Output formats to attach to each link. For /v1/searches, only "html" and "markdown" are supported. Pass ["html", "markdown"] to receive both. |
| remove_css_selectors | string | "default" | Forwarded to /v1/scrapes. "default" strips nav/footer/script/style/svg/dialog noise. Use "none" to disable, or pass a JSON-stringified array of selectors to remove. |
| timeout | integer | 25 | Wallclock budget in seconds for the entire scrape phase. Must be between 1 and 60. After this elapses, the search returns immediately — content fields will be null for any links that hadn't finished. |

Behavior

  • All links are scraped in parallel. The timeout bounds the whole batch, not each individual link.
  • Per-link scrape failures (network errors, individual page timeouts) leave that link’s markdown_content / html_content as null while other links return normally.
  • If the global timeout elapses before all scrapes finish, the search responds immediately with the links it has — already-completed scrapes keep their content; in-flight ones come back with null content.
  • For reddit.com/.../comments/... URLs, the request is automatically routed through the @olostep/reddit-post parser and the structured JSON is rendered into clean markdown and basic HTML.
  • If the combined inline content exceeds 9MB, content fields are nulled, result.size_exceeded is set to true, and you can fetch the full payload from result.json_hosted_url.
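These rules matter when consuming a response: content fields may be null per-link, and size_exceeded can null everything at once. A minimal sketch of that handling — `usable_content` is a hypothetical helper operating on the raw result dict:

```python
def usable_content(result):
    """Partition result["links"] into links whose markdown arrived and
    links whose content is null (per-link failure or global timeout)."""
    if result.get("size_exceeded"):
        # Inline content was nulled; fetch the full payload from
        # result["json_hosted_url"] instead.
        return [], list(result["links"])
    ok = [link for link in result["links"] if link.get("markdown_content")]
    missing = [link for link in result["links"] if not link.get("markdown_content")]
    return ok, missing

sample = {
    "size_exceeded": False,
    "links": [
        {"url": "https://example.com/a", "markdown_content": "# A"},
        {"url": "https://example.com/b", "markdown_content": None},
    ],
}
ok, missing = usable_content(sample)
print(len(ok), "scraped,", len(missing), "missing")  # 1 scraped, 1 missing
```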

Example with scraping

from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.create(
    query="What's going on with OpenAI's Sora shutting down?",
    limit=5,
    scrape_options={"formats": ["markdown"], "timeout": 25},
)

for link in search.links:
    print(link["url"], "—", len(link.get("markdown_content") or ""), "chars")

Response

You will receive a search object in response. It contains an id, your original query, credits_consumed, and a result with a list of links.
{
  "id": "search_9bi0sbj9xa",
  "object": "search",
  "created": 1760327323,
  "metadata": {},
  "query": "What's going on with OpenAI's Sora shutting down?",
  "credits_consumed": 10,
  "result": {
    "json_content": "...",
    "json_hosted_url": "https://olostep-storage.s3.us-east-1.amazonaws.com/search_9bi0sbj9xa.json",
    "size_exceeded": false,
    "credits_consumed": 10,
    "links": [
      {
        "url": "https://www.bbc.com/news/articles/c3w3e467ewqo",
        "title": "OpenAI to shut down Sora video platform",
        "description": "OpenAI says it will discontinue its Sora app...",
        "markdown_content": "# OpenAI to shut down Sora video platform\n\nOpenAI says it will discontinue..."
      },
      {
        "url": "https://www.reddit.com/r/OutOfTheLoop/comments/1s2u847/whats_going_on_with_openais_sora_shutting_down/",
        "title": "What's going on with OpenAI's Sora shutting down?",
        "description": "Reddit thread discussing the shutdown.",
        "markdown_content": "# What's going on with OpenAI's Sora shutting down?\n\n*r/OutOfTheLoop · u/rm-minus-r · 1mo ago*\n\n..."
      }
    ]
  }
}
Each link in result.links contains:
| Field | Type | Description |
| --- | --- | --- |
| url | string | The URL of the search result. |
| title | string | The title of the result page. |
| description | string | A short snippet describing the result. |
| markdown_content | string | Markdown content of the page. Only present when scrape_options.formats includes "markdown". null if the scrape failed, was empty, or hit the global timeout. |
| html_content | string | HTML content of the page. Only present when scrape_options.formats includes "html". null on failure/timeout. |
The full result is also available as a hosted JSON file at result.json_hosted_url, which is useful when result.size_exceeded is true.

GET /v1/searches/{search_id} returns whatever was persisted at search time, including any scraped content. It is a pure, idempotent read: no re-scraping, no re-billing. Older searches created without scrape_options simply have no per-link content fields.
from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.get(search_id="search_9bi0sbj9xa")
print(search.id, len(search.links))
See Get Search for full details.

Pricing

Each search costs 5 credits for the search itself. When scrape_options is provided, each scraped page is billed at the standard /v1/scrapes rate (typically 1 credit per page; some parsers cost more). The total is returned in credits_consumed. Examples:
| Request | credits_consumed |
| --- | --- |
| Search only | 5 |
| Search + 5 scraped pages (1 credit each) | 10 |
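The arithmetic above can be sketched in a small estimator. `estimate_credits` is a hypothetical helper, and the per-page rate is an assumption (the docs note some parsers cost more than 1 credit); the authoritative figure is always the credits_consumed field in the response:

```python
def estimate_credits(pages_scraped=0, per_page=1):
    """Estimate credits_consumed: 5 for the search itself, plus the
    per-page scrape rate for each scraped link."""
    return 5 + pages_scraped * per_page

print(estimate_credits())                 # 5  (search only)
print(estimate_credits(pages_scraped=5))  # 10 (search + 5 scraped pages)
```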