

The Olostep /v1/searches endpoint lets you search the web with a natural language query and get back a deduplicated list of relevant links with titles and descriptions.
  • Send a query in plain English
  • Get back structured links from across the web
  • Optionally scrape every returned URL in one round-trip and embed markdown_content / html_content directly into the response
  • Filter by domain, control the result count, and bound the scraping wallclock
The query is matched semantically across the web. For API details, see the Search Endpoint API Reference.

Installation

pip install olostep

Basic usage

Send a natural language query and receive a list of relevant links.
from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.create("Best Answer Engine Optimization startups")

print(search.id, len(search.links))

Request parameters

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | yes | | The search query in natural language. |
| limit | integer | no | 12 | Maximum number of links to return after deduplication. Must be between 1 and 25. |
| include_domains | string[] | no | [] | Restrict results to these domains. Bare hosts only — leading http(s):// and trailing slashes are stripped automatically. |
| exclude_domains | string[] | no | [] | Exclude results from these domains. Bare hosts only — leading http(s):// and trailing slashes are stripped automatically. |
| scrape_options | object | no | | When provided, every returned link is also scraped and its content embedded in the response. See scrape_options below. |
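The bounds above (limit between 1 and 25, timeout between 1 and 60) can be checked client-side before a request is sent. A minimal sketch — `build_search_payload` is a hypothetical helper for illustration, not part of the SDK:

```python
def build_search_payload(query, limit=12, include_domains=None,
                         exclude_domains=None, scrape_options=None):
    """Assemble a /v1/searches request body, enforcing the documented bounds."""
    if not query or not isinstance(query, str):
        raise ValueError("query must be a non-empty string")
    if not 1 <= limit <= 25:
        raise ValueError("limit must be between 1 and 25")
    payload = {"query": query, "limit": limit}
    if include_domains:
        payload["include_domains"] = include_domains
    if exclude_domains:
        payload["exclude_domains"] = exclude_domains
    if scrape_options is not None:
        timeout = scrape_options.get("timeout", 25)
        if not 1 <= timeout <= 60:
            raise ValueError("scrape_options.timeout must be between 1 and 60")
        payload["scrape_options"] = scrape_options
    return payload

payload = build_search_payload("OpenAI Sora shutdown analysis", limit=5)
print(payload)  # {'query': 'OpenAI Sora shutdown analysis', 'limit': 5}
```

Failing fast on out-of-range values avoids burning a request on a 4xx response.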

Limiting the number of results

{
  "query": "What's going on with OpenAI's Sora shutting down?",
  "limit": 5
}

Filtering by domain

include_domains narrows results to an allowlist; exclude_domains filters out unwanted sources. The two can be combined.
{
  "query": "OpenAI Sora shutdown analysis",
  "include_domains": ["nytimes.com", "wsj.com", "bbc.com"],
  "exclude_domains": ["pinterest.com"]
}
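The automatic cleanup of domain values can be mirrored client-side to keep request bodies tidy. A rough sketch of the documented stripping — `normalize_domain` is a hypothetical helper, and the exact server-side behavior may differ:

```python
def normalize_domain(value):
    """Approximate the documented cleanup: drop a leading http(s)://
    scheme and any trailing slashes, leaving a bare host."""
    for prefix in ("https://", "http://"):
        if value.startswith(prefix):
            value = value[len(prefix):]
            break
    return value.rstrip("/")

print(normalize_domain("https://nytimes.com/"))  # nytimes.com
print(normalize_domain("bbc.com"))               # bbc.com
```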

scrape_options

Pass scrape_options to scrape every returned URL in parallel and embed the rendered content directly on each link. This saves a round-trip per result vs. calling /v1/searches and /v1/scrapes separately.
{
  "query": "What's going on with OpenAI's Sora shutting down?",
  "limit": 10,
  "scrape_options": {
    "formats": ["markdown"],
    "remove_css_selectors": "default",
    "timeout": 25
  }
}
| Field | Type | Default | Description |
| --- | --- | --- | --- |
| formats | string[] | ["markdown"] | Output formats to attach to each link. For /v1/searches, only "html" and "markdown" are supported. Pass ["html", "markdown"] to receive both. |
| remove_css_selectors | string | "default" | Forwarded to /v1/scrapes. "default" strips nav/footer/script/style/svg/dialog noise. Use "none" to disable, or pass a JSON-stringified array of selectors to remove. |
| timeout | integer | 25 | Wallclock budget in seconds for the entire scrape phase. Must be between 1 and 60. After this elapses, the search returns immediately — content fields will be null for any links that hadn't finished. |

Behavior

  • All links are scraped in parallel. The timeout bounds the whole batch, not each individual link.
  • Per-link scrape failures (network errors, individual page timeouts) leave that link’s markdown_content / html_content as null while other links return normally.
  • If the global timeout elapses before all scrapes finish, the search responds immediately with the links it has — already-completed scrapes keep their content; in-flight ones come back with null content.
  • For reddit.com/.../comments/... URLs, the request is automatically routed through the @olostep/reddit-post parser and the structured JSON is rendered into clean markdown and basic HTML.
  • If the combined inline content exceeds 9MB, content fields are nulled, result.size_exceeded is set to true, and you can fetch the full payload from result.json_hosted_url.
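These rules matter when consuming a response: content fields may be null per-link, and size_exceeded can null everything at once. A minimal sketch of that handling — `usable_content` is a hypothetical helper operating on the raw result dict:

```python
def usable_content(result):
    """Partition result["links"] into links whose markdown arrived and
    links whose content is null (per-link failure or global timeout)."""
    if result.get("size_exceeded"):
        # Inline content was nulled; fetch the full payload from
        # result["json_hosted_url"] instead.
        return [], list(result["links"])
    ok = [link for link in result["links"] if link.get("markdown_content")]
    missing = [link for link in result["links"] if not link.get("markdown_content")]
    return ok, missing

sample = {
    "size_exceeded": False,
    "links": [
        {"url": "https://example.com/a", "markdown_content": "# A"},
        {"url": "https://example.com/b", "markdown_content": None},
    ],
}
ok, missing = usable_content(sample)
print(len(ok), "scraped,", len(missing), "missing")  # 1 scraped, 1 missing
```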

Example with scraping

from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.create(
    query="What's going on with OpenAI's Sora shutting down?",
    limit=5,
    scrape_options={"formats": ["markdown"], "timeout": 25},
)

for link in search.links:
    print(link["url"], "—", len(link.get("markdown_content") or ""), "chars")

Response

You will receive a search object in response. It contains an id, your original query, credits_consumed, and a result with a list of links.
{
  "id": "search_9bi0sbj9xa",
  "object": "search",
  "created": 1760327323,
  "metadata": {},
  "query": "What's going on with OpenAI's Sora shutting down?",
  "credits_consumed": 10,
  "result": {
    "json_content": "...",
    "json_hosted_url": "https://olostep-storage.s3.us-east-1.amazonaws.com/search_9bi0sbj9xa.json",
    "size_exceeded": false,
    "credits_consumed": 10,
    "links": [
      {
        "url": "https://www.bbc.com/news/articles/c3w3e467ewqo",
        "title": "OpenAI to shut down Sora video platform",
        "description": "OpenAI says it will discontinue its Sora app...",
        "markdown_content": "# OpenAI to shut down Sora video platform\n\nOpenAI says it will discontinue..."
      },
      {
        "url": "https://www.reddit.com/r/OutOfTheLoop/comments/1s2u847/whats_going_on_with_openais_sora_shutting_down/",
        "title": "What's going on with OpenAI's Sora shutting down?",
        "description": "Reddit thread discussing the shutdown.",
        "markdown_content": "# What's going on with OpenAI's Sora shutting down?\n\n*r/OutOfTheLoop · u/rm-minus-r · 1mo ago*\n\n..."
      }
    ]
  }
}
Each link in result.links contains:
| Field | Type | Description |
| --- | --- | --- |
| url | string | The URL of the search result. |
| title | string | The title of the result page. |
| description | string | A short snippet describing the result. |
| markdown_content | string | Markdown content of the page. Only present when scrape_options.formats includes "markdown". null if the scrape failed, was empty, or hit the global timeout. |
| html_content | string | HTML content of the page. Only present when scrape_options.formats includes "html". null on failure/timeout. |
The full result is also available as a hosted JSON file at result.json_hosted_url, which is useful when result.size_exceeded is true.

GET /v1/searches/{search_id} returns whatever was persisted at search time, including any scraped content. It is a pure, idempotent read: no re-scraping, no re-billing. Older searches created without scrape_options simply have no per-link content fields.
from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.get(search_id="search_9bi0sbj9xa")
print(search.id, len(search.links))
See Get Search for full details.

Pricing

Each search costs 5 credits for the search itself. When scrape_options is provided, each scraped page is billed at the standard /v1/scrapes rate (typically 1 credit per page; some parsers cost more). The total is returned in credits_consumed. Examples:
| Request | credits_consumed |
| --- | --- |
| Search only | 5 |
| Search + 5 scraped pages (1 credit each) | 10 |
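The arithmetic above can be sketched in a small estimator. `estimate_credits` is a hypothetical helper, and the per-page rate is an assumption (the docs note some parsers cost more than 1 credit); the authoritative figure is always the credits_consumed field in the response:

```python
def estimate_credits(pages_scraped=0, per_page=1):
    """Estimate credits_consumed: 5 for the search itself, plus the
    per-page scrape rate for each scraped link."""
    return 5 + pages_scraped * per_page

print(estimate_credits())                 # 5  (search only)
print(estimate_credits(pages_scraped=5))  # 10 (search + 5 scraped pages)
```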