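Search API

Olostep's /v1/searches endpoint lets you search the web with a natural-language query and returns a deduplicated list of relevant links, each with a title and description.

Send a plain-English query
Get structured links from across the web
Optionally scrape every returned URL in the same round trip, with markdown_content / html_content embedded directly in the response
Filter by domain, control the number of results, and cap the time budget for scraping

The endpoint searches the web semantically for the query and returns the results. For API details, see the Search endpoint API reference.

Installation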

pip install olostep

Basic usage

Send a natural-language query and receive a list of relevant links.

from olostep import Olostep

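# Replace the placeholder with your own API key.
client = Olostep(api_key="YOUR_REAL_KEY")

# Run a search from a natural-language query.
search = client.searches.create("Best Answer Engine Optimization startups")

# Print the search ID and the number of links returned.
print(search.id, len(search.links))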

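Request parameters

Field	Type	Required	Default	Description
`query`	string	Yes	—	The search query, in natural language.
`limit`	integer	No	`12`	Maximum number of links returned after deduplication. Must be between `1` and `25`.
`include_domains`	string[]	No	`[]`	Restrict results to these domains. Bare hosts only: a leading `http(s)://` and trailing slashes are stripped automatically.
`exclude_domains`	string[]	No	`[]`	Exclude results from these domains. Bare hosts only: a leading `http(s)://` and trailing slashes are stripped automatically.
`scrape_options`	object	No	—	When provided, every returned link is also scraped and its content is embedded in the response. See scrape_options below.

Limiting the number of results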

{
  "query": "What's going on with OpenAI's Sora shutting down?",
  "limit": 5
}

Filtering by domain

include_domains narrows results to an allowlist; exclude_domains filters out unwanted sources. The two can be combined.

{
  "query": "OpenAI Sora shutdown analysis",
  "include_domains": ["nytimes.com", "wsj.com", "bbc.com"],
  "exclude_domains": ["pinterest.com"]
}
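
The parameter table says domain entries are reduced to bare hosts on the server side. The sketch below mirrors that normalization locally, which can be handy for deduplicating a list before sending it. The helper names `to_bare_host` and `normalize_domains` are hypothetical and not part of the Olostep SDK, and the exact server-side rules may differ from this approximation.

```python
import re


def to_bare_host(value: str) -> str:
    """Strip a leading http(s):// scheme and any trailing slashes (assumed rule)."""
    host = re.sub(r"^https?://", "", value.strip(), flags=re.IGNORECASE)
    return host.rstrip("/")


def normalize_domains(domains):
    """Normalize each entry to a bare host and drop duplicates, keeping order."""
    seen = []
    for entry in domains:
        host = to_bare_host(entry).lower()
        if host and host not in seen:
            seen.append(host)
    return seen


include = normalize_domains(["https://nytimes.com/", "wsj.com", "HTTP://WSJ.com"])
print(include)
```

The resulting list can be passed as include_domains or exclude_domains in the request body.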

scrape_options

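Pass scrape_options to scrape every returned URL in parallel and embed the rendered content directly in each link. This saves one round trip per result compared with calling /v1/searches and /v1/scrapes separately.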

{
  "query": "What's going on with OpenAI's Sora shutting down?",
  "limit": 10,
  "scrape_options": {
    "formats": ["markdown"],
    "remove_css_selectors": "default",
    "timeout": 25
  }
}

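Field	Type	Default	Description
`formats`	string[]	`["markdown"]`	Output formats attached to each link. For `/v1/searches`, only `"html"` and `"markdown"` are supported. Pass `["html", "markdown"]` to receive both.
`remove_css_selectors`	string	`"default"`	Forwarded to `/v1/scrapes`. `"default"` strips nav/footer/script/style/svg/dialog noise. Use `"none"` to disable, or pass a JSON-stringified array of selectors to remove.
`timeout`	integer	`25`	Time budget for the whole scrape phase, in seconds. Must be between `1` and `60`. Once it elapses, the search returns immediately, and the content fields of any unfinished links are `null`.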
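
The documented ranges and allowed values can be checked before a request is sent. The following is a minimal sketch of a request-body builder that applies the constraints from the two parameter tables. `build_search_payload` is a hypothetical helper, not an SDK function, and the API may report validation errors differently.

```python
import json

SUPPORTED_FORMATS = {"html", "markdown"}


def build_search_payload(query, limit=12, formats=None, timeout=25,
                         remove_css_selectors="default"):
    """Build a /v1/searches request body, enforcing the documented limits."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not 1 <= limit <= 25:
        raise ValueError("limit must be between 1 and 25")

    payload = {"query": query, "limit": limit}
    if formats is None:
        return payload  # search only, no scraping

    unsupported = sorted(set(formats) - SUPPORTED_FORMATS)
    if unsupported:
        raise ValueError(f"unsupported formats: {unsupported}")
    if not 1 <= timeout <= 60:
        raise ValueError("timeout must be between 1 and 60 seconds")

    # A list of selectors is sent as a JSON-stringified array.
    if isinstance(remove_css_selectors, (list, tuple)):
        remove_css_selectors = json.dumps(list(remove_css_selectors))

    payload["scrape_options"] = {
        "formats": list(formats),
        "remove_css_selectors": remove_css_selectors,
        "timeout": timeout,
    }
    return payload


body = build_search_payload("OpenAI Sora shutdown analysis", limit=10,
                            formats=["markdown"],
                            remove_css_selectors=["nav", ".ads"])
print(json.dumps(body, indent=2))
```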

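Behavior

All links are scraped in parallel. timeout bounds the whole batch, not each individual link.
A per-link scrape failure (network error, single-page timeout) leaves that link's markdown_content / html_content as null, while the other links return normally.
If the global timeout expires before all scrapes finish, the search responds immediately with what it has: completed scrapes keep their content, and in-flight scrapes return null content.
For reddit.com/.../comments/... URLs, the request is automatically routed through the @olostep/reddit-post parser, and the structured JSON is rendered into clean markdown plus basic HTML.
If the combined inline content exceeds 9MB, the content fields are null, result.size_exceeded is set to true, and you can fetch the full payload from result.json_hosted_url.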
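
Because content fields can be null for several reasons, client code usually needs to separate links that carry content from those that do not, and to notice when the whole payload has been moved to the hosted file. Here is a minimal sketch that works on a plain dict shaped like the JSON response shown on this page. `triage_links` is a hypothetical helper, and the SDK's response object may expose these fields differently.

```python
def triage_links(search: dict, content_field: str = "markdown_content") -> dict:
    """Split links into those with scraped content and those without."""
    result = search.get("result", {})

    # When inline content is too large, every content field is null and the
    # full payload has to be fetched from the hosted JSON file instead.
    if result.get("size_exceeded"):
        return {"hosted_url": result.get("json_hosted_url"),
                "scraped": [], "missing": []}

    scraped, missing = [], []
    for link in result.get("links", []):
        # None (failed, timed out) and "" (empty page) both count as missing.
        if link.get(content_field):
            scraped.append(link["url"])
        else:
            missing.append(link["url"])
    return {"hosted_url": None, "scraped": scraped, "missing": missing}


sample = {
    "result": {
        "size_exceeded": False,
        "links": [
            {"url": "https://example.com/a", "markdown_content": "# A"},
            {"url": "https://example.com/b", "markdown_content": None},
        ],
    }
}
print(triage_links(sample))
```

Links in the missing list could then be retried individually through /v1/scrapes if their content is needed.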

Example with scraping

from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.create(
    query="What's going on with OpenAI's Sora shutting down?",
    limit=5,
    scrape_options={"formats": ["markdown"], "timeout": 25},
)

# markdown_content is null (None) for links whose scrape failed or timed out.
for link in search.links:
    print(link["url"], "—", len(link.get("markdown_content") or ""), "chars")

Response

You receive a search object in response. It contains an id, your original query, credits_consumed, and a result that holds the links list.

{
  "id": "search_9bi0sbj9xa",
  "object": "search",
  "created": 1760327323,
  "metadata": {},
  "query": "What's going on with OpenAI's Sora shutting down?",
  "credits_consumed": 10,
  "result": {
    "json_content": "...",
    "json_hosted_url": "https://olostep-storage.s3.us-east-1.amazonaws.com/search_9bi0sbj9xa.json",
    "size_exceeded": false,
    "credits_consumed": 10,
    "links": [
      {
        "url": "https://www.bbc.com/news/articles/c3w3e467ewqo",
        "title": "OpenAI to shut down Sora video platform",
        "description": "OpenAI says it will discontinue its Sora app...",
        "markdown_content": "# OpenAI to shut down Sora video platform\n\nOpenAI says it will discontinue..."
      },
      {
        "url": "https://www.reddit.com/r/OutOfTheLoop/comments/1s2u847/whats_going_on_with_openais_sora_shutting_down/",
        "title": "What's going on with OpenAI's Sora shutting down?",
        "description": "Reddit thread discussing the shutdown.",
        "markdown_content": "# What's going on with OpenAI's Sora shutting down?\n\n*r/OutOfTheLoop · u/rm-minus-r · 1mo ago*\n\n..."
      }
    ]
  }
}

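Each link in result.links contains:

Field	Type	Description
`url`	string	URL of the search result.
`title`	string	Title of the result page.
`description`	string	Short snippet describing the result.
`markdown_content`	string	Markdown content of the page. Present only when `scrape_options.formats` includes `"markdown"`. `null` when the scrape failed, came back empty, or timed out.
`html_content`	string	HTML content of the page. Present only when `scrape_options.formats` includes `"html"`. `null` on failure or timeout.

The full result is also available as a hosted JSON file at result.json_hosted_url, which is useful when result.size_exceeded is true.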

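Retrieving a past search

GET /v1/searches/{search_id} returns what was saved at search time, including any scraped content. It is a purely idempotent read: nothing is re-scraped and nothing is re-billed. Older searches made without scrape_options simply have no per-link content fields.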

from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")

search = client.searches.get(search_id="search_9bi0sbj9xa")
print(search.id, len(search.links))

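For full details, see Get Search.

Pricing

Each search costs 5 credits. When scrape_options is provided, each scraped page is billed at the standard /v1/scrapes rate (usually 1 credit per page; some parsers cost more). The total is returned in credits_consumed. Examples: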

Request	`credits_consumed`
Search only	`5`
Search + 5 scraped pages (1 credit each)	`10`
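
The arithmetic behind the table can be written out as a small estimator. This is a rough sketch only: `estimate_credits` is a hypothetical helper, the per-page rate defaults to the usual 1 credit but is higher for some parsers, and the authoritative figure is always the credits_consumed value returned by the API.

```python
SEARCH_CREDITS = 5  # flat cost of one search, per the pricing above


def estimate_credits(scraped_pages: int = 0, credits_per_page: int = 1) -> int:
    """Estimate credits for one search plus optional per-page scraping."""
    if scraped_pages < 0 or credits_per_page < 0:
        raise ValueError("page count and rate must be non-negative")
    return SEARCH_CREDITS + scraped_pages * credits_per_page


print(estimate_credits())                 # search only
print(estimate_credits(scraped_pages=5))  # search plus 5 pages at 1 credit each
```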
