Olostep NodeJS SDK - Olostep Docs

NPM package: olostep

Getting Started

npm install olostep

import Olostep from 'olostep';

const client = new Olostep({apiKey: process.env.OLOSTEP_API_KEY});

// Minimal scrape example
const result = await client.scrapes.create('https://example.com');
console.log(result.id, result.html_content);

The NodeJS SDK accepts every parameter in both camelCase and snake_case. If you are building for AI agents, use snake_case, which matches the API's native field names.
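To make the naming note concrete, the sketch below shows how a camelCase options object lines up with its snake_case form. `toSnakeCaseKeys` is a hypothetical helper written for illustration. It is not part of the olostep package, and the SDK's own conversion may work differently.

```javascript
// Hypothetical helper (not part of the olostep package): converts the
// top-level camelCase keys of an options object to snake_case.
function toSnakeCaseKeys(options) {
  const converted = {};
  for (const [key, value] of Object.entries(options)) {
    // Put "_" in front of each uppercase letter and lowercase that letter.
    const snakeKey = key.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`);
    converted[snakeKey] = value;
  }
  return converted;
}

const camelOptions = {
  url: 'https://example.com',
  waitBeforeScraping: 1000,
  removeImages: true
};

// Logs the same options with keys url, wait_before_scraping, remove_images.
console.log(toSnakeCaseKeys(camelOptions));
```

Only top-level keys are converted here; nested option objects would need a recursive variant.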

Usage

Scrapes

Scrape a single URL with a variety of options:

import Olostep, {Format} from 'olostep';

const client = new Olostep({apiKey: 'your_api_key'});

// Simple scrape (named separately so it does not clash with `scrape` below)
const simpleScrape = await client.scrapes.create('https://example.com');

// Request multiple formats
const scrape = await client.scrapes.create({
  url: 'https://example.com',
  formats: [Format.HTML, Format.MARKDOWN, Format.TEXT],
  waitBeforeScraping: 1000,
  removeImages: true
});

// Access the content
console.log(scrape.html_content);
console.log(scrape.markdown_content);

// Fetch a scrape by ID
const fetched = await client.scrapes.get(scrape.id);

Batches

Process multiple URLs in a single batch:

// Pass URL strings (custom IDs are generated automatically)
const autoIdBatch = await client.batches.create([
  'https://example.com',
  'https://example.org',
  'https://example.net'
]);

// Or supply explicit custom IDs
const batch = await client.batches.create([
  {url: 'https://example.com', customId: 'site-1'},
  {url: 'https://example.org', customId: 'site-2'}
]);

console.log(`Batch ${batch.id} created with ${batch.total_urls} URLs`);

// Wait for completion
await batch.waitTillDone({
  checkEveryNSecs: 5,
  timeoutSeconds: 120
});

// Get batch info
const info = await batch.info();
console.log(info);

// Stream individual results
for await (const item of batch.items()) {
  console.log(item.custom_id);
}
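The waitTillDone call above takes a polling interval and a timeout. This document does not show how the SDK implements it, so the following is only a minimal sketch of that kind of loop. `pollUntilDone` and the `isDone` callback are hypothetical names introduced for illustration.

```javascript
// Illustrative polling loop, not the SDK's implementation.
// `isDone` is a caller-supplied async function (hypothetical) that
// resolves to true once the job has finished.
async function pollUntilDone(isDone, {checkEveryNSecs = 5, timeoutSeconds = 120} = {}) {
  const deadline = Date.now() + timeoutSeconds * 1000;
  while (true) {
    if (await isDone()) return;
    // Give up if the next wait would run past the deadline.
    if (Date.now() + checkEveryNSecs * 1000 > deadline) {
      throw new Error(`Not done within ${timeoutSeconds} seconds`);
    }
    await new Promise((resolve) => setTimeout(resolve, checkEveryNSecs * 1000));
  }
}

// Usage with a stand-in job that reports completion on the third check.
let checks = 0;
pollUntilDone(async () => ++checks >= 3, {checkEveryNSecs: 0.01, timeoutSeconds: 1})
  .then(() => console.log(`done after ${checks} checks`));
```

Checking before sleeping means a job that is already finished returns immediately, and the deadline test keeps the total wait from overshooting the timeout by a full interval.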

Crawls

Crawl an entire website:

const crawl = await client.crawls.create({
  url: 'https://example.com',
  maxPages: 100,
  maxDepth: 3,
  includeUrls: ['*/blog/*'],
  excludeUrls: ['*/admin/*']
});

console.log(`Crawl ${crawl.id} started`);

// Wait for completion
await crawl.waitTillDone({
  checkEveryNSecs: 10,
  timeoutSeconds: 300
});

// Get crawl info
const info = await crawl.info();
console.log(`Crawled ${info.pages_crawled} pages`);

// Stream the crawled pages
for await (const page of crawl.pages()) {
  console.log(page.url, page.status_code);
}
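The includeUrls and excludeUrls options above take glob-style patterns such as '*/blog/*'. The exact matching rules are not stated in this document, so the sketch below rests on two assumptions: `*` matches any run of characters, and exclusions take priority over inclusions. `globToRegExp` and `shouldCrawl` are hypothetical helpers for previewing which URLs a pattern set would select.

```javascript
// Hypothetical matcher: treats "*" as "any run of characters" and requires
// the whole URL to match. The API's real matching rules may differ.
function globToRegExp(pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&')) // escape regex metacharacters
    .join('.*');
  return new RegExp(`^${escaped}$`);
}

// Assumption: an exclude match always wins; an empty include list allows everything.
function shouldCrawl(url, {includeUrls = [], excludeUrls = []} = {}) {
  const matchesAny = (patterns) => patterns.some((p) => globToRegExp(p).test(url));
  if (matchesAny(excludeUrls)) return false;
  return includeUrls.length === 0 || matchesAny(includeUrls);
}

const filters = {includeUrls: ['*/blog/*'], excludeUrls: ['*/admin/*']};
console.log(shouldCrawl('https://example.com/blog/post-1', filters));  // included
console.log(shouldCrawl('https://example.com/admin/blog/x', filters)); // excluded
console.log(shouldCrawl('https://example.com/pricing', filters));      // not included
```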

Maps

Generate a sitemap of URLs from a website:

const map = await client.maps.create({
  url: 'https://example.com',
  topN: 100,
  includeSubdomain: true,
  searchQuery: 'blog posts'
});

console.log(`Map ${map.id} created`);

// Stream the URLs
for await (const url of map.urls()) {
  console.log(url);
}

// Get map info
const info = await map.info();

AI-Powered Answers

Use AI to get answers from web pages:

import Olostep from 'olostep';

const client = new Olostep({apiKey: 'your_api_key'});

// Simple task: pass a string directly
const answer = await client.answers.create(
  'What is the main topic of https://example.com?'
);
console.log(answer.answer);
console.log(answer.sources);

// Request structured JSON output
const structured = await client.answers.create({
  task: 'Extract all product names and prices from https://example.com',
  jsonFormat: {
    products: [{name: '', price: ''}]
  }
});
console.log(structured.json_content);

// Retrieve a previously created answer by ID
const fetched = await client.answers.get(answer.id);
console.log(fetched.answer);

Content Retrieval

Retrieve previously scraped content:

// Get content in a specific format
const content = await client.retrieve(retrieveId, Format.MARKDOWN);
console.log(content.markdown_content);

// Multiple formats (named separately so it does not clash with `content` above)
const multiFormatContent = await client.retrieve(retrieveId, [
  Format.HTML,
  Format.MARKDOWN
]);

Advanced Options

Custom Actions

Perform browser actions before scraping:

const scrape = await client.scrapes.create({
  url: 'https://example.com',
  actions: [
    {type: 'wait', milliseconds: 2000},
    {type: 'click', selector: '#load-more'},
    {type: 'scroll', distance: 1000},
    {type: 'fill_input', selector: '#search', value: 'query'}
  ]
});

Geolocation

Scrape from different countries using a predefined country code or any valid country code string:

import Olostep, {Country} from 'olostep';

const client = new Olostep({apiKey: 'your_api_key'});

// Use a predefined enum value (US, DE, FR, GB, SG)
const scrape = await client.scrapes.create({
  url: 'https://example.com',
  country: Country.DE  // Germany
});

// Or pass any valid country code as a string
const scrape2 = await client.scrapes.create({
  url: 'https://example.com',
  country: 'jp'  // Japan
});

LLM Extraction

Extract structured data using an LLM:

const scrape = await client.scrapes.create({
  url: 'https://example.com',
  llmExtract: {
    schema: {
      title: 'string',
      price: 'number',
      description: 'string'
    },
    prompt: 'Extract product information from this page'
  }
});
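Because the schema above is a flat map of field names to type names, an extracted object can be checked against it on the client side. The sketch below assumes the extracted data is available as a plain object. This document does not say which response field carries it, so the sample object is made up, and `findSchemaMismatches` is a hypothetical helper rather than an SDK function.

```javascript
// Hypothetical validator: checks that each field named in a flat
// {field: typeName} schema is present with the matching `typeof`.
function findSchemaMismatches(schema, data) {
  const problems = [];
  for (const [field, expectedType] of Object.entries(schema)) {
    const actualType = typeof data[field]; // "undefined" when the field is missing
    if (actualType !== expectedType) {
      problems.push(`${field}: expected ${expectedType}, got ${actualType}`);
    }
  }
  return problems;
}

const productSchema = {title: 'string', price: 'number', description: 'string'};

// Made-up sample standing in for extracted data; the price arrived as a string.
const extracted = {title: 'Example Widget', price: '19.99', description: 'A sample item'};

console.log(findSchemaMismatches(productSchema, extracted));
```

An empty result means every listed field had the expected type; fields outside the schema are ignored.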

Client Configuration

import Olostep from 'olostep';

const client = new Olostep({
  apiKey: 'your_api_key',
  apiBaseUrl: 'https://api.olostep.com/v1',  // optional
  timeoutMs: 150000,  // 150 seconds (optional)
  retry: {
    maxRetries: 3,
    initialDelayMs: 1000
  },
  userAgent: 'MyApp/1.0'  // optional
});
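The retry block above names only maxRetries and initialDelayMs. How the delay grows between attempts is not stated in this document. As an illustration only, the sketch below assumes a common doubling (exponential backoff) schedule, and `backoffDelays` is a hypothetical helper that lists the waits such a policy would produce.

```javascript
// Hypothetical schedule: doubles the delay after each failed attempt.
// The SDK's actual backoff policy is not specified in this document.
function backoffDelays({maxRetries = 3, initialDelayMs = 1000} = {}) {
  return Array.from({length: maxRetries}, (_, attempt) => initialDelayMs * 2 ** attempt);
}

// With the settings shown above this yields waits of 1000, 2000 and 4000 ms.
console.log(backoffDelays({maxRetries: 3, initialDelayMs: 1000}));
```

Under this assumption the three retries add up to 7 seconds of waiting, which fits comfortably inside the 150 second timeoutMs in the example configuration.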

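Features

Async-first client with full TypeScript support.
Type-safe inputs through TypeScript enums and interfaces (Formats, Countries, Actions, and so on).
Rich resource namespaces with both shorthand calls (client.scrapes.create()) and explicit methods (client.scrapes.get()).
A shared transport layer that handles retries, timeouts, and JSON decoding.
A comprehensive error hierarchy.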

