Olostep NodeJS SDK - Olostep Docs

NPM package: olostep

Getting Started

npm install olostep

import Olostep from 'olostep';

const client = new Olostep({apiKey: process.env.OLOSTEP_API_KEY});

// Minimal scrape example
const result = await client.scrapes.create('https://example.com');
console.log(result.id, result.html_content);

The NodeJS SDK accepts both camelCase and snake_case for all parameters. If you are building for AI agents, use snake_case, which matches the API's native field names.
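To make the correspondence between the two spellings concrete, here is a minimal sketch of how camelCase option names map onto snake_case ones. `toSnakeCase` and `snakeCaseKeys` are hypothetical helpers written for illustration; they are not part of the olostep package, and the SDK's own conversion may differ.

```javascript
// Hypothetical helper (not part of the olostep package): convert one
// camelCase identifier to snake_case, e.g. "waitBeforeScraping".
function toSnakeCase(name) {
  return name.replace(/[A-Z]/g, (letter) => `_${letter.toLowerCase()}`);
}

// Convert the top-level keys of an options object. Nested objects are left
// untouched to keep the sketch short.
function snakeCaseKeys(options) {
  return Object.fromEntries(
    Object.entries(options).map(([key, value]) => [toSnakeCase(key), value])
  );
}

const options = snakeCaseKeys({
  url: 'https://example.com',
  waitBeforeScraping: 1000,
  removeImages: true
});
console.log(Object.keys(options)); // url, wait_before_scraping, remove_images
```

Since the SDK accepts both spellings, a helper like this is only useful when you need to produce the snake_case form yourself, for example when generating request payloads for an agent.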

Usage

Scraping

Scrape a single URL with a variety of options:

import Olostep, {Format} from 'olostep';

const client = new Olostep({apiKey: 'your_api_key'});

// Simple scrape
const simpleScrape = await client.scrapes.create('https://example.com');

// With multiple formats
const scrape = await client.scrapes.create({
  url: 'https://example.com',
  formats: [Format.HTML, Format.MARKDOWN, Format.TEXT],
  waitBeforeScraping: 1000,
  removeImages: true
});

// Access the content
console.log(scrape.html_content);
console.log(scrape.markdown_content);

// Fetch a scrape by ID
const fetched = await client.scrapes.get(scrape.id);

Batch Processing

Process multiple URLs at once:

// Using URL strings (custom IDs are generated automatically)
const autoIdBatch = await client.batches.create([
  'https://example.com',
  'https://example.org',
  'https://example.net'
]);

// Or with explicit custom IDs
const batch = await client.batches.create([
  {url: 'https://example.com', customId: 'site-1'},
  {url: 'https://example.org', customId: 'site-2'}
]);

console.log(`Batch ${batch.id} created with ${batch.total_urls} URLs`);

// Wait for completion
await batch.waitTillDone({
  checkEveryNSecs: 5,
  timeoutSeconds: 120
});

// Get batch info
const info = await batch.info();
console.log(info);

// Stream individual results
for await (const item of batch.items()) {
  console.log(item.custom_id);
}
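The waiting step above takes a polling interval and a timeout. As a rough sketch of what an interval-plus-timeout wait loop generally looks like (this is not the SDK's actual implementation), the following polls a status function until it reports completion or the deadline passes. `pollUntilDone` and `checkStatus` are hypothetical names, and the `'completed'` and `'in_progress'` status strings are assumptions.

```javascript
// Hypothetical sketch of an interval-plus-timeout wait loop.
// checkStatus is any async function that resolves to a status string.
async function pollUntilDone(checkStatus, {checkEveryNSecs = 5, timeoutSeconds = 120} = {}) {
  const deadline = Date.now() + timeoutSeconds * 1000;
  for (;;) {
    const status = await checkStatus();
    if (status === 'completed') return status; // assumed terminal status
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutSeconds}s (last status: ${status})`);
    }
    // Sleep for the polling interval before checking again.
    await new Promise((resolve) => setTimeout(resolve, checkEveryNSecs * 1000));
  }
}

// Usage with a fake status source that completes on the third check.
let checks = 0;
const fakeStatus = async () => (++checks >= 3 ? 'completed' : 'in_progress');
const waited = pollUntilDone(fakeStatus, {checkEveryNSecs: 0.01, timeoutSeconds: 1});
waited.then((status) => console.log(status, 'after', checks, 'checks'));
```

A shorter interval notices completion sooner at the cost of more status requests; the timeout bounds how long a stuck job can block the caller.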

Crawling

Crawl an entire website:

const crawl = await client.crawls.create({
  url: 'https://example.com',
  maxPages: 100,
  maxDepth: 3,
  includeUrls: ['*/blog/*'],
  excludeUrls: ['*/admin/*']
});

console.log(`Crawl ${crawl.id} started`);

// Wait for completion
await crawl.waitTillDone({
  checkEveryNSecs: 10,
  timeoutSeconds: 300
});

// Get crawl info
const info = await crawl.info();
console.log(`Crawled ${info.pages_crawled} pages`);

// Stream crawled pages
for await (const page of crawl.pages()) {
  console.log(page.url, page.status_code);
}
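The crawl options above filter URLs with glob-style patterns such as `*/blog/*`. How Olostep evaluates these patterns is not specified here. Purely to illustrate the idea, the sketch below assumes that `*` matches any run of characters and that exclusions take precedence over inclusions. `globToRegExp` and `shouldCrawl` are hypothetical helpers, not SDK functions.

```javascript
// Hypothetical helper: turn a glob such as "*/blog/*" into a RegExp,
// assuming "*" matches any run of characters (including "/").
function globToRegExp(glob) {
  // Escape regex metacharacters except "*", then expand "*" to ".*".
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escaped.replace(/\*/g, '.*')}$`);
}

// Assumed semantics: a URL is kept if it matches no exclude pattern and
// either matches an include pattern or no include patterns were given.
function shouldCrawl(url, {includeUrls = [], excludeUrls = []} = {}) {
  const matches = (patterns) => patterns.some((p) => globToRegExp(p).test(url));
  if (matches(excludeUrls)) return false;
  return includeUrls.length === 0 || matches(includeUrls);
}

const filters = {includeUrls: ['*/blog/*'], excludeUrls: ['*/admin/*']};
console.log(shouldCrawl('https://example.com/blog/post-1', filters));  // true
console.log(shouldCrawl('https://example.com/admin/blog/x', filters)); // false
console.log(shouldCrawl('https://example.com/about', filters));        // false
```

Checking patterns locally like this can help catch an over-broad include before starting a large crawl, but the service's own matching rules remain authoritative.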

Site Mapping

Generate a map of URLs from a website:

const map = await client.maps.create({
  url: 'https://example.com',
  topN: 100,
  includeSubdomain: true,
  searchQuery: 'blog posts'
});

console.log(`Map ${map.id} created`);

// Stream URLs
for await (const url of map.urls()) {
  console.log(url);
}

// Get map info
const info = await map.info();
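Because the URLs in the example above arrive through an async iterator, a small generic helper can collect only the first few without reading the whole stream. `take` is a hypothetical helper, and `fakeUrls` stands in for `map.urls()`, which this sketch assumes behaves like any other async iterable.

```javascript
// Hypothetical helper: collect at most `limit` items from any async iterable.
async function take(iterable, limit) {
  const items = [];
  if (limit <= 0) return items;
  for await (const item of iterable) {
    items.push(item);
    if (items.length >= limit) break; // stops consuming the source early
  }
  return items;
}

// Stand-in for map.urls(): an async generator yielding a few URLs.
async function* fakeUrls() {
  yield 'https://example.com/';
  yield 'https://example.com/blog';
  yield 'https://example.com/about';
}

const firstTwo = take(fakeUrls(), 2);
firstTwo.then((urls) => console.log(urls));
```

Breaking out of `for await` ends iteration early, so a helper like this avoids pulling more items than needed from a large map.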

AI-Powered Answers

Get answers from web pages using AI:

import Olostep from 'olostep';

const client = new Olostep({apiKey: 'your_api_key'});

// Simple task: pass a string directly
const answer = await client.answers.create(
  'What is the main topic of https://example.com?'
);
console.log(answer.answer);
console.log(answer.sources);

// With structured JSON output
const structured = await client.answers.create({
  task: 'Extract all product names and prices from https://example.com',
  jsonFormat: {
    products: [{name: '', price: ''}]
  }
});
console.log(structured.json_content);

// Fetch a previously created answer by ID
const fetched = await client.answers.get(answer.id);
console.log(fetched.answer);

Retrieving Content

Retrieve previously scraped content:

// Retrieve content in a specific format
// (retrieveId identifies previously scraped content)
const content = await client.retrieve(retrieveId, Format.MARKDOWN);
console.log(content.markdown_content);

// Multiple formats
const multiFormatContent = await client.retrieve(retrieveId, [
  Format.HTML,
  Format.MARKDOWN
]);

Advanced Options

Custom Actions

Run browser actions before scraping:

const scrape = await client.scrapes.create({
  url: 'https://example.com',
  actions: [
    {type: 'wait', milliseconds: 2000},
    {type: 'click', selector: '#load-more'},
    {type: 'scroll', distance: 1000},
    {type: 'fill_input', selector: '#search', value: 'query'}
  ]
});
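Each action above is an object with a `type` and type-specific fields. As a sketch of how such a list could be checked before sending it, the following validates actions against only the four types and fields that appear in the example. `validateActions` and the field table are hypothetical; the API may accept other action types or additional fields that are not covered here.

```javascript
// Required fields per action type, taken only from the example above.
// The API may accept other types or extra fields; this table is illustrative.
const REQUIRED_FIELDS = new Map([
  ['wait', ['milliseconds']],
  ['click', ['selector']],
  ['scroll', ['distance']],
  ['fill_input', ['selector', 'value']]
]);

// Hypothetical helper: return a list of problems found in an actions array.
function validateActions(actions) {
  const problems = [];
  actions.forEach((action, index) => {
    const required = REQUIRED_FIELDS.get(action.type);
    if (!required) {
      problems.push(`actions[${index}]: unknown type "${action.type}"`);
      return;
    }
    for (const field of required) {
      if (action[field] === undefined) {
        problems.push(`actions[${index}]: "${action.type}" is missing "${field}"`);
      }
    }
  });
  return problems;
}

const problems = validateActions([
  {type: 'wait', milliseconds: 2000},
  {type: 'click'},                     // missing selector
  {type: 'hover', selector: '#menu'}   // type not in the table
]);
console.log(problems);
```

An empty result means every action used a known type with its required fields present; it says nothing about whether the selectors exist on the page.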

Geographic Location

Scrape from different countries using a predefined country code or any valid country code string:

import Olostep, {Country} from 'olostep';

const client = new Olostep({apiKey: 'your_api_key'});

// Use a predefined enum value (US, DE, FR, GB, SG)
const scrape = await client.scrapes.create({
  url: 'https://example.com',
  country: Country.DE  // Germany
});

// Or pass a valid country code as a string
const scrape2 = await client.scrapes.create({
  url: 'https://example.com',
  country: 'jp'  // Japan
});

LLM Extraction

Extract structured data using an LLM:

const scrape = await client.scrapes.create({
  url: 'https://example.com',
  llmExtract: {
    schema: {
      title: 'string',
      price: 'number',
      description: 'string'
    },
    prompt: 'Extract product information from this page'
  }
});

Client Configuration

import Olostep from 'olostep';

const client = new Olostep({
  apiKey: 'your_api_key',
  apiBaseUrl: 'https://api.olostep.com/v1',  // optional
  timeoutMs: 150000,  // 150 seconds (optional)
  retry: {
    maxRetries: 3,
    initialDelayMs: 1000
  },
  userAgent: 'MyApp/1.0'  // optional
});
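The `retry` settings above name a maximum retry count and an initial delay. The SDK's actual backoff policy is not described here; the sketch below assumes the delay doubles after each failed attempt and that every error is retryable. `withRetry` is a hypothetical helper, not part of the olostep package.

```javascript
// Hypothetical sketch of retrying an async operation with a growing delay.
// Assumption: the delay doubles after every failed attempt.
async function withRetry(operation, {maxRetries = 3, initialDelayMs = 1000} = {}) {
  let delayMs = initialDelayMs;
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      if (attempt >= maxRetries) throw error; // retries exhausted
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      delayMs *= 2;
    }
  }
}

// Usage: an operation that fails twice, then succeeds.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`transient failure ${calls}`);
  return 'ok';
};
const retried = withRetry(flaky, {maxRetries: 3, initialDelayMs: 10});
retried.then((value) => console.log(value, 'after', calls, 'calls'));
```

With `maxRetries: 3` the operation runs at most four times in this sketch: the initial attempt plus three retries.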

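Feature Highlights

Async-first client with full TypeScript support.
Type-safe inputs using TypeScript enums and interfaces (Formats, Countries, Actions, etc.).
Rich resource namespaces with both shorthand calls (client.scrapes.create()) and explicit methods (client.scrapes.get()).
Shared transport layer with retries, timeouts, and JSON decoding.
Comprehensive error hierarchy.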
