> ## Documentation Index
> Fetch the complete documentation index at: https://docs.olostep.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Scrape

> Turn any URL into LLM-ready Markdown, HTML, screenshots, PDFs, or structured JSON.

Olostep's `/v1/scrapes` endpoint extracts LLM-ready Markdown, HTML, text, screenshots, or structured JSON from any URL in real time.

* Output clean Markdown, structured data, screenshots, or HTML
* Extract JSON through [Parsers](/features/structured-content/parsers) or [LLM extraction](/features/structured-content/llm-extraction)
* Handle dynamic content: JS-rendered sites, login flows through actions, and PDFs

For API details, see the [Scrape Endpoint API Reference](/api-reference/scrapes/create).

## Scrape a URL

Use the `/v1/scrapes` endpoint to scrape a single URL and choose the output formats.

### Installation

<CodeGroup>
  ```python Python theme={null}
  pip install olostep
  ```

  ```javascript Node theme={null}
  npm install olostep
  ```

  ```bash cURL theme={null}
  # curl is available by default on macOS, Linux, and Windows
  ```

  ```javascript Node (API) theme={null}
  npm install node-fetch
  ```

  ```bash Python (API) theme={null}
  pip install requests
  ```
</CodeGroup>

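### Usage

A scrape request takes a single URL and one or more output formats. The required parameters are `url_to_scrape` and `formats`.

Other common parameters include `wait_before_scraping` (in milliseconds), `remove_css_selectors` (default, none, or an array of selectors), and `country`.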

<CodeGroup>
  ```python Python theme={null}
  from olostep import Olostep

  client = Olostep(api_key="YOUR_REAL_KEY")

  result = client.scrapes.create(
      url_to_scrape="https://en.wikipedia.org/wiki/Alexander_the_Great",
      formats=["markdown", "html"],
  )

  print(result.markdown_content)
  print(result.html_content)
  ```

  ```js Node theme={null}
  import Olostep from 'olostep'

  const client = new Olostep({ apiKey: 'YOUR_REAL_KEY' })

  const result = await client.scrapes.create({
    url: 'https://en.wikipedia.org/wiki/Alexander_the_Great',
    formats: ['markdown', 'html'],
  })

  console.log(result.markdown_content)
  console.log(result.html_content)
  ```

  ```bash cURL theme={null}
  curl -s -X POST "https://api.olostep.com/v1/scrapes" \
    -H "Authorization: Bearer $OLOSTEP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url_to_scrape": "https://en.wikipedia.org/wiki/Alexander_the_Great",
      "formats": ["markdown", "html"]
    }'
  ```

  ```bash CLI theme={null}
  olostep scrape "https://en.wikipedia.org/wiki/Alexander_the_Great" \
    --formats markdown,html
  ```

  ```js Node (API) theme={null}
  const endpoint = 'https://api.olostep.com/v1/scrapes'
  const payload = {
    url_to_scrape: 'https://en.wikipedia.org/wiki/Alexander_the_Great',
    formats: ['markdown', 'html']
  }
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_API_KEY>',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  })
  const data = await res.json()
  console.log(data)
  ```

  ```python Python (API) theme={null}
  import requests
  import json

  endpoint = "https://api.olostep.com/v1/scrapes"
  payload = {
      "url_to_scrape": "https://en.wikipedia.org/wiki/Alexander_the_Great",
      "formats": ["markdown", "html"]
  }
  headers = {
      "Authorization": "Bearer <YOUR_API_KEY>",
      "Content-Type": "application/json"
  }

  response = requests.post(endpoint, json=payload, headers=headers)
  print(json.dumps(response.json(), indent=2))
  ```
</CodeGroup>

### Response

The API returns a `scrape` object as the response.

A `scrape` has properties such as `id` and `result`.

The `result` object has the following fields (some may be null depending on the `formats` parameter):

* `html_content`: The HTML content of the page. Pass `formats: ["html"]` to get it.
* `markdown_content`: The Markdown content of the page. Pass `formats: ["markdown"]` to get it.
* `text_content`: The text content of the page. Pass `formats: ["text"]` to get it.
* `json_content`: The JSON content of the page. Pass `formats: ["json"]` and also provide the `parser` or `llm_extract` parameter to get it.
* `screenshot_hosted_url`: The hosted URL of the screenshot.
* `html_hosted_url`: The hosted URL of the HTML content.
* `markdown_hosted_url`: The hosted URL of the Markdown content.
* `json_hosted_url`: The hosted URL of the JSON content.
* `text_hosted_url`: The hosted URL of the text content.
* `links_on_page`: The links on the page.
* `page_metadata`: The metadata of the page.

```json theme={null}
{
  "id": "scrape_6h89o8u1kt",
  "object": "scrape",
  "created": 1745673871,
  "metadata": {},
  "retrieve_id": "6h89o8u1kt",
  "url_to_scrape": "https://en.wikipedia.org/wiki/Alexander_the_Great",
  "result": {
    "html_content": "<html...",
    "markdown_content": "## Alexander the Great...",
    "text_content": null,
    "json_content": null,
    "screenshot_hosted_url": null,
    "html_hosted_url": "https://olostep-storage.s3.us-east-1.amazonaws.com/text_6h89o8u1kt.txt",
    "markdown_hosted_url": "https://olostep-storage.s3.us-east-1.amazonaws.com/markDown_6h89o8u1kt.txt",
    "json_hosted_url": null,
    "text_hosted_url": null,
    "links_on_page": [],
    "page_metadata": { "status_code": 200, "title": "" }
  }
}
```
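As a minimal sketch of reading this response, the snippet below collects the `*_content` fields of `result` that are not null. The `sample` dict is a trimmed copy of the response above, and `available_content` is a hypothetical helper written for illustration, not part of the Olostep SDK.

```python
def available_content(scrape: dict) -> dict:
    """Return the *_content fields of a scrape response that are not null."""
    result = scrape.get("result") or {}
    return {
        key: value
        for key, value in result.items()
        if key.endswith("_content") and value is not None
    }


# Trimmed copy of the sample response shown above.
sample = {
    "id": "scrape_6h89o8u1kt",
    "object": "scrape",
    "result": {
        "html_content": "<html...",
        "markdown_content": "## Alexander the Great...",
        "text_content": None,
        "json_content": None,
    },
}

content = available_content(sample)
print(sorted(content))
```

Filtering on null values rather than on the requested `formats` keeps the helper independent of how the request was built.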

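## Caching

To optimize for speed, Olostep offers an optional shared cache layer for HTML, Markdown, text, and parsed JSON results.

### How it works

When a scrape is requested, Olostep checks whether a matching scrape with the same parameters already exists. If a sufficiently fresh match is found, the content is served immediately from Olostep's storage and no new browser scrape is started.

* **Shared cache:** The cache is shared globally. If another request has scraped the same URL with the same configuration, you benefit from the speedup.
* **Post-processing is live:** Operations such as `llm_extract` and `links_on_page` filters run *on the fly* on top of the cached document. Only the core page retrieval is cached, so structured extraction stays dynamic.

### Freshness and `max_age`

By default, the production API always runs a live scrape to guarantee real-time accuracy. You can opt in to caching with the `max_age` parameter.

| Parameter | Type      | Default | Description                                                                                                                        |
| --------- | --------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `max_age` | `integer` | `0`     | Acceptable content age in **seconds**. If a cached copy exists and is newer than `max_age` seconds, it is served from the cache. |

* **Default API behavior (`max_age: 0`):** Every API request triggers a new scrape.
* **Default playground behavior:** In the dashboard playground, `max_age` defaults to 24 hours (`86400` seconds), which prevents redundant scrapes and saves credits while you build and test.
* **Maximum age:** The cache has a hard limit of **7 days** (`604800` seconds). If a `max_age` above this limit is requested, it falls back to the 7-day maximum.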
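The freshness rules above can be sketched in a few lines. This is an illustrative model of the documented behavior, not Olostep's implementation: `effective_max_age` and `served_from_cache` are hypothetical names, and clamping negative values to `0` is an assumption the text does not state.

```python
MAX_CACHE_AGE_SECONDS = 604_800  # 7-day hard limit stated above


def effective_max_age(requested: int) -> int:
    """Clamp a requested max_age (seconds) to the range 0..7 days.

    Clamping negatives to 0 is an assumption made for this sketch.
    """
    return max(0, min(requested, MAX_CACHE_AGE_SECONDS))


def served_from_cache(cached_age_seconds, requested_max_age: int) -> bool:
    """True when a cached copy exists and is newer than the effective max_age.

    cached_age_seconds is None when no cached copy exists.
    """
    if cached_age_seconds is None:
        return False
    return cached_age_seconds < effective_max_age(requested_max_age)


print(effective_max_age(10_000_000))      # clamped to the 7-day limit
print(served_from_cache(3600, 0))         # max_age 0 always scrapes live
print(served_from_cache(3600, 86400))     # a 1-hour-old copy is fresh enough
```

With `max_age: 0` no cached copy can be newer than the threshold, which is why the default always results in a live scrape.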

### Usage example

<CodeGroup>
  ```python Python theme={null}
  from olostep import Olostep

  client = Olostep(api_key="YOUR_REAL_KEY")

  # Opt in to caching: accept results up to 1 day (86400 seconds) old
  result = client.scrapes.create(
      url_to_scrape="https://example.com",
      formats=["markdown"],
      max_age=86400
  )
  ```

  ```js Node theme={null}
  import Olostep from 'olostep'

  const client = new Olostep({ apiKey: 'YOUR_REAL_KEY' })

  // Opt in to caching: accept results up to 1 day (86400 seconds) old
  const result = await client.scrapes.create({
    url: 'https://example.com',
    formats: ['markdown'],
    maxAge: 86400,
  })
  ```

  ```bash cURL theme={null}
  # Opt in to caching: accept results up to 1 hour (3600 seconds) old
  curl -X POST "https://api.olostep.com/v1/scrapes" \
    -H "Authorization: Bearer $OLOSTEP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url_to_scrape": "https://example.com",
      "formats": ["markdown"],
      "max_age": 3600
    }'
  ```

  ```js Node (API) theme={null}
  const endpoint = 'https://api.olostep.com/v1/scrapes'
  const payload = {
    url_to_scrape: 'https://example.com',
    formats: ['markdown'],
    max_age: 86400 // accept results up to 1 day (86400 seconds) old
  }
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer <YOUR_API_KEY>',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(payload)
  })
  const data = await res.json()
  console.log(data)
  ```

  ```python Python (API) theme={null}
  import requests
  import json

  endpoint = "https://api.olostep.com/v1/scrapes"
  payload = {
      "url_to_scrape": "https://example.com",
      "formats": ["markdown"],
      "max_age": 86400  # accept results up to 1 day (86400 seconds) old
  }
  headers = {
      "Authorization": "Bearer <YOUR_API_KEY>",
      "Content-Type": "application/json"
  }

  response = requests.post(endpoint, json=payload, headers=headers)
  print(json.dumps(response.json(), indent=2))
  ```
</CodeGroup>

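### When is the cache skipped?

The cache is bypassed automatically (forcing a live scrape) for features that need a unique session, real-time visual output, or custom file handling:

* **Interactive sessions:** requests that use a `session_id` or load a custom browser `context`.
* **Visuals:** visualizer tools and screenshots (`htmlVisualizer`).
* **Special file types:** binary file downloads and raw PDF rendering.
* **Debugging and network:** capturing `network_calls` or using asynchronous parser jobs.

## Extracting links

Pass a `links_on_page` object in the request to collect the links on a page. All links are returned as absolute URLs.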

```json theme={null}
"links_on_page": {
  "include_links": ["/blog/*"],
  "exclude_links": ["*.pdf"],
  "query_to_order_links_by": "pricing"
}
```

* `include_links` / `exclude_links`: glob patterns matched against the URL **path** of each link.
* `query_to_order_links_by`: reorders the returned links by relevance to this text.

<Note>
  Glob patterns match path segments. A single `*` does not cross `/`, so `"/blog/*"` matches `"/blog/post-1"` but not the index `"/blog"` itself. Because the query string is not part of the path, it also never matches `"/blog?tag=x"`. To include the index as well, use `"/blog*"` or `"{/blog,/blog/**}"`.
</Note>
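To make the segment rule concrete, here is a small Python approximation of the matching described in the note. It is a sketch under stated assumptions, not Olostep's matcher: `glob_to_regex` and `path_matches` are hypothetical helpers, only `*`, `**`, and `{a,b}` alternation are handled (character classes are treated literally), and patterns are matched against the full URL path.

```python
import re
from urllib.parse import urlsplit


def glob_to_regex(pattern: str) -> str:
    """Translate a glob into a regex where `*` stays within one path segment."""
    parts = []
    depth = 0  # nesting level of {a,b} alternation groups
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")  # `**` may cross `/`
            i += 2
            continue
        ch = pattern[i]
        if ch == "*":
            parts.append("[^/]*")  # a single `*` never crosses `/`
        elif ch == "{":
            parts.append("(?:")
            depth += 1
        elif ch == "}" and depth > 0:
            parts.append(")")
            depth -= 1
        elif ch == "," and depth > 0:
            parts.append("|")
        else:
            parts.append(re.escape(ch))
        i += 1
    return "".join(parts)


def path_matches(pattern: str, url: str) -> bool:
    """Match the pattern against the URL path only (query string excluded)."""
    path = urlsplit(url).path
    return re.fullmatch(glob_to_regex(pattern), path) is not None


print(path_matches("/blog/*", "https://example.com/blog/post-1"))    # True
print(path_matches("/blog/*", "https://example.com/blog"))           # False
print(path_matches("/blog/*", "https://example.com/blog?tag=x"))     # False
print(path_matches("{/blog,/blog/**}", "https://example.com/blog"))  # True
```

Dropping the query string before matching is what makes `"/blog?tag=x"` behave exactly like `"/blog"` here.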

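## Scrape formats

Choose one or more output formats via `formats`:

* `markdown`: LLM-ready Markdown
* `html`: clean HTML
* `text`: plain text
* `json`: structured output (via a parser or `llm_extract`)
* `raw_pdf`: raw PDF bytes extracted to a hosted URL
* `screenshot`: captures a screenshot via actions and returns a hosted URL

Output is returned inside `result` as `*_content` fields, together with the corresponding `*_hosted_url` fields.

## Extracting structured data

There are two ways to extract structured JSON: Parsers or LLM extraction.

### Using a parser (recommended for scale)

Set `formats: ["json"]` and provide a parser `id`.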

<CodeGroup>
  ```python Python theme={null}
  from olostep import Olostep

  client = Olostep(api_key="YOUR_REAL_KEY")

  result = client.scrapes.create(
      url_to_scrape="https://www.google.com/search?q=alexander+the+great&gl=us&hl=en",
      formats=["json"],
      parser="@olostep/google-search",
  )

  print(result.json_content)
  ```

  ```js Node theme={null}
  import Olostep from 'olostep'

  const client = new Olostep({ apiKey: 'YOUR_REAL_KEY' })

  const result = await client.scrapes.create({
    url: 'https://www.google.com/search?q=alexander+the+great&gl=us&hl=en',
    formats: ['json'],
    parser: '@olostep/google-search',
  })

  console.log(result.json_content)
  ```

  ```bash cURL theme={null}
  curl -s -X POST "https://api.olostep.com/v1/scrapes" \
    -H "Authorization: Bearer $OLOSTEP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url_to_scrape": "https://www.google.com/search?q=alexander+the+great&gl=us&hl=en",
      "formats": ["json"],
      "parser": {"id": "@olostep/google-search"}
    }'
  ```

  ```bash CLI theme={null}
  olostep scrape "https://www.google.com/search?q=alexander+the+great&gl=us&hl=en" \
    --formats json \
    --payload-json '{"parser":{"id":"@olostep/google-search"}}'
  ```

  ```js Node (API) theme={null}
  const res = await fetch('https://api.olostep.com/v1/scrapes', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_API_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url_to_scrape: 'https://www.google.com/search?q=alexander+the+great&gl=us&hl=en',
      formats: ['json'],
      parser: { id: '@olostep/google-search' }
    })
  })
  console.log(await res.json())
  ```

  ```python Python (API) theme={null}
  import requests, json

  endpoint = "https://api.olostep.com/v1/scrapes"
  payload = {
    "url_to_scrape": "https://www.google.com/search?q=alexander+the+great&gl=us&hl=en",
    "formats": ["json"],
    "parser": { 
      "id": "@olostep/google-search" 
    }
  }
  headers = {
      "Authorization": "Bearer <YOUR_API_KEY>", 
      "Content-Type": "application/json"
  }

  res = requests.post(endpoint, json=payload, headers=headers)
  print(json.dumps(res.json(), indent=2))
  ```
</CodeGroup>

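Olostep provides several prebuilt parsers for [popular websites](https://www.olostep.com/store). You can also create your own parser through the dashboard or ask the team to build one for you.

Parsers are self-healing and update automatically to the latest version of a website.

### Using LLM extraction (schema and/or prompt)

Provide `llm_extract` with a JSON schema (`schema`) and/or a natural-language instruction (`prompt`). You can pass both parameters, but when both are provided, `schema` takes precedence.

If you pass only `prompt`, the LLM extracts data based on the prompt and decides the data structure on its own.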

<CodeGroup>
  ```python Python theme={null}
  from olostep import LLMExtract, Olostep

  client = Olostep(api_key="YOUR_REAL_KEY")

  result = client.scrapes.create(
      url_to_scrape="https://www.berklee.edu/events/stefano-marchese-friends",
      formats=["markdown", "json"],
      llm_extract=LLMExtract(
          schema={
              "event": {
                  "type": "object",
                  "properties": {
                      "title": {"type": "string"},
                      "date": {"type": "string"},
                      "description": {"type": "string"},
                      "venue": {"type": "string"},
                      "address": {"type": "string"},
                      "start_time": {"type": "string"},
                  },
              }
          }
      ),
  )

  print(result.json_content)
  ```

  ```js Node theme={null}
  import Olostep from 'olostep'

  const client = new Olostep({ apiKey: 'YOUR_REAL_KEY' })

  const result = await client.scrapes.create({
    url: 'https://www.berklee.edu/events/stefano-marchese-friends',
    formats: ['markdown', 'json'],
    llmExtract: {
      schema: {
        event: {
          type: 'object',
          properties: {
            title: { type: 'string' },
            date: { type: 'string' },
            description: { type: 'string' },
            venue: { type: 'string' },
            address: { type: 'string' },
            start_time: { type: 'string' },
          },
        },
      },
    },
  })

  console.log(result.json_content)
  ```

  ```bash cURL theme={null}
  curl -s -X POST "https://api.olostep.com/v1/scrapes" \
    -H "Authorization: Bearer $OLOSTEP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url_to_scrape": "https://www.berklee.edu/events/stefano-marchese-friends",
      "formats": ["json"],
      "llm_extract": {
        "prompt": "Extract the event title, date, description, venue, address, and start time from the page."
      }
    }'
  ```

  ```bash CLI theme={null}
  olostep scrape "https://www.berklee.edu/events/stefano-marchese-friends" \
    --formats json \
    --payload-json '{"llm_extract":{"prompt":"Extract the event title, date, description, venue, address, and start time from the page."}}'
  ```

  ```js Node (API) theme={null}
  const res = await fetch('https://api.olostep.com/v1/scrapes', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_API_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url_to_scrape: 'https://www.berklee.edu/events/stefano-marchese-friends',
      formats: ['json'],
      llm_extract: {
        prompt: 'Extract the event title, date, description, venue, address, and start time from the page.'
      }
    })
  })
  console.log(await res.json())
  ```

  ```python Python (API) theme={null}
  import requests, json

  endpoint = "https://api.olostep.com/v1/scrapes"
  payload = {
    "url_to_scrape": "https://www.berklee.edu/events/stefano-marchese-friends",
    "formats": ["markdown", "json"],
    "llm_extract": {
      "schema": {
        "event": {
          "type": "object",
          "properties": {
            "title": {"type": "string"},
            "date": {"type": "string"},
            "description": {"type": "string"},
            "venue": {"type": "string"},
            "address": {"type": "string"},
            "start_time": {"type": "string"}
          }
        }
      }
    }
  }
  headers = {
      "Authorization": "Bearer <YOUR_API_KEY>",
      "Content-Type": "application/json"
  }
  res = requests.post(endpoint, json=payload, headers=headers)
  print(json.dumps(res.json(), indent=2))
  ```
</CodeGroup>

Note: `result.json_content` returns stringified JSON. Parse it in your code if you need an object.
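A minimal sketch of that parsing step in Python follows. `parse_json_content` is a hypothetical helper, and the `raw` string is a made-up sample used only to exercise it, not real API output.

```python
import json


def parse_json_content(json_content):
    """Return json_content as a Python object.

    Accepts the stringified JSON described above and passes None through.
    Values that are already parsed are returned unchanged.
    """
    if json_content is None:
        return None
    if isinstance(json_content, str):
        return json.loads(json_content)
    return json_content


# Made-up sample standing in for result.json_content.
raw = '{"event": {"title": "Example title", "date": "2025-01-01"}}'
event = parse_json_content(raw)["event"]
print(event["title"])
```

Passing `None` through avoids a `TypeError` when `json` was not among the requested formats.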

**Pricing:** `llm_extract` costs 10 credits per scrape. To lower the cost, you can bring your own API key or enable usage-based pricing. Contact [info@olostep.com](mailto:info@olostep.com) to get access.

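## Extract links on a page

Use the `links_on_page` option to extract all links on the page you scrape. It accepts the following parameters for filtering and ordering the extracted links:

* `absolute_links` (boolean, default: `true`): when true, returns full URLs (for example, `https://example.com/page`) instead of relative paths (for example, `/page`).
* `query_to_order_links_by` (string): orders the returned links by similarity to the provided query text, putting the most relevant matches first.
* `include_links` (array of strings): filters the extracted links using glob patterns. Use a pattern like `*.pdf` to match file extensions, `/blog/*` to match specific paths, or a full URL like `https://example.com/*`. Supports wildcards (`*`), character classes (`[a-z]`), and alternation (`{pattern1,pattern2}`).
* `exclude_links` (array of strings): excludes specific links using the same syntax as `include_links`.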
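The parameters above can be assembled into a request body as follows. This is a sketch that only builds the JSON payload; `build_links_payload` is a hypothetical helper, and choosing `markdown` as the format is an arbitrary choice for the example.

```python
import json


def build_links_payload(url, include=None, exclude=None, order_by=None, absolute=True):
    """Build a /v1/scrapes request body with a links_on_page object."""
    links = {"absolute_links": absolute}
    if include:
        links["include_links"] = list(include)
    if exclude:
        links["exclude_links"] = list(exclude)
    if order_by:
        links["query_to_order_links_by"] = order_by
    return {
        "url_to_scrape": url,
        "formats": ["markdown"],  # arbitrary format chosen for this example
        "links_on_page": links,
    }


payload = build_links_payload(
    "https://example.com",
    include=["/blog/*"],
    exclude=["*.pdf"],
    order_by="pricing",
)
print(json.dumps(payload, indent=2))
```

The resulting dict can be sent as the JSON body of the `POST` request shown in the earlier examples.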

## Interact with the page using actions

Run actions before scraping to interact with dynamic sites. Supported actions:

* `wait` with `milliseconds`
* `click` with `selector`
* `fill_input` with `selector` and `value`
* `scroll` with `direction` and `amount`

A `wait` is often placed before or after other actions to give the page time to load.

### Example

<CodeGroup>
  ```python Python theme={null}
  from olostep import FillInputAction, Olostep, WaitAction

  client = Olostep(api_key="YOUR_REAL_KEY")

  result = client.scrapes.create(
      url_to_scrape="https://example.com/login",
      formats=["markdown"],
      actions=[
          FillInputAction(selector="input[type=email]", value="john@example.com"),
          WaitAction(milliseconds=500),
          FillInputAction(selector="input[type=password]", value="secret"),
          {"type": "click", "selector": "button[type=\"submit\"]"},
          WaitAction(milliseconds=1500),
      ],
  )

  print(result.markdown_content)
  ```

  ```js Node theme={null}
  import Olostep from 'olostep'

  const client = new Olostep({ apiKey: 'YOUR_REAL_KEY' })

  const result = await client.scrapes.create({
    url: 'https://example.com/login',
    formats: ['markdown'],
    actions: [
      { type: 'fill_input', selector: 'input[type=email]', value: 'john@example.com' },
      { type: 'wait', milliseconds: 500 },
      { type: 'fill_input', selector: 'input[type=password]', value: 'secret' },
      { type: 'click', selector: 'button[type="submit"]' },
      { type: 'wait', milliseconds: 1500 },
    ],
  })

  console.log(result.markdown_content)
  ```

  ```bash cURL theme={null}
  curl -s -X POST "https://api.olostep.com/v1/scrapes" \
    -H "Authorization: Bearer $OLOSTEP_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "url_to_scrape": "https://example.com/login",
      "formats": ["markdown"],
      "actions": [
        { "type": "fill_input", "selector": "input[type=email]", "value": "john@example.com" },
        { "type": "wait", "milliseconds": 500 },
        { "type": "fill_input", "selector": "input[type=password]", "value": "secret" },
        { "type": "click", "selector": "button[type=\"submit\"]" },
        { "type": "wait", "milliseconds": 1500 }
      ]
    }'
  ```

  ```bash CLI theme={null}
  # For complex options such as actions, pass a JSON file with --payload-file
  olostep scrape "https://example.com/login" \
    --formats markdown \
    --payload-file actions.json

  # actions.json contains:
  # {
  #   "actions": [
  #     {"type": "fill_input", "selector": "input[type=email]", "value": "john@example.com"},
  #     {"type": "wait", "milliseconds": 500},
  #     {"type": "fill_input", "selector": "input[type=password]", "value": "secret"},
  #     {"type": "click", "selector": "button[type=\"submit\"]"},
  #     {"type": "wait", "milliseconds": 1500}
  #   ]
  # }
  ```

  ```js Node (API) theme={null}
  const res = await fetch('https://api.olostep.com/v1/scrapes', {
    method: 'POST',
    headers: { 'Authorization': 'Bearer <YOUR_API_KEY>', 'Content-Type': 'application/json' },
    body: JSON.stringify({
      url_to_scrape: 'https://example.com/login',
      formats: ['markdown'],
      actions: [
        { type: 'fill_input', selector: 'input[type=email]', value: 'john@example.com' },
        { type: 'wait', milliseconds: 500 },
        { type: 'fill_input', selector: 'input[type=password]', value: 'secret' },
        { type: 'click', selector: 'button[type="submit"]' },
        { type: 'wait', milliseconds: 1500 }
      ]
    })
  })
  console.log(await res.json())
  ```

  ```python Python (API) theme={null}
  import requests, json

  endpoint = "https://api.olostep.com/v1/scrapes"
  payload = {
    "url_to_scrape": "https://example.com/login",
    "formats": ["markdown"],
    "actions": [
      {"type": "fill_input", "selector": "input[type=email]", "value": "john@example.com"},
      {"type": "wait", "milliseconds": 500},
      {"type": "fill_input", "selector": "input[type=password]", "value": "secret"},
      {"type": "click", "selector": "button[type=\"submit\"]"},
      {"type": "wait", "milliseconds": 1500}
    ]
  }
  headers = {
      "Authorization": "Bearer <YOUR_API_KEY>", 
      "Content-Type": "application/json"
  }
  res = requests.post(endpoint, json=payload, headers=headers)
  print(json.dumps(res.json(), indent=2))
  ```
</CodeGroup>

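The response contains the requested formats (for example, `markdown_content`).

## Use cases

Below are some real-world applications from customers who use the `/scrapes` endpoint.

### Content analysis and research

* **Competitive analysis**: extract product details, pricing, and features from competitor websites
* **Market research**: analyze landing pages, product descriptions, and customer testimonials
* **Academic research**: collect specific data from scientific publications and research portals
* **Legal documents**: extract case studies, regulations, or legal precedents from official websites

### E-commerce and retail

* **Dynamic pricing strategy**: fetch real-time product prices from competing stores
* **Product information management**: extract detailed specifications and descriptions
* **Inventory/stock monitoring**: check product availability at other retailers
* **Review analysis**: collect consumer feedback and sentiment for specific products

### Marketing and content creation

* **Content curation**: extract relevant articles and blog posts for newsletters
* **SEO analysis**: examine competitors' keyword usage, meta descriptions, and page structure
* **Lead generation**: extract contact information from business directories and company pages
* **Influencer research**: collect engagement metrics and content style from influencer profiles
* **Personalized social media generation**: analyze a customer's website to create AI-driven social media marketing

### Data applications

* **AI training data collection**: gather specific examples for machine learning models
* **Custom knowledge base building**: extract documentation and instructions from software sites
* **Historical data archiving**: preserve website content as of a specific point in time
* **Structured data extraction**: convert web content into formatted datasets for analysis

### Monitoring and alerting

* **Regulatory compliance monitoring**: track changes on legal or regulatory websites
* **Crisis management**: monitor news sites for mentions of specific events or organizations
* **Event tracking**: extract event details from venue and organizer websites
* **Service status monitoring**: check service status pages for specific platforms and tools

### Publishing and media

* **News aggregation**: extract breaking news from official sources
* **Media monitoring**: track specific topics on news sites
* **Content verification**: extract information to fact-check claims and statements
* **Multimedia extraction**: collect embedded video, images, or audio for media libraries

### Financial applications

* **Investment research**: extract financial statements and annual reports from company websites
* **Economic indicators**: collect economic data from government and financial institution websites
* **Cryptocurrency data**: extract real-time price and market cap information
* **Financial news analysis**: monitor financial news sites for specific market signals

### Technical applications

* **API documentation extraction**: collect technical documentation for reference
* **Integration testing**: extract website elements to validate third-party integrations
* **Accessibility testing**: analyze website structure for compliance with accessibility standards
* **Web archive creation**: capture complete website content for historical preservation

### Integration scenarios

* **CRM systems**: enrich customer profiles with data from company websites and LinkedIn
* **Content management systems**: import relevant external content
* **Business intelligence tools**: complement internal data with external market information
* **Project management software**: extract specifications and requirements from client websites
* **Custom dashboards**: display extracted data alongside internal metrics

## Error handling

All errors follow a common envelope format. Check `error.type` and `error.code` to branch programmatically: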

```json theme={null}
{
  "id": "error_abc123",
  "object": "error",
  "created": 1745673871,
  "url": "https://example.com",
  "metadata": {},
  "error": {
    "type": "...",
    "code": "...",
    "message": "..."
  }
}
```

| HTTP | `error.type`            | `error.code`            | Meaning                                                                                                           |
| ---- | ----------------------- | ----------------------- | ----------------------------------------------------------------------------------------------------------------- |
| 400  | `invalid_request_error` | `dns_resolution_failed` | The domain does not exist or the URL contains a typo.                                                             |
| 400  | `invalid_request_error` | `invalid_url`           | The URL is malformed.                                                                                             |
| 502  | `invalid_request_error` | `tls_error`             | The website has an invalid or incompatible TLS/SSL certificate. `error.detail` carries the low-level SSL code.    |
| 504  | `request_timeout`       | `scrape_poll_timeout`   | The scrape did not complete within the wait budget of about 55 seconds.                                           |

### DNS failure (400)

The domain does not resolve. Check the URL for typos.

```json theme={null}
{
  "error": {
    "type": "invalid_request_error",
    "code": "dns_resolution_failed",
    "message": "The URL has a typo or the domain does not exist."
  }
}
```

### TLS/SSL error (502)

The target website has a broken or incompatible HTTPS configuration. `error.detail` provides the specific SSL error code for diagnosis; `error.code` is always `tls_error`.

```json theme={null}
{
  "error": {
    "type": "invalid_request_error",
    "code": "tls_error",
    "detail": "err_ssl_tlsv1_alert_internal_error",
    "message": "The website closed or rejected the TLS handshake. The server may be misconfigured or using an unsupported SSL/TLS version."
  }
}
```

### Request timeout (504)

The scrape did not complete within the wait budget. The page may be slow, bot-protected, or temporarily unavailable. This response is safe to retry.

```json theme={null}
{
  "error": {
    "type": "request_timeout",
    "code": "scrape_poll_timeout",
    "message": "The request timed out while waiting for the scrape result. The page may be slow, blocked for our fetchers, or temporarily unavailable."
  }
}
```
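Branching on `error.type` and `error.code` might look like the sketch below. The action labels (`retry`, `fix_url`, `check_tls`, `unknown`) and the `next_step` helper are hypothetical names chosen for illustration; only the type and code values come from the table above.

```python
def next_step(body: dict) -> str:
    """Map an error envelope to a coarse action label."""
    error = body.get("error") or {}
    error_type = error.get("type")
    code = error.get("code")
    if error_type == "request_timeout" and code == "scrape_poll_timeout":
        return "retry"      # documented as safe to retry
    if code in ("dns_resolution_failed", "invalid_url"):
        return "fix_url"    # the caller must correct the URL
    if code == "tls_error":
        return "check_tls"  # inspect error.detail for the SSL code
    return "unknown"


timeout_body = {
    "error": {"type": "request_timeout", "code": "scrape_poll_timeout", "message": "..."}
}
dns_body = {
    "error": {"type": "invalid_request_error", "code": "dns_resolution_failed", "message": "..."}
}
print(next_step(timeout_body))
print(next_step(dns_body))
```

Keying on `error.code` rather than the HTTP status keeps the two 400 cases and the 502 case distinguishable even though they share the same `error.type`.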

## Pricing

A scrape costs 1 credit by default. With [parsers](/features/structured-content/parsers), the cost varies by parser (1-5 credits). With [LLM extract](/features/structured-content/llm-extraction), it costs 10 credits.
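For rough budgeting, the figures above can be turned into a small estimate. This sketch assumes each listed figure is the total cost per scrape rather than a surcharge on the base credit, and that LLM extraction determines the cost when both options are set; the text does not specify either point. `credits_per_scrape` and `estimate_credits` are hypothetical helpers.

```python
BASE_CREDITS = 1          # default cost of a scrape
LLM_EXTRACT_CREDITS = 10  # cost of a scrape that uses llm_extract


def credits_per_scrape(parser_credits=None, llm_extract=False) -> int:
    """Credits for one scrape under the assumptions stated above."""
    if llm_extract:
        return LLM_EXTRACT_CREDITS
    if parser_credits is not None:
        if not 1 <= parser_credits <= 5:
            raise ValueError("parser cost is documented as 1-5 credits")
        return parser_credits
    return BASE_CREDITS


def estimate_credits(scrape_count: int, parser_credits=None, llm_extract=False) -> int:
    """Total credits for scrape_count scrapes of the same kind."""
    return scrape_count * credits_per_scrape(parser_credits, llm_extract)


print(estimate_credits(1000))                    # plain scrapes
print(estimate_credits(1000, parser_credits=3))  # parser priced at 3 credits
print(estimate_credits(1000, llm_extract=True))  # LLM extraction
```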