> ## Documentation Index
> Fetch the complete documentation index at: https://docs.olostep.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Extract Blog URLs from Stripe's Website

> Filter and extract only the blog URLs from Stripe's website for targeted content analysis.

## Overview

Instead of [mapping entire website](/use-cases/get-website-structure), you might want to focus on specific sections. In this guide, we'll show you how to extract only the blog URLs from Stripe's website.

## Extracting Only Blog URLs

To extract only blog URLs from Stripe's website, use the maps endpoint with path pattern filters. The `include_urls` parameter allows you to specify exactly which URL patterns you want to include in the results.

<CodeGroup>
  ```python extract_stripe_blog_urls.py theme={null}
  import requests
  import time
  import json

  # Configuration
  API_URL = 'https://api.olostep.com/v1'
  API_KEY = '<your_olostep_api_key>'
  HEADERS = {
      'Content-Type': 'application/json',
      'Authorization': f'Bearer {API_KEY}'
  }

  # Start time for latency tracking
  start_time = time.time()

  # Define the payload with URL patterns to include
  payload = {
      "url": "https://stripe.com",
      "include_urls": ["/blog", "/blog/**"]  # Match /blog and all paths under /blog
  }

  # Make the request
  response = requests.post(f'{API_URL}/maps', headers=HEADERS, json=payload)

  # Calculate latency
  latency = round((time.time() - start_time) * 1000, 2)
  print(f"Request completed in {latency}ms")

  # Process the results
  data = response.json()
  print(f"Found {data['urls_count']} blog URLs on Stripe's website")

  # Print the first 10 URLs as a sample
  print("\nSample blog URLs:")
  for url in data['urls'][:10]:
      print(f"- {url}")
      
  # Save blog URLs to a file for further processing
  with open('stripe_blog_urls.json', 'w') as f:
      json.dump(data, f, indent=2)
  print(f"\nAll blog URLs saved to stripe_blog_urls.json")
  ```
</CodeGroup>

### Understanding the URL Patterns

In the example above, we're using two pattern specifications:

* `/blog` - Matches exactly the main blog page ([https://stripe.com/blog](https://stripe.com/blog))
* `/blog/**` - Matches all subpaths under /blog, including individual blog posts, category pages, etc.

This combination ensures we capture all blog-related content while excluding other sections of the website.

### Example Response

```json theme={null}
{
  "id": "map_xyz789abc",
  "urls_count": 278,
  "urls": [
    "https://stripe.com/blog",
    "https://stripe.com/blog/page/1",
    "https://stripe.com/blog/page/2",
    "https://stripe.com/blog/engineering",
    "https://stripe.com/blog/product",
    "https://stripe.com/blog/how-we-built-it-usage-based-billing",
    "https://stripe.com/blog/using-ml-to-detect-and-respond-to-performance-degradations",
    "https://stripe.com/blog/stripe-radar-responded-to-card-testing",
    "https://stripe.com/blog/future-of-real-time-payments",
    "https://stripe.com/blog/ml-flywheel-improve-models"
    // ... additional URLs omitted for brevity
  ]
}
```

## Filtering Blog URLs by Category

You can further refine your extraction to focus on specific blog categories. For example, if you're only interested in Stripe's engineering blog posts:

<CodeGroup>
  ```python extract_engineering_blog_urls.py theme={null}
  # Define the payload with more specific URL patterns
  payload = {
      "url": "https://stripe.com",
      "include_urls": ["/blog/engineering", "/blog/engineering/**"]
  }
  ```
</CodeGroup>

## Next Steps

Now that you have extracted all of Stripe's blog URLs,

1. You can fetch their content individually using the [scrape API](../../api-reference/scrapes/create).
2. Or, use the [next guide](/use-cases/crawl-matching-pages) to crawl and extract the actual content from these blog pages directly with inbuilt filters.
