Extract Blog URLs from Stripe's Website
Filter and extract only the blog URLs from Stripe’s website for targeted content analysis.
Overview
Instead of mapping an entire website, you might want to focus on specific sections. In this guide, we’ll show you how to extract only the blog URLs from Stripe’s website.
Extracting Only Blog URLs
To extract only blog URLs from Stripe’s website, use the maps endpoint with path pattern filters. The include_urls parameter allows you to specify exactly which URL patterns you want to include in the results.
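A request along these lines might look like the following minimal Python sketch. Only the include_urls parameter and the two patterns come from this guide. The endpoint URL, the "url" field name, and the bearer-token header are placeholders assumed for illustration, so check them against the API reference before use.

```python
import json
import urllib.request

# Hypothetical values: replace with the real endpoint and key from the API docs.
MAPS_ENDPOINT = "https://api.example.com/v1/maps"
API_KEY = "YOUR_API_KEY"


def build_maps_payload(site_url, include_patterns):
    """Build the JSON body: the site to map plus the path patterns to keep.

    The "url" field name is an assumption; include_urls is named in this guide.
    """
    return {"url": site_url, "include_urls": list(include_patterns)}


def request_map(payload, endpoint=MAPS_ENDPOINT, api_key=API_KEY):
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer auth is assumed here, not confirmed by this guide.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


payload = build_maps_payload("https://stripe.com", ["/blog", "/blog/**"])
print(json.dumps(payload, indent=2))
# Call request_map(payload) once the real endpoint and key are filled in.
```

Keeping payload construction separate from the network call makes it easy to inspect the filters before sending anything.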
Understanding the URL Patterns
This guide uses two pattern specifications:
- /blog: Matches exactly the main blog page (https://stripe.com/blog)
- /blog/**: Matches all subpaths under /blog, including individual blog posts, category pages, etc.
This combination ensures we capture all blog-related content while excluding other sections of the website.
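To see why both patterns are needed, here is a small local sketch that mimics this kind of path matching in Python. It is an approximation written for illustration, assuming `**` matches any remaining path and `*` matches within a single segment. The service's actual matching rules may differ, and `pattern_to_regex` and `matches_any` are hypothetical helper names.

```python
import re
from urllib.parse import urlparse


def pattern_to_regex(pattern):
    """Translate a path pattern into an anchored regular expression."""
    pieces = []
    # Split on wildcards while keeping them, so literals can be escaped safely.
    for part in re.split(r"(\*\*|\*)", pattern):
        if part == "**":
            pieces.append(".*")      # any characters, including "/"
        elif part == "*":
            pieces.append("[^/]*")   # any characters within one path segment
        else:
            pieces.append(re.escape(part))
    return re.compile("^" + "".join(pieces) + "$")


def matches_any(url, patterns):
    """Return True if the URL's path matches at least one pattern."""
    # Normalize a trailing slash so /blog/ is treated like /blog.
    path = urlparse(url).path.rstrip("/") or "/"
    return any(pattern_to_regex(p).match(path) for p in patterns)


patterns = ["/blog", "/blog/**"]
for candidate in [
    "https://stripe.com/blog",
    "https://stripe.com/blog/engineering",
    "https://stripe.com/pricing",
]:
    print(candidate, matches_any(candidate, patterns))
```

Under these assumptions, /blog/** alone does not match https://stripe.com/blog itself, which is why the guide lists the bare /blog pattern as well.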
Example Response
Filtering Blog URLs by Category
You can further refine your extraction to focus on specific blog categories. For example, if you’re only interested in Stripe’s engineering blog posts:
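A narrower filter could be expressed as in the sketch below. The /blog/engineering path is an assumption about how Stripe organizes its blog, and the "url" field name is likewise assumed. Only include_urls is taken from this guide.

```python
import json

# Assumption: engineering posts live under /blog/engineering on stripe.com.
engineering_payload = {
    "url": "https://stripe.com",  # "url" field name is assumed
    "include_urls": ["/blog/engineering", "/blog/engineering/**"],
}

print(json.dumps(engineering_payload, indent=2))
```

The same two-pattern approach applies here: one pattern for the category landing page and one for everything beneath it.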
Next Steps
Now that you have extracted Stripe’s blog URLs, you can:
- Fetch their content individually using the scrape API.
- Use the next guide to crawl and extract the actual content from these blog pages directly with built-in filters.