Instead of mapping entire website, you might want to focus on specific sections. In this guide, we’ll show you how to extract only the blog URLs from Stripe’s website.
To extract only blog URLs from Stripe’s website, use the maps endpoint with path pattern filters. The include_urls parameter allows you to specify exactly which URL patterns you want to include in the results.
Copy
import requestsimport timeimport json# ConfigurationAPI_URL = 'https://api.olostep.com/v1'API_KEY = '<your_olostep_api_key>'HEADERS = { 'Content-Type': 'application/json', 'Authorization': f'Bearer {API_KEY}'}# Start time for latency trackingstart_time = time.time()# Define the payload with URL patterns to includepayload = { "url": "https://stripe.com", "include_urls": ["/blog", "/blog/**"] # Match /blog and all paths under /blog}# Make the requestresponse = requests.post(f'{API_URL}/maps', headers=HEADERS, json=payload)# Calculate latencylatency = round((time.time() - start_time) * 1000, 2)print(f"Request completed in {latency}ms")# Process the resultsdata = response.json()print(f"Found {data['urls_count']} blog URLs on Stripe's website")# Print the first 10 URLs as a sampleprint("\nSample blog URLs:")for url in data['urls'][:10]: print(f"- {url}")# Save blog URLs to a file for further processingwith open('stripe_blog_urls.json', 'w') as f: json.dump(data, f, indent=2)print(f"\nAll blog URLs saved to stripe_blog_urls.json")
You can further refine your extraction to focus on specific blog categories. For example, if you’re only interested in Stripe’s engineering blog posts:
Copy
# Define the payload with more specific URL patternspayload = { "url": "https://stripe.com", "include_urls": ["/blog/engineering", "/blog/engineering/**"]}