Through the Olostep /v1/maps endpoint you can get all the URLs on a website. This is useful for content discovery, site structure analysis (e.g., SEO), or deciding which URLs you want to scrape next.
Get all URLs on a website (including sitemaps and discovered links)
Use glob patterns to include or exclude paths (e.g. /blog/**)
Paginate large responses with a cursor (up to 10MB per response)
Send a POST request with the website URL. Optionally pass include_urls and exclude_urls (glob patterns) and top_n (to cap the number of URLs returned).
from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")
sitemap = client.maps.create(url="https://docs.olostep.com")

for url in sitemap.urls():
    print(url)
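If you call the HTTP API directly instead of using the SDK, the POST request described above might be built roughly as follows. This is a minimal sketch using only the standard library; the endpoint URL `https://api.olostep.com/v1/maps`, the Bearer-token header, and the helper name `build_maps_request` are assumptions for illustration, so check the API Reference for the exact values.

```python
import json
import urllib.request

# Assumed endpoint URL; verify against the API Reference.
MAPS_ENDPOINT = "https://api.olostep.com/v1/maps"


def build_maps_request(api_key, url, include_urls=None, exclude_urls=None, top_n=None):
    """Build (but do not send) a POST request for the maps endpoint (hypothetical helper)."""
    payload = {"url": url}
    # Only include the optional parameters that were actually provided.
    if include_urls is not None:
        payload["include_urls"] = include_urls
    if exclude_urls is not None:
        payload["exclude_urls"] = exclude_urls
    if top_n is not None:
        payload["top_n"] = top_n
    return urllib.request.Request(
        MAPS_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumed auth scheme: Bearer token in the Authorization header.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_maps_request("YOUR_REAL_KEY", "https://docs.olostep.com", top_n=1000)

# To send it, use a timeout above 120 seconds, since complex sites can take that long:
# with urllib.request.urlopen(req, timeout=150) as resp:
#     data = json.load(resp)
```

Building the request separately from sending it keeps the payload easy to inspect and test before any network call is made.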
The response usually arrives within seconds but can take up to 120 seconds for more complex websites. The endpoint can extract all URLs from a website, even backlinks and URLs that are not listed in its sitemaps. You can also decide which URL paths to include in or exclude from the response.

By default the endpoint returns around 100k URLs in a single call (10MB max). If there is more data, the API returns a cursor parameter that you can use to paginate and fetch the subsequent URLs. For more details, refer to the API Reference.

This endpoint is particularly useful when you need to:
Discover all content pages on a website
Analyze site structure and hierarchy
Prepare URLs for batch processing
Decide which specific URLs to scrape
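When a site has more URLs than fit in one response, the cursor-based pagination described above amounts to a loop that keeps requesting pages until no cursor comes back. The sketch below assumes the response body carries `urls` and `cursor` fields and that the cursor is sent back on the next request; `fetch_page` is a hypothetical callable standing in for the real HTTP call, so the loop can be shown without network access.

```python
from typing import Callable, Iterator, Optional


def iter_all_urls(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[str]:
    """Yield every URL across all pages, following the cursor until none is returned.

    fetch_page is a hypothetical callable: it sends one maps request (passing
    the cursor when it is not None) and returns the parsed JSON body. The
    "urls" and "cursor" field names are assumptions.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("urls", [])
        cursor = page.get("cursor")
        if not cursor:  # no cursor means this was the last page
            break


# Usage with stubbed pages standing in for real API responses:
fake_pages = {
    None: {"urls": ["https://example.com/a", "https://example.com/b"], "cursor": "c1"},
    "c1": {"urls": ["https://example.com/c"], "cursor": "c2"},
    "c2": {"urls": ["https://example.com/d"]},
}
all_urls = list(iter_all_urls(lambda cursor: fake_pages[cursor]))
print(all_urls)
```

Using a generator means URLs can be handed off for batch processing page by page instead of holding every response in memory first.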
For finer-grained control over which URLs are returned, use the include_urls and exclude_urls parameters.
Suppose that from www.brex.com you want to extract every URL under /product/ (e.g. https://www.brex.com/product/api/no-code) and also www.brex.com/product itself.
You can use the following code:
from olostep import Olostep

client = Olostep(api_key="YOUR_REAL_KEY")
sitemap = client.maps.create(
    url="https://www.brex.com/",
    include_urls=["/product", "/product/**"],
    top_n=100000,
)

for url in sitemap.urls():
    print(url)
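The example passes both "/product" and "/product/**" because, under common glob conventions, "/product/**" matches paths below /product/ but not /product itself. The sketch below is a local approximation of that matching, assuming `**` spans any number of path segments and `*` stays within one segment; it is not Olostep's actual matcher, and `glob_to_regex` and `filter_urls` are hypothetical helpers.

```python
import re
from urllib.parse import urlparse


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a path glob to a regex: '**' crosses '/', '*' does not (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def filter_urls(urls, include=None, exclude=None):
    """Keep URLs whose path matches any include pattern and no exclude pattern."""
    inc = [glob_to_regex(p) for p in include or []]
    exc = [glob_to_regex(p) for p in exclude or []]
    kept = []
    for url in urls:
        path = urlparse(url).path or "/"
        if inc and not any(r.match(path) for r in inc):
            continue
        if any(r.match(path) for r in exc):
            continue
        kept.append(url)
    return kept


brex_urls = [
    "https://www.brex.com/product",
    "https://www.brex.com/product/api/no-code",
    "https://www.brex.com/pricing",
]
# Both patterns: /product itself plus everything below it.
print(filter_urls(brex_urls, include=["/product", "/product/**"]))
# Only the recursive pattern: /product itself is dropped.
print(filter_urls(brex_urls, include=["/product/**"]))
```

Filtering locally like this can be handy for re-slicing a URL list you have already fetched, without issuing another request.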
The maps endpoint is a powerful tool for content discovery and site analysis. It returns a comprehensive list of a website's URLs, so you can extract content from specific pages or analyze the site structure. It is particularly useful for SEO professionals, content marketers, and AI agents that need to analyze a website's content or structure.