The Olostep LangChain integration provides tools for building AI agents that can search, scrape, analyze, and structure data from any website. It works with both LangChain and LangGraph applications.
Features
The integration provides access to all 5 Olostep API capabilities:
- Scrapes: Extract content from any single URL in multiple formats (Markdown, HTML, JSON, text).
- Batches: Process up to 10,000 URLs in parallel. Batch jobs complete in 5-8 minutes.
- Answers: AI-powered web search with natural language queries and structured output.
- Maps: Extract all URLs from a website for site structure analysis.
- Crawls: Autonomously discover and scrape entire websites by following links.
Installation
Install the langchain-olostep package from PyPI, for example with pip install langchain-olostep.
Setup
Set your Olostep API key as an environment variable before calling any of the tools.
Available Tools
scrape_website
Extract content from a single URL. Supports multiple formats and JavaScript rendering.
Parameters:
- Website URL to scrape (must include http:// or https://)
- Output format: markdown, html, json, or text
- Country code for location-specific content (e.g., “US”, “GB”, “CA”)
- Wait time in milliseconds for JavaScript rendering (0-10000)
- Optional parser ID for specialized extraction (e.g., “@olostep/amazon-product”)
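The constraints listed above (an explicit http:// or https:// scheme, one of four output formats, a wait time between 0 and 10000 ms) can be checked locally before a tool call is made. The following is a minimal sketch of such a check; the function name validate_scrape_args and the argument names output_format and wait_ms are hypothetical and are not the integration's own parameter names.

```python
from urllib.parse import urlparse

# Values taken from the parameter descriptions above.
ALLOWED_FORMATS = {"markdown", "html", "json", "text"}
MAX_WAIT_MS = 10_000


def validate_scrape_args(url: str, output_format: str = "markdown", wait_ms: int = 0) -> None:
    """Raise ValueError if the arguments break a documented constraint.

    Hypothetical helper: it only mirrors the documented limits and does not
    call the Olostep API.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"URL must include http:// or https://: {url!r}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported format: {output_format!r}")
    if not 0 <= wait_ms <= MAX_WAIT_MS:
        raise ValueError(f"Wait time must be between 0 and {MAX_WAIT_MS} ms")


# Passes silently; "example.com" without a scheme would raise ValueError.
validate_scrape_args("https://example.com", "markdown", 2000)
```

Failing fast on the client side avoids spending a request on input that does not meet the documented requirements.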
scrape_batch
Process multiple URLs in parallel (up to 10,000 at once).
Parameters:
- List of URLs to scrape
- Output format for all URLs: markdown, html, json, or text
- Country code for location-specific content
- Wait time in milliseconds for JavaScript rendering
- Optional parser ID for specialized extraction
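Because a single batch accepts up to 10,000 URLs, a longer list has to be split before submission. The sketch below shows one way to do that in plain Python; chunk_urls is a hypothetical helper and each resulting chunk would still need to be passed to scrape_batch separately.

```python
from typing import Iterator, List

# Limit taken from the description above.
MAX_BATCH_SIZE = 10_000


def chunk_urls(urls: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Example: 25,000 placeholder URLs are split into three batches.
urls = [f"https://example.com/page/{i}" for i in range(25_000)]
batches = list(chunk_urls(urls))
print([len(batch) for batch in batches])  # [10000, 10000, 5000]
```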
answer_question
Search the web and get AI-powered answers with sources. Well suited to data enrichment and research.
Parameters:
- Question or task to search for
- Optional JSON schema dict/string describing desired output format
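Since the schema may be given as either a dict or a string, it can be convenient to define it once as a Python dict and serialize it when a string is needed. The sketch below assumes a standard JSON Schema style object; the exact schema shape the tool expects is not specified here, and the field names (company, founded_year, headquarters) are invented for illustration.

```python
import json

# Hypothetical output schema for a company-lookup question.
# Assumption: a JSON Schema style object is acceptable to the tool.
company_schema = {
    "type": "object",
    "properties": {
        "company": {"type": "string"},
        "founded_year": {"type": "integer"},
        "headquarters": {"type": "string"},
    },
    "required": ["company"],
}

# The same schema in string form, for callers that pass a string.
schema_as_string = json.dumps(company_schema)
```

Keeping the dict as the single source of truth means the string form never drifts out of sync with it.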
extract_urls
Extract all URLs from a website for site structure analysis.
Parameters:
- Website URL to extract URLs from
- Optional search query to filter URLs
- Limit the number of URLs returned
- Glob patterns to include (e.g., [“/blog/**”])
- Glob patterns to exclude (e.g., [“/admin/**”])
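To get a feel for how include and exclude patterns combine, the sketch below previews which URL paths a pair of pattern lists would select. It uses Python's fnmatch module as an approximation; the glob semantics applied by the Olostep API may differ (for example in how ** and trailing slashes are treated), and path_selected is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List
from urllib.parse import urlparse


def path_selected(url: str, include: List[str], exclude: List[str]) -> bool:
    """Return True if the URL path matches an include pattern and no exclude pattern.

    An empty include list is treated as "include everything" (an assumption).
    """
    path = urlparse(url).path or "/"
    included = not include or any(fnmatchcase(path, pattern) for pattern in include)
    excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
    return included and not excluded


candidates = [
    "https://example.com/blog/2024/launch",
    "https://example.com/admin/users",
    "https://example.com/about",
]
selected = [u for u in candidates if path_selected(u, ["/blog/**"], ["/admin/**"])]
print(selected)  # ['https://example.com/blog/2024/launch']
```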
crawl_website
Autonomously discover and scrape entire websites by following links.
Parameters:
- Starting URL for the crawl
- Maximum number of pages to crawl
- Glob patterns to include (e.g., [“/**”] for all)
- Glob patterns to exclude (e.g., [“/admin/**”])
- Maximum depth to crawl from start_url
- Include external URLs
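The page limit and the depth limit interact: whichever is reached first stops the crawl. The toy model below illustrates this with a breadth-first walk over an in-memory link graph. It is not how Olostep implements crawling; simulate_crawl, the site graph, and the convention that the start page sits at depth 0 are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List


def simulate_crawl(links: Dict[str, List[str]], start_url: str,
                   max_pages: int, max_depth: int) -> List[str]:
    """Breadth-first walk that stops at max_pages or max_depth, whichever comes first."""
    if max_pages < 1:
        return []
    visited = [start_url]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for nxt in links.get(url, []):
            if nxt not in seen and len(visited) < max_pages:
                seen.add(nxt)
                visited.append(nxt)
                queue.append((nxt, depth + 1))
    return visited


# Hypothetical site structure: page -> pages it links to.
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/a", "/docs/b"],
    "/blog": ["/blog/1"],
    "/docs/a": ["/docs/a/deep"],
}

print(simulate_crawl(site, "/", max_pages=10, max_depth=1))  # ['/', '/docs', '/blog']
print(simulate_crawl(site, "/", max_pages=4, max_depth=3))   # ['/', '/docs', '/blog', '/docs/a']
```

In the first call the depth limit ends the crawl; in the second the page limit does.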
LangChain Agent Integration
Build intelligent agents that can search and scrape the web.
LangGraph Integration
Build complex multi-step workflows with LangGraph.
Advanced Use Cases
Data Enrichment
Enrich spreadsheet data with web information.
E-commerce Product Scraping
Scrape product data with specialized parsers.
SEO Audit
Analyze entire websites for SEO.
Documentation Scraping
Crawl and extract documentation.
Specialized Parsers
Olostep provides pre-built parsers for popular websites:
- @olostep/google-search: Google search results
To use one, pass its ID in the parser parameter.
Error Handling
Best Practices
Use Batch Processing for Multiple URLs
When scraping more than 3-5 URLs, use scrape_batch instead of multiple scrape_website calls. Batch processing is much faster and more cost-effective.
Set Appropriate Timeouts
For JavaScript-heavy sites, use the wait_before_scraping parameter (2000-5000 ms is typical) so that dynamic content has time to load.
Use Specialized Parsers
For popular websites (Amazon, LinkedIn, Google), use our pre-built parsers to get structured data automatically.
Filter URLs Efficiently
When using extract_urls or crawl_website, use glob patterns to focus on relevant pages and avoid unnecessary processing.
Handle Rate Limits
Implement exponential backoff for rate limit errors. The API automatically handles most rate limiting internally.
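Exponential backoff can be sketched as a small retry wrapper around any call. The example below is a generic pattern, not part of the integration: RateLimitError is a hypothetical stand-in for whatever exception a rate-limited call raises in your setup, and call_with_backoff, base_delay, and max_delay are illustrative names and defaults.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised when a rate limit is hit."""


def call_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 1.0, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay doubles on each retry (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter spreads out retries from parallel callers.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

A call such as call_with_backoff(lambda: do_request()) would wrap a hypothetical do_request function. The sleep argument is injectable so that the retry logic can be tested without actually waiting.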
Support
- PyPI Package: langchain-olostep
- Documentation: docs.olostep.com
- Issues: GitHub Issues
- Email: info@olostep.com
Related Resources
- Scrapes API: Learn about the Scrapes endpoint
- Batches API: Learn about the Batches endpoint
- Answers API: Learn about the Answers endpoint
- Maps API: Learn about the Maps endpoint
- Crawls API: Learn about the Crawls endpoint
- Python SDK: Explore the Python SDK
- LangChain Website: LangChain platform