Features
The integration provides 4 powerful APIs for automated web data extraction:
Scrape Website
Extract content from any single URL in multiple formats (Markdown, HTML, JSON, text)
Batch Scrape URLs
Process up to 100,000 URLs in parallel. Perfect for large-scale data extraction
Create Crawl
Autonomously discover and scrape entire websites by following links
Create Map
Extract all URLs from a website for site structure analysis and content discovery
Installation
Setup
1. Install the Package
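Install the package from npm (the package name is taken from the Get Started section below):

```shell
npm install @olostep/mastra-tools
```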
2. Import and Register Integration
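A hypothetical configuration sketch — the exact export names from @olostep/mastra-tools are assumptions, but the registerApis() step matches the call referenced under Troubleshooting:

```typescript
// Hypothetical sketch: export names may differ; check the package's own docs.
import { Mastra } from "@mastra/core";
import { OlostepIntegration } from "@olostep/mastra-tools";

const olostep = new OlostepIntegration({
  apiKey: process.env.OLOSTEP_API_KEY!, // variable name is an assumption
});

// Makes scrapeWebsite, batchScrape, createCrawl, and createMap available to agents
olostep.registerApis();

export const mastra = new Mastra({
  // ...your agents and workflows
});
```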
Place this registration in your Mastra configuration file.
3. Configure API Key
Set your Olostep API key as an environment variable in your .env file:
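For example — the variable name OLOSTEP_API_KEY is an assumption; use whatever name the package reads:

```shell
# .env
OLOSTEP_API_KEY=your_api_key_here
```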
Available APIs
The integration exposes 4 APIs that your Mastra agents can use:
scrapeWebsite
Extract content from a single URL. Supports multiple formats and JavaScript rendering.
Use Cases:
- Monitor specific pages for changes
- Extract product information from e-commerce sites
- Gather data from news articles or blog posts
- Pull content for content aggregation
Your Olostep API key
Website URL to scrape (must include http:// or https://)
Output formats: ['html', 'markdown', 'json', 'text']
Country code for location-specific content (e.g., "US", "GB", "CA")
Wait time in milliseconds for JavaScript rendering (0-10000)
Optional parser ID for specialized extraction (e.g., "@olostep/amazon-product")
Returns:
- id - Scrape ID
- url_to_scrape - Scraped URL
- result.markdown_content - Markdown content
- result.html_content - HTML content
- result.json_content - JSON content
- result.text_content - Text content
- result.screenshot_hosted_url - Screenshot URL (if available)
- result.markdown_hosted_url - Hosted markdown URL
- object - Object type ("scrape")
- created - Unix timestamp
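As a sketch, the request parameters above can be assembled and validated in TypeScript. Field names mirror the documented parameters; `formats` as the array's key name is an assumption:

```typescript
// Sketch of a scrapeWebsite parameter object, mirroring the fields documented above.
interface ScrapeParams {
  url_to_scrape: string;
  formats?: Array<"html" | "markdown" | "json" | "text">;
  country?: string;               // e.g. "US"
  wait_before_scraping?: number;  // 0-10000 ms
  parser?: string;                // e.g. "@olostep/amazon-product"
}

function buildScrapeParams(url: string, overrides: Partial<ScrapeParams> = {}): ScrapeParams {
  // The docs require an explicit scheme on the URL.
  if (!/^https?:\/\//.test(url)) {
    throw new Error("URL must include http:// or https://");
  }
  const wait = overrides.wait_before_scraping ?? 0;
  if (wait < 0 || wait > 10000) {
    throw new Error("wait_before_scraping must be between 0 and 10000 ms");
  }
  return { url_to_scrape: url, formats: ["markdown"], ...overrides };
}

const params = buildScrapeParams("https://example.com", { wait_before_scraping: 2000 });
```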
batchScrape
Process multiple URLs in parallel (up to 100,000 at once). Perfect for large-scale data extraction.
Use Cases:
- Scrape entire product catalogs
- Extract data from multiple search results
- Process lists of URLs from spreadsheets
- Bulk content extraction
Your Olostep API key
Array of objects with url and optional custom_id fields. Example: [{"url":"https://example.com","custom_id":"site1"}]
Output formats for all URLs
Country code for location-specific scraping
Wait time in milliseconds for JavaScript rendering
Optional parser ID for specialized extraction
Returns:
- batch_id - Batch ID (use this to retrieve results later)
- status - Processing status
- object - Object type ("batch")
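For instance, a list of URLs can be turned into the documented items shape (the custom_id values here are illustrative):

```typescript
// Build the items array for batchScrape: each entry has a url and an optional custom_id.
function toBatchItems(urls: string[]): Array<{ url: string; custom_id: string }> {
  return urls.map((url, i) => ({ url, custom_id: `site${i + 1}` }));
}

const items = toBatchItems(["https://example.com", "https://example.org"]);
// items[0] is { url: "https://example.com", custom_id: "site1" }
```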
createCrawl
Autonomously discover and scrape entire websites by following links. Perfect for documentation sites, blogs, and content repositories.
Use Cases:
- Crawl and archive entire documentation sites
- Extract all blog posts from a website
- Build knowledge bases from web content
- Monitor website structure changes
Your Olostep API key
Starting URL for the crawl (must include http:// or https://)
Maximum number of pages to crawl
Whether to follow links found on pages
Format for scraped content
Optional country code for location-specific crawling
Optional parser ID for specialized content extraction
Returns:
- id - Crawl ID (use this to retrieve results later)
- object - Object type ("crawl")
- status - Crawl status
- created - Unix timestamp
createMap
Extract all URLs from a website for content discovery and site structure analysis.
Use Cases:
- Build sitemaps and site structure diagrams
- Discover all pages before batch scraping
- Find broken or missing pages
- SEO audits and analysis
Your Olostep API key
Website URL to extract links from (must include http:// or https://)
Optional search query to filter URLs (e.g., "blog")
Limit the number of URLs returned
Glob patterns to include specific paths (e.g., ["/blog/**"])
Glob patterns to exclude specific paths (e.g., ["/admin/**"])
Returns:
- id - Map ID
- object - Object type ("map")
- url - Website URL
- total_urls - Total URLs found
- urls - Array of discovered URLs
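The include/exclude globs behave roughly like this client-side sketch. The API applies them server-side; the glob-to-regex translation below is a simplified assumption:

```typescript
// Simplified glob matching for URL paths: "**" spans path segments, "*" does not.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters except *
  const pattern = escaped.replace(/\*\*/g, ".*").replace(/(?<!\.)\*/g, "[^/]*");
  return new RegExp("^" + pattern + "$");
}

const urls = ["/blog/post-1", "/blog/2024/post-2", "/admin/login"];
const include = globToRegExp("/blog/**");
const filtered = urls.filter((u) => include.test(u));
// filtered keeps the two /blog/ paths and drops /admin/login
```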
Using with Agents
Basic Agent Example
Create an agent that can scrape websites.
Agent Workflow Example
Build a research workflow that discovers and scrapes content.
Popular Use Cases
Research Agent
Build an agent that autonomously researches topics:
Multi-Source Research
Workflow:
- User asks: "Research AI trends"
- Agent uses createMap to discover relevant pages
- Agent uses batchScrape to extract content
- Agent analyzes and summarizes findings
- Returns structured research report
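The steps above can be sketched as plain TypeScript; the OlostepClient interface is a hypothetical stand-in for the real API calls:

```typescript
// Hypothetical client interface standing in for the createMap/batchScrape APIs.
interface OlostepClient {
  createMap(url: string): Promise<{ urls: string[] }>;
  batchScrape(items: { url: string }[]): Promise<{ batch_id: string }>;
}

// Research workflow: discover relevant pages, then batch-scrape them for analysis.
async function research(client: OlostepClient, startUrl: string): Promise<string> {
  const map = await client.createMap(startUrl);    // discover pages
  const items = map.urls.map((url) => ({ url }));  // build batch input
  const batch = await client.batchScrape(items);   // extract content in bulk
  return batch.batch_id;                           // retrieve results later with this ID
}
```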
Competitor Monitoring
Workflow:
- Schedule daily monitoring
- Use scrapeWebsite to check competitor pages
- Compare with previous data
- Alert on significant changes
- Generate weekly reports
Content Aggregation
Workflow:
- Use createCrawl to discover all blog posts
- Use batchScrape to extract content
- Process with AI to extract key topics
- Store in knowledge base
- Generate content calendar
E-commerce Intelligence
Monitor products and prices.
SEO Analysis
Analyze website structure and content.
Specialized Parsers
Olostep provides pre-built parsers for popular websites. Use them with the parser parameter:
Amazon Product
@olostep/amazon-product
Extract: title, price, rating, reviews, images, variants
LinkedIn Profile
@olostep/linkedin-profile
Extract: name, title, company, location, experience
LinkedIn Company
@olostep/linkedin-company
Extract: company info, employee count, industry, description
Google Search
@olostep/google-search
Extract: search results, titles, snippets, URLs
Google Maps
@olostep/google-maps
Extract: business info, reviews, ratings, location
Instagram Profile
@olostep/instagram-profile
Extract: profile info, followers, posts, bio
Using Parsers
Add the parser ID to the parser parameter:
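For example — the product URL is a placeholder, and the parameter names mirror the scrapeWebsite fields above:

```typescript
// Request parameters using a specialized parser for structured extraction.
const amazonScrape = {
  url_to_scrape: "https://www.amazon.com/dp/B000000000", // placeholder product URL
  parser: "@olostep/amazon-product",
  formats: ["json"], // parsers return structured data, so JSON is the natural format
};
```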
Best Practices
Use Batch Processing for Multiple URLs
When scraping more than 3-5 URLs, use batchScrape instead of multiple scrapeWebsite calls. Batch processing is:
- Much faster (parallel processing)
- More cost-effective
- Easier to manage
- Better for rate limits
Set Appropriate Wait Times
For JavaScript-heavy sites, use the wait_before_scraping parameter:
- Simple sites: 0-1000ms
- Dynamic sites: 2000-3000ms
- Heavy JavaScript: 5000-8000ms
Use Specialized Parsers
For popular websites (Amazon, LinkedIn, Google), use pre-built parsers:
- Get structured data automatically
- More reliable extraction
- No need for custom parsing
- Maintained by Olostep
Handle Async Operations
Batch, Crawl, and Map operations are asynchronous:
- Store the returned ID (batch_id, crawl_id, map_id)
- Poll for completion or use webhooks
- Set up separate workflows for retrieval
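A generic polling helper, as a sketch — the completion check is whatever status call the operation exposes, queried with the stored batch_id, crawl_id, or map_id:

```typescript
// Poll an async operation (batch, crawl, or map) until its status check reports done.
async function pollUntilDone<T>(
  check: () => Promise<{ done: boolean; result?: T }>,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { done, result } = await check();
    if (done) return result as T;
    await new Promise((resolve) => setTimeout(resolve, intervalMs)); // wait before retrying
  }
  throw new Error("Operation did not complete in time");
}
```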
Error Handling
Always wrap API calls in try-catch blocks:
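A minimal sketch of that pattern:

```typescript
// Wrap an API call so a failure is logged and surfaced as null instead of crashing the agent.
async function safeCall<T>(label: string, fn: () => Promise<T>): Promise<T | null> {
  try {
    return await fn();
  } catch (err) {
    console.error(`${label} failed:`, err instanceof Error ? err.message : err);
    return null;
  }
}

// Example: a failing call resolves to null rather than throwing.
const value = await safeCall("scrapeWebsite", async () => {
  throw new Error("Rate limit exceeded");
});
```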
Rate Limiting
Be mindful of rate limits:
- Space out requests with delays
- Use batch processing when possible
- Monitor usage in Olostep dashboard
- Upgrade plan if needed
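One way to space out sequential calls, as a sketch:

```typescript
// Run tasks one at a time with a fixed delay between them to stay under rate limits.
async function runSpaced<T>(tasks: Array<() => Promise<T>>, delayMs: number): Promise<T[]> {
  const results: T[] = [];
  for (const task of tasks) {
    results.push(await task());
    if (results.length < tasks.length) {
      await new Promise((resolve) => setTimeout(resolve, delayMs)); // pause between calls
    }
  }
  return results;
}

const spaced = await runSpaced([async () => 1, async () => 2, async () => 3], 10);
```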
Complete Example
A complete research agent combines the APIs above: map a site, batch-scrape the discovered pages, and have the agent summarize the results.
Troubleshooting
Authentication Failed
Error: "Invalid API key"
Solutions:
- Check API key from dashboard
- Ensure API key is set in environment variable
- Verify API key is active
- Check for extra spaces in API key
API Not Found
Error: "API not found" or "Integration not registered"
Solutions:
- Ensure registerApis() is called after creating the integration
- Verify integration is added to Mastra config
- Check integration name is 'olostep'
- Restart Mastra server after changes
Scrape Returns Empty Content
Error: Content fields are empty
Solutions:
- Increase wait_before_scraping time
- Check if website requires login
- Try different format (HTML vs Markdown)
- Verify URL is accessible
- Check if site blocks automated access
Rate Limit Exceeded
Error: "Rate limit exceeded"
Solutions:
- Space out requests with delays
- Use batch processing instead of individual scrapes
- Upgrade your Olostep plan
- Check rate limit in dashboard
TypeScript Errors
Error: Module not found or type errors
Solutions:
- Ensure @mastra/core is installed
- Check TypeScript version compatibility
- Verify all dependencies are installed
- Rebuild: npm run build
Pricing
Olostep charges based on API usage, independent of Mastra:
- Scrapes: Pay per scrape
- Batches: Pay per URL in batch
- Crawls: Pay per page crawled
- Maps: Pay per map operation
Support
Need help with the Mastra integration?
Documentation
Browse complete API docs
Support Email
Email: info@olostep.com
Mastra Docs
Learn about Mastra framework
Status Page
Check API status
Related Resources
Scrapes API
Learn about the Scrapes endpoint
Batches API
Learn about the Batches endpoint
Crawls API
Learn about the Crawls endpoint
Maps API
Learn about the Maps endpoint
Zapier Integration
Automate with Zapier workflows
LangChain Integration
Build AI agents with LangChain
Get Started
Ready to build AI agents with web scraping capabilities?
Install Package
Install @olostep/mastra-tools from npm