Features
The integration provides five operations for automated web data extraction:
- Scrape Website
- Search
- Batch Scrape URLs
- Create Crawl
- Create Map
Installation
1. Install the Node
Install the Olostep node package via npm.

2. Connect Your Account
When you first use the Olostep node in a workflow, you’ll need to configure credentials:
- Add the “Olostep Scrape” node to your workflow
- Click on the node to open its settings
- Click “Create New Credential” or select existing credentials
- Enter your Olostep API key
- Click “Save” to store the credential
Available Actions
Scrape Website
Extract content from a single URL. Supports multiple formats and JavaScript rendering.

Use Cases:
- Monitor specific pages for changes
- Extract product information from e-commerce sites
- Gather data from news articles or blog posts
- Pull content for content aggregation
Output Fields:
- Scrape ID
- Scraped URL
- Markdown Content
- HTML Content
- JSON Content
- Text Content
- Status
- Timestamp
- Screenshot URL (if available)
- Page Metadata
Monitor Competitor Pricing
- URL: Competitor product page
- Format: JSON
- Parser: @olostep/amazon-product
- Add price data to tracking spreadsheet
- Alert team about price changes
Extract and Save Blog Posts
- URL: {{$json.link}}
- Format: Markdown
- Save article content to Notion database
Lead Enrichment
- URL: Company website from sheet
- Format: Markdown
- Extract company information using AI
- Add enriched data back to sheet
Search
Perform a Google search for a given query and get structured results.

Use Cases:
- Automated research workflows
- Lead discovery and enrichment
- Competitive analysis
- Content research
Automated Research
- Query: “latest AI developments”
- Extract and format key information
- Store research findings
Lead Discovery
- Query: {{$json.searchTerm}}
- Store leads with contact information
Batch Scrape URLs
Scrape up to 10,000 URLs at the same time. Perfect for large-scale data extraction.

Use Cases:
- Scrape entire product catalogs
- Extract data from multiple search results
- Process lists of URLs from spreadsheets
- Bulk content extraction
URLs must be provided as a JSON array:

[{"url":"https://example.com","custom_id":"site1"}]

Output Fields:
- Batch ID (use this to retrieve results later)
- Status
- Total URLs
- Created At
- Requested Format
- Country Code
- Parser Used
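The batch input can be prepared in an n8n Code node. A minimal sketch, assuming incoming items carry the URL in a `url` field; the helper name and the `custom_id` scheme are illustrative, not part of the node:

```javascript
// Build the Batch Scrape input array from a plain list of URLs.
// custom_id is derived from the row index so results can be matched
// back to the source rows later.
function toBatchArray(urls) {
  return urls.map((url, i) => ({
    url: url,
    custom_id: `row-${i + 1}`,
  }));
}

// Inside an n8n Code node, something like:
// const urls = items.map((item) => item.json.url);
// return [{ json: { urlArray: toBatchArray(urls) } }];
```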
Scrape Product Catalog
- Convert CSV/list to JSON array format
- URLs: {{$json.urlArray}}
- Format: JSON
- Parser: @olostep/amazon-product
- Send batch ID to your system for retrieval
Daily Content Monitoring
- Fetch URLs to monitor
- Convert to batch array format
- Process all URLs at once
- Notify team that scraping is complete
Create Crawl
Get the content of subpages of a URL. Autonomously discover and scrape entire websites by following links. Perfect for documentation sites, blogs, and content repositories.

Use Cases:
- Crawl and archive entire documentation sites
- Extract all blog posts from a website
- Build knowledge bases from web content
- Monitor website structure changes
Output Fields:
- Crawl ID (use this to retrieve results later)
- Object Type
- Status
- Start URL
- Maximum Pages
- Follow Links
- Created Timestamp
- Formats
Archive Documentation Site
- Start URL: https://docs.example.com
- Max Pages: 500
- Follow Links: true
- Format: Markdown
- Send crawl ID to your archive system
- Notify team that crawl is in progress
Competitor Content Analysis
- Start URL: Competitor blog URL
- Max Pages: 100
- Format: Markdown
- Wait for crawl to complete
- Store crawl data for analysis
Create Map
Extract all URLs from a website for content discovery and site structure analysis.

Use Cases:
- Build sitemaps and site structure diagrams
- Discover all pages before batch scraping
- Find broken or missing pages
- SEO audits and analysis
Output Fields:
- Map ID
- Object Type
- Website URL
- Total URLs Found
- URLs (JSON array)
- Search Query
- Top N Limit
Discover and Scrape
- URL: https://example.com
- Include Patterns: /products/**
- Top N: 500
- Parse URLs from map result
- URLs: {{$json.urls}}
- Format: JSON
- Add all product data to spreadsheet
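The “Parse URLs from map result” step can be a small Code node. A sketch, assuming the map result exposes its URL list as a JSON array and that the filter mirrors the /products/** include pattern above:

```javascript
// Keep only product pages from a Create Map result.
// `urls` is assumed to be the array returned in the map's "URLs" field.
function filterProductUrls(urls, prefix = "/products/") {
  return urls.filter((u) => {
    try {
      return new URL(u).pathname.startsWith(prefix);
    } catch {
      return false; // skip malformed entries
    }
  });
}
```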
SEO Site Audit
- URL: Your website
- Top N: 1000
- Store all URLs for tracking
- Report total pages found
Popular Workflow Examples
E-commerce Price Monitoring
Monitor competitor prices and get instant alerts.

Content Aggregation
Aggregate content from multiple sources.

Lead Enrichment Pipeline
Enrich lead data with web information.

Research Automation
Automate research from multiple sources.

Social Media Monitoring
Track mentions and content.

Multi-Step Workflows
Complete Product Scraping Pipeline
Build a comprehensive product data pipeline:

Discover Product URLs
- Include patterns: /products/**
- Exclude patterns: /cart/**, /checkout/**
Batch Process Products
- Format: JSON
- Parser: Product-specific parser if available
Store in Database
- Use Airtable, Google Sheets, or your database
Monitor for Changes
- Compare with existing data
- Alert on significant changes
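The comparison step can be sketched in a Code node; the 5% threshold and the function name are illustrative assumptions, not part of the integration:

```javascript
// Flag items whose newly scraped price moved more than a threshold
// relative to the previously stored price.
function significantChange(oldPrice, newPrice, thresholdPct = 5) {
  if (!oldPrice) return false; // nothing to compare against yet
  const changePct = Math.abs((newPrice - oldPrice) / oldPrice) * 100;
  return changePct > thresholdPct;
}
```

Items returning `true` can then be routed to a Slack or email alert node.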
SEO Content Strategy
Analyze competitors and plan content:

Map Competitor Sites
- Extract all blog posts and content pages
Scrape Content
- Format: Markdown for easy analysis
AI Analysis
- Identify content gaps
- Find trending topics
Create Content Calendar
- Plan your content strategy
Specialized Parsers
Olostep provides pre-built parsers for popular websites. Use them with the Parser field:
Amazon Product
@olostep/amazon-product
Extracts: title, price, rating, reviews, images, variants

LinkedIn Profile
@olostep/linkedin-profile
Extracts: name, title, company, location, experience

LinkedIn Company
@olostep/linkedin-company
Extracts: company info, employee count, industry, description

Google Search
@olostep/google-search
Extracts: search results, titles, snippets, URLs

Google Maps
@olostep/google-maps
Extracts: business info, reviews, ratings, location

Instagram Profile
@olostep/instagram-profile
Extracts: profile info, followers, posts, bio

Using Parsers
Simply add the parser ID to the Parser field.

Integration with Popular Apps
Google Sheets
Perfect for data collection and tracking:
- Price tracking spreadsheets
- Lead enrichment databases
- Content inventory
- Competitor analysis sheets
Airtable
Build powerful databases with scraped data:
- Product catalogs
- Research databases
- Content calendars
- Link databases
Slack
Get instant notifications:
- Price drop alerts
- Content update notifications
- Error monitoring
- Daily digests
HubSpot / Salesforce
Enrich CRM data automatically:
- Lead enrichment
- Company research
- Competitive intelligence
- Account mapping
Notion
Build knowledge bases:
- Documentation mirrors
- Research repositories
- Content libraries
- Team wikis
Best Practices
Use Batch Processing for Multiple URLs
- Much faster (parallel processing)
- More cost-effective
- Easier to manage
- Better for rate limits
Set Appropriate Wait Times
- Simple sites: 0-1000ms
- Dynamic sites: 2000-3000ms
- Heavy JavaScript: 5000-8000ms
Use Specialized Parsers
- Get structured data automatically
- More reliable extraction
- No need for custom parsing
- Maintained by Olostep
Filter Before Scraping
- Check if URL has changed
- Verify data hasn’t been scraped recently
- Apply business logic before scraping
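One way to apply that filter is a Code node that drops URLs scraped within a lookback window. A sketch, where the `seen` lookup (URL to last-scrape timestamp) is an assumption; in n8n it could come from workflow static data or a spreadsheet:

```javascript
// Skip URLs that were already scraped within the lookback window.
function urlsToScrape(urls, seen, maxAgeMs = 24 * 60 * 60 * 1000, now = Date.now()) {
  return urls.filter((url) => {
    const last = seen[url];
    return last === undefined || now - last > maxAgeMs;
  });
}
```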
Handle Async Operations
- Store the returned ID (batch_id, crawl_id, map_id)
- Use a Wait node if retrieving immediately
- Consider webhook callbacks for completion
- Set up separate workflows for retrieval
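A retrieval workflow can poll until the stored ID reports completion. A sketch with exponential backoff; the `checkStatus` callback stands in for whatever fetches the job status (an HTTP Request node or `fetch()` call), and the status strings are assumptions:

```javascript
// Poll an async Olostep job (batch/crawl/map) until it completes.
async function waitForCompletion(checkStatus, { maxAttempts = 10, baseDelayMs = 2000 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await checkStatus();
    if (status === "completed") return true;
    if (status === "failed") throw new Error("Job failed");
    // Exponential backoff, capped at 60s between checks.
    const delay = Math.min(baseDelayMs * 2 ** attempt, 60000);
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
  return false; // still running; retrieve in a later workflow run
}
```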
Store Results Properly
- Google Sheets: Simple tracking, team collaboration
- Airtable: Relational data, rich formatting
- Database: Large-scale, complex queries
- Notion: Knowledge base, documentation
Monitor and Alert
- Use Error workflows in n8n
- Send alerts to Slack/Email on failures
- Track API usage in Olostep dashboard
- Log important metrics
Common Use Cases by Industry
E-commerce
- Price Monitoring: Track competitor pricing in real-time
- Product Research: Discover trending products and market gaps
- Inventory Tracking: Monitor stock availability
- Review Analysis: Aggregate and analyze customer reviews
Marketing & SEO
- Content Discovery: Find content opportunities
- Competitor Analysis: Track competitor strategies
- Backlink Research: Discover link opportunities
- Keyword Research: Extract keyword data from search results
Sales & Lead Generation
- Lead Enrichment: Enhance CRM data with web information
- Company Research: Gather company intelligence
- Contact Discovery: Find decision-makers
- Competitive Intelligence: Track competitor moves
Research & Analytics
- Data Collection: Gather data from multiple sources
- Market Research: Track industry trends
- Academic Research: Collect research data
- Price Intelligence: Analyze pricing strategies
Media & Publishing
- Content Aggregation: Curate content from multiple sites
- News Monitoring: Track news and mentions
- Social Media: Monitor social platforms
- Trend Detection: Identify trending topics
Troubleshooting
Authentication Failed
- Check API key from dashboard
- Ensure no extra spaces in API key
- Recreate the credential in n8n
- Verify API key is active
Scrape Returns Empty Content
- Increase “Wait Before Scraping” time
- Check if website requires login
- Try different format (HTML vs Markdown)
- Verify URL is accessible
- Check if site blocks automated access
Batch Array Format Error
- Use the format: [{"url":"https://example.com","custom_id":"id1"}]
- Ensure proper JSON syntax
- Use Code node to format URLs correctly
- Test JSON with online validator
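A Code node can also validate the batch array before the request is sent. A sketch; the function name and the exact checks are illustrative:

```javascript
// Validate a batch array string before passing it to Batch Scrape URLs.
// Returns an error message, or null when the input looks usable.
function validateBatchArray(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return `Invalid JSON: ${e.message}`;
  }
  if (!Array.isArray(parsed)) return "Expected a JSON array";
  for (const entry of parsed) {
    if (typeof entry.url !== "string" || !/^https?:\/\//.test(entry.url)) {
      return `Entry missing a valid url: ${JSON.stringify(entry)}`;
    }
  }
  return null;
}
```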
Rate Limit Exceeded
- Space out workflow executions with Wait nodes
- Use batch processing instead of individual scrapes
- Upgrade your Olostep plan
- Check rate limit in dashboard
URL Not Scraped
- Verify URL format (include http:// or https://)
- Check if URL requires authentication
- Test URL in browser first
- Try with country parameter
- Contact support for blocked domains
n8n Advantages
Self-Hosted
n8n is self-hosted, giving you complete control over your workflows and data. No vendor lock-in, no data leaving your infrastructure.

No Task Limits
Unlike cloud-based automation platforms, n8n doesn’t impose task limits. Run as many workflows as you need without additional costs.

Open Source
n8n is open source, allowing you to customize and extend it to fit your specific needs.

Cost-Effective
Self-hosted n8n is free, with optional cloud hosting available. Only pay for the Olostep API usage.

Pricing
Olostep charges based on API usage, independent of n8n:
- Scrapes: Pay per scrape
- Batches: Pay per URL in batch
- Crawls: Pay per page crawled
- Maps: Pay per map operation