The Olostep Mastra integration brings web data extraction to Mastra.ai agents. Olostep is a web search, scraping, and crawling API for searching, extracting, and structuring web data. With the integration, agents can autonomously search, scrape, analyze, and structure data from websites. The package is available on npm.

Features

The integration provides 4 powerful APIs for automated web data extraction:

Scrape Website

Extract content from any single URL in multiple formats (Markdown, HTML, JSON, text)

Batch Scrape URLs

Process up to 100,000 URLs in parallel. Perfect for large-scale data extraction

Create Crawl

Autonomously discover and scrape entire websites by following links

Create Map

Extract all URLs from a website for site structure analysis and content discovery

Installation

npm install @olostep/mastra-tools

Setup

1. Install the Package

npm install @olostep/mastra-tools @mastra/core

2. Import and Register Integration

In your Mastra configuration file:
import { Mastra } from '@mastra/core';
import { createOlostepIntegration } from '@olostep/mastra-tools';

// Create the Olostep integration
const olostep = createOlostepIntegration();

// Register APIs (this makes them available to agents)
olostep.registerApis();

// Add to your Mastra config
export const mastra = new Mastra({
  config: {
    integrations: [olostep],
    // ... other config
  },
});

3. Configure API Key

Set your Olostep API key as an environment variable:
export OLOSTEP_API_KEY=your-api-key-here
Or in your .env file:
OLOSTEP_API_KEY=your-api-key-here
Get your API key from the Olostep Dashboard.
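A minimal sketch of failing fast when the key is missing, so a misconfigured environment surfaces at startup rather than on the first API call. The helper name `requireApiKey` is hypothetical and not part of `@olostep/mastra-tools`; it takes the environment as a plain object so it can be exercised without Node typings.

```typescript
// Hypothetical helper (not part of @olostep/mastra-tools): reads
// OLOSTEP_API_KEY from an environment-like object and fails fast.
type EnvLike = Record<string, string | undefined>;

function requireApiKey(env: EnvLike): string {
  // Trim to catch stray whitespace copied along with the key.
  const key = (env['OLOSTEP_API_KEY'] ?? '').trim();
  if (key.length === 0) {
    throw new Error('OLOSTEP_API_KEY is not set');
  }
  return key;
}
```

In a Node process this would typically be called once at startup as `requireApiKey(process.env)`, with the returned value passed as `apiKey` in each payload.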

Available APIs

The integration exposes 4 APIs that your Mastra agents can use:

scrapeWebsite

Extract content from a single URL. Supports multiple formats and JavaScript rendering. Use Cases:
  • Monitor specific pages for changes
  • Extract product information from e-commerce sites
  • Gather data from news articles or blog posts
  • Pull content for content aggregation
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • url_to_scrape (string, required) - Website URL to scrape (must include http:// or https://)
  • formats (array, default: ['markdown']) - Output formats: 'html', 'markdown', 'json', 'text'
  • country (string) - Country code for location-specific content (e.g., “US”, “GB”, “CA”)
  • wait_before_scraping (number) - Wait time in milliseconds for JavaScript rendering (0-10000)
  • parser (string) - Optional parser ID for specialized extraction (e.g., “@olostep/amazon-product”)
Response:
  • id - Scrape ID
  • url_to_scrape - Scraped URL
  • result.markdown_content - Markdown content
  • result.html_content - HTML content
  • result.json_content - JSON content
  • result.text_content - Text content
  • result.screenshot_hosted_url - Screenshot URL (if available)
  • result.markdown_hosted_url - Hosted markdown URL
  • object - Object type (“scrape”)
  • created - Unix timestamp
Example Usage:
// In your agent or workflow
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'scrapeWebsite',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url_to_scrape: 'https://example.com',
      formats: ['markdown'],
      country: 'US',
    }
  }
});
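Because the response carries one content field per format, calling code usually has to pick the field that matches what it requested. The sketch below shows one way to do that. The field names follow the Response list above, but the TypeScript types (in particular, treating every content field as an optional string) and the helper `pickContent` are assumptions for illustration.

```typescript
// Field names follow the documented response; the types are assumed.
type ContentFormat = 'markdown' | 'html' | 'json' | 'text';

interface ScrapeResult {
  markdown_content?: string | null;
  html_content?: string | null;
  json_content?: string | null;
  text_content?: string | null;
}

interface ScrapeResponse {
  id: string;
  url_to_scrape: string;
  result: ScrapeResult;
}

// Hypothetical helper: returns the content for the first requested format
// that is actually populated, or undefined when every field is empty.
function pickContent(
  response: ScrapeResponse,
  formats: ContentFormat[],
): string | undefined {
  for (const format of formats) {
    const key = `${format}_content` as keyof ScrapeResult;
    const value = response.result[key];
    if (typeof value === 'string' && value.length > 0) {
      return value;
    }
  }
  return undefined;
}
```

Returning `undefined` instead of an empty string makes the "content fields are empty" case explicit, which is easier to branch on than a falsy string.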

batchScrape

Process multiple URLs in parallel (up to 100,000 at once). Perfect for large-scale data extraction. Use Cases:
  • Scrape entire product catalogs
  • Extract data from multiple search results
  • Process lists of URLs from spreadsheets
  • Bulk content extraction
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • batch_array (array, required) - Array of objects with url and optional custom_id fields. Example: [{"url":"https://example.com","custom_id":"site1"}]
  • formats (array, default: ['markdown']) - Output formats for all URLs
  • country (string) - Country code for location-specific scraping
  • wait_before_scraping (number) - Wait time in milliseconds for JavaScript rendering
  • parser (string) - Optional parser ID for specialized extraction
Response:
  • batch_id - Batch ID (use this to retrieve results later)
  • status - Processing status
  • object - Object type (“batch”)
Example Usage:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'batchScrape',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      batch_array: [
        { url: 'https://example.com', custom_id: 'site1' },
        { url: 'https://test.com', custom_id: 'site2' },
      ],
      formats: ['markdown'],
    }
  }
});
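When the URL list comes from a spreadsheet or a previous map call, it often needs de-duplicating and splitting before it fits the batch_array shape. A minimal sketch follows, assuming sequential `custom_id` values are acceptable. The helper `toBatches` and the `item-N` naming scheme are hypothetical; only the 100,000 limit comes from the text above.

```typescript
interface BatchItem {
  url: string;
  custom_id: string;
}

// Documented upper bound for a single batch.
const MAX_BATCH_SIZE = 100000;

// Hypothetical helper: de-duplicates URLs, assigns sequential custom_id
// values, and splits the result into batches no larger than maxSize.
function toBatches(urls: string[], maxSize: number = MAX_BATCH_SIZE): BatchItem[][] {
  if (!Number.isInteger(maxSize) || maxSize < 1) {
    throw new Error('maxSize must be a positive integer');
  }
  const unique = Array.from(new Set(urls));
  const items: BatchItem[] = unique.map((url, index) => ({
    url,
    custom_id: `item-${index + 1}`,
  }));
  const batches: BatchItem[][] = [];
  for (let start = 0; start < items.length; start += maxSize) {
    batches.push(items.slice(start, start + maxSize));
  }
  return batches;
}
```

Each inner array can then be passed as `batch_array` in its own batchScrape call.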

createCrawl

Autonomously discover and scrape entire websites by following links. Perfect for documentation sites, blogs, and content repositories. Use Cases:
  • Crawl and archive entire documentation sites
  • Extract all blog posts from a website
  • Build knowledge bases from web content
  • Monitor website structure changes
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • start_url (string, required) - Starting URL for the crawl (must include http:// or https://)
  • max_pages (number, default: 10) - Maximum number of pages to crawl
  • follow_links (boolean) - Whether to follow links found on pages
  • formats (array, default: ['markdown']) - Format for scraped content
  • country (string) - Optional country code for location-specific crawling
  • parser (string) - Optional parser ID for specialized content extraction
Response:
  • id - Crawl ID (use this to retrieve results later)
  • object - Object type (“crawl”)
  • status - Crawl status
  • created - Unix timestamp
Example Usage:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'createCrawl',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      start_url: 'https://docs.example.com',
      max_pages: 50,
      follow_links: true,
      formats: ['markdown'],
    }
  }
});
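The start_url requirement and the defaults listed above can be enforced locally before a crawl is submitted. The sketch below is one way to do it. `buildCrawlPayload` is a hypothetical helper, and the positive-integer check on max_pages is an assumption rather than a documented rule.

```typescript
interface CrawlPayload {
  apiKey: string;
  start_url: string;
  max_pages: number;
  formats: string[];
}

// Hypothetical helper: applies the documented defaults and rejects a
// start_url that lacks an http:// or https:// scheme.
function buildCrawlPayload(
  apiKey: string,
  startUrl: string,
  options: { max_pages?: number; formats?: string[] } = {},
): CrawlPayload {
  if (!/^https?:\/\//i.test(startUrl)) {
    throw new Error(`start_url must include http:// or https://: ${startUrl}`);
  }
  const maxPages = options.max_pages ?? 10; // documented default
  if (!Number.isInteger(maxPages) || maxPages < 1) {
    // Assumed constraint, not taken from the Olostep docs.
    throw new Error('max_pages must be a positive integer');
  }
  return {
    apiKey,
    start_url: startUrl,
    max_pages: maxPages,
    formats: options.formats ?? ['markdown'], // documented default
  };
}
```

The returned object can be used as the `data` field of the createCrawl payload.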

createMap

Extract all URLs from a website for content discovery and site structure analysis. Use Cases:
  • Build sitemaps and site structure diagrams
  • Discover all pages before batch scraping
  • Find broken or missing pages
  • SEO audits and analysis
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • url (string, required) - Website URL to extract links from (must include http:// or https://)
  • search_query (string) - Optional search query to filter URLs (e.g., “blog”)
  • top_n (number) - Limit the number of URLs returned
  • include_urls (array) - Glob patterns to include specific paths (e.g., ["/blog/**"])
  • exclude_urls (array) - Glob patterns to exclude specific paths (e.g., ["/admin/**"])
Response:
  • id - Map ID
  • object - Object type (“map”)
  • url - Website URL
  • total_urls - Total URLs found
  • urls - Array of discovered URLs
Example Usage:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'createMap',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url: 'https://example.com',
      search_query: 'blog',
      top_n: 100,
      include_urls: ['/blog/**'],
    }
  }
});
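To preview which URLs a pattern such as `/blog/**` would keep, the include and exclude globs can be approximated locally. This is only a sketch: Olostep's server-side glob semantics may differ, and the rules used here (`**` matches any characters including `/`, `*` matches any characters except `/`) are an assumption. Both helper names are hypothetical.

```typescript
// Local approximation only; server-side matching may behave differently.
function globToRegExp(glob: string): RegExp {
  let pattern = '';
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (ch === '*' && glob.charAt(i + 1) === '*') {
      pattern += '.*'; // `**` crosses path separators
      i += 2;
    } else if (ch === '*') {
      pattern += '[^/]*'; // `*` stays within one path segment
      i += 1;
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Keeps URLs whose path matches any include glob (or all URLs when no
// include globs are given) and matches no exclude glob.
function filterPaths(urls: string[], include: string[], exclude: string[] = []): string[] {
  const includeRes = include.map(globToRegExp);
  const excludeRes = exclude.map(globToRegExp);
  return urls.filter((url) => {
    const path = new URL(url).pathname;
    const included = includeRes.length === 0 || includeRes.some((re) => re.test(path));
    return included && !excludeRes.some((re) => re.test(path));
  });
}
```

Applied to the `urls` array from a createMap response, this gives a quick local check of a pattern before it is sent as include_urls or exclude_urls.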

Using with Agents

Basic Agent Example

Create an agent that can scrape websites:
import { Agent } from '@mastra/core';
import { createOlostepIntegration } from '@olostep/mastra-tools';

const olostep = createOlostepIntegration();
olostep.registerApis();

const agent = new Agent({
  name: 'web-researcher',
  instructions: `
    You are a web research assistant. When users ask you to get information from a website,
    use the Olostep scrapeWebsite API to extract the content, then summarize it for them.
  `,
  model: 'openai/gpt-4',
});

// The agent can now use Olostep APIs through Mastra's API system

Agent Workflow Example

Build a research workflow that discovers and scrapes content:
// 1. Map a website to discover URLs
const mapResult = await mastra.callApi({
  integrationName: 'olostep',
  api: 'createMap',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url: 'https://example.com',
      include_urls: ['/blog/**'],
    }
  }
});

// 2. Batch scrape discovered URLs
const batchResult = await mastra.callApi({
  integrationName: 'olostep',
  api: 'batchScrape',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      batch_array: mapResult.urls.slice(0, 10).map(url => ({ url })),
      formats: ['markdown'],
    }
  }
});

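// 3. Process results with your agent.
// batchScrape returns only batch_id, status, and object, so the scraped
// content must be retrieved separately once the batch completes.
const scrapedMarkdown = ''; // placeholder: content retrieved for batchResult.batch_id
const summary = await agent.generate({
  messages: [{
    role: 'user',
    content: `Summarize this content: ${scrapedMarkdown}`
  }]
});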

Research Agent

Build an agent that autonomously researches topics:
Workflow:
  1. User asks: “Research AI trends”
  2. Agent uses createMap to discover relevant pages
  3. Agent uses batchScrape to extract content
  4. Agent analyzes and summarizes findings
  5. Returns structured research report
Workflow (competitor monitoring):
  1. Schedule daily monitoring
  2. Use scrapeWebsite to check competitor pages
  3. Compare with previous data
  4. Alert on significant changes
  5. Generate weekly reports
Workflow (content aggregation):
  1. Use createCrawl to discover all blog posts
  2. Use batchScrape to extract content
  3. Process with AI to extract key topics
  4. Store in knowledge base
  5. Generate content calendar
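The "compare with previous data" and "alert on significant changes" steps in the monitoring workflow above need some measure of how much a page changed. One simple option is sketched below: the fraction of non-empty lines in the new snapshot that were absent from the previous one. The helper names and the 0.2 threshold are illustrative assumptions, not part of the integration.

```typescript
// Hypothetical helper: fraction of lines in the current snapshot that did
// not appear in the previous one (0 = nothing new, 1 = everything new).
function changeRatio(previous: string, current: string): number {
  const normalize = (text: string): string[] =>
    text
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => line.length > 0);
  const before = new Set(normalize(previous));
  const after = normalize(current);
  if (after.length === 0) {
    // An emptied page counts as fully changed unless it was already empty.
    return before.size === 0 ? 0 : 1;
  }
  const added = after.filter((line) => !before.has(line)).length;
  return added / after.length;
}

// The default threshold is arbitrary; tune it per page.
function isSignificantChange(previous: string, current: string, threshold = 0.2): boolean {
  return changeRatio(previous, current) >= threshold;
}
```

Comparing markdown output rather than HTML tends to reduce noise from markup-only changes, though that depends on the site.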

E-commerce Intelligence

Monitor products and prices:
Agent Workflow:
1. Scrape product pages (scrapeWebsite)
2. Extract structured data (with parser)
3. Track price changes
4. Generate alerts
5. Update database
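The "track price changes" and "generate alerts" steps can be sketched as a comparison between two price snapshots keyed by product ID. The snapshot shape, the helper `detectPriceChanges`, and the 5% default threshold are assumptions for illustration; how prices are extracted from the parser output is not covered here.

```typescript
interface PriceAlert {
  productId: string;
  previous: number;
  current: number;
  changePct: number;
}

// Hypothetical helper: reports products whose price moved by at least
// minChangePct percent (in either direction) since the previous snapshot.
function detectPriceChanges(
  previous: Record<string, number>,
  current: Record<string, number>,
  minChangePct = 5,
): PriceAlert[] {
  const alerts: PriceAlert[] = [];
  for (const productId of Object.keys(current)) {
    const before = previous[productId];
    const now = current[productId];
    // Skip new products and zero baselines: no meaningful percentage exists.
    if (before === undefined || now === undefined || before === 0) {
      continue;
    }
    const changePct = ((now - before) / before) * 100;
    if (Math.abs(changePct) >= minChangePct) {
      alerts.push({ productId, previous: before, current: now, changePct });
    }
  }
  return alerts;
}
```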

SEO Analysis

Analyze website structure and content:
Agent Workflow:
1. Map website structure (createMap)
2. Crawl important sections (createCrawl)
3. Analyze content quality
4. Identify SEO opportunities
5. Generate recommendations

Specialized Parsers

Olostep provides pre-built parsers for popular websites. Use them with the parser parameter:

Google Search

@olostep/google-search
Extract: search results, titles, snippets, URLs

Google Maps

@olostep/google-maps
Extract: business info, reviews, ratings, location

Using Parsers

Add the parser ID to the parser parameter:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'scrapeWebsite',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url_to_scrape: 'https://www.amazon.com/dp/PRODUCT_ID',
      formats: ['json'],
      parser: '@olostep/amazon-product',
    }
  }
});
The parser automatically extracts structured data specific to that website type.

Best Practices

When scraping more than 3-5 URLs, use batchScrape instead of multiple scrapeWebsite calls. Batch processing is:
  • Much faster (parallel processing)
  • More cost-effective
  • Easier to manage
  • Better for rate limits
For JavaScript-heavy sites, use the wait_before_scraping parameter:
  • Simple sites: 0-1000ms
  • Dynamic sites: 2000-3000ms
  • Heavy JavaScript: 5000-8000ms
Test with different values to find the optimal wait time.
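One way to test different values systematically is to generate the candidate wait times for a site category and try them in order. The sketch below encodes the ranges listed above. The category names, the helper `waitCandidates`, and the 1000 ms step are illustrative assumptions.

```typescript
type SiteKind = 'simple' | 'dynamic' | 'heavy';

// Ranges follow the guidance above, in milliseconds.
const WAIT_RANGES: Record<SiteKind, [number, number]> = {
  simple: [0, 1000],
  dynamic: [2000, 3000],
  heavy: [5000, 8000],
};

// Hypothetical helper: lists wait_before_scraping values to try for a
// given site category, from shortest to longest.
function waitCandidates(kind: SiteKind, stepMs = 1000): number[] {
  if (stepMs <= 0) {
    throw new Error('stepMs must be positive');
  }
  const [min, max] = WAIT_RANGES[kind];
  const values: number[] = [];
  for (let wait = min; wait <= max; wait += stepMs) {
    values.push(wait);
  }
  return values;
}
```

Trying the shortest value first and stopping at the first one that returns complete content keeps scrape latency as low as the site allows.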
For popular websites (Amazon, LinkedIn, Google), use pre-built parsers:
  • Get structured data automatically
  • More reliable extraction
  • No need for custom parsing
  • Maintained by Olostep
Batch, Crawl, and Map operations are asynchronous:
  • Store the returned ID (batch_id for batches, id for crawls and maps)
  • Poll for completion or use webhooks
  • Set up separate workflows for retrieval
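Polling for completion can be sketched with a capped exponential backoff. Everything API-specific here is an assumption: the `getStatus` callback stands in for whatever call retrieves the job status, and the `'completed'` status string is a guess, so check the Olostep API docs for the real values.

```typescript
// Delay schedule: initialMs, then multiplied by factor each attempt,
// capped at maxMs.
function backoffDelays(initialMs: number, factor: number, maxMs: number, attempts: number): number[] {
  const delays: number[] = [];
  let delay = initialMs;
  for (let i = 0; i < attempts; i += 1) {
    delays.push(Math.min(delay, maxMs));
    delay *= factor;
  }
  return delays;
}

// Hypothetical polling loop. Checks the status, waits, and repeats until
// the schedule is exhausted, then checks one final time.
// Returns true if doneStatus was observed.
async function pollUntilDone(
  getStatus: () => Promise<string>,
  delaysMs: number[],
  doneStatus = 'completed', // assumed status value
): Promise<boolean> {
  for (const delayMs of delaysMs) {
    if ((await getStatus()) === doneStatus) {
      return true;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
  }
  return (await getStatus()) === doneStatus;
}
```

For example, `backoffDelays(1000, 2, 30000, 8)` gives a schedule that starts at one second and levels off at thirty.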
Always wrap API calls in try-catch blocks:
try {
  const result = await mastra.callApi({
    integrationName: 'olostep',
    api: 'scrapeWebsite',
    payload: { data: { /* apiKey, url_to_scrape, ... */ } }
  });
} catch (error) {
  // Handle authentication, rate limit, or network errors
  const message = error instanceof Error ? error.message : String(error);
  console.error('Scraping failed:', message);
}
Be mindful of rate limits:
  • Space out requests with delays
  • Use batch processing when possible
  • Monitor usage in Olostep dashboard
  • Upgrade plan if needed
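Spacing out requests can be as simple as running calls one at a time with a fixed pause between them. A minimal sketch, with `mapWithDelay` as a hypothetical helper and the task passed in as a callback so it does not depend on any particular API:

```typescript
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: runs one task per item sequentially, pausing
// delayMs between consecutive calls. Results keep the input order.
async function mapWithDelay<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  delayMs: number,
): Promise<R[]> {
  const results: R[] = [];
  let first = true;
  for (const item of items) {
    if (!first) {
      await sleep(delayMs);
    }
    first = false;
    results.push(await task(item));
  }
  return results;
}
```

The delay value depends on the plan's rate limit, which this sketch does not know. For more than a handful of URLs, batchScrape remains the better fit.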

Complete Example

Here’s a complete example of building a research agent:
import { Mastra, Agent } from '@mastra/core';
import { createOlostepIntegration } from '@olostep/mastra-tools';

// Create and register Olostep integration
const olostep = createOlostepIntegration();
olostep.registerApis();

// Initialize Mastra
export const mastra = new Mastra({
  config: {
    integrations: [olostep],
    // ... other config
  },
});

// Create research agent
const researchAgent = new Agent({
  name: 'research-assistant',
  instructions: `
    You are a research assistant that can search, extract, and structure web data.
    When users ask you to research a topic:
    1. Use Olostep's createMap to discover relevant pages
    2. Use batchScrape to extract content from multiple sources
    3. Analyze and summarize the findings
    4. Present structured research reports
  `,
  model: 'openai/gpt-4',
});

// Use the agent
async function researchTopic(topic: string) {
  // Step 1: Discover relevant pages
  const mapResult = await mastra.callApi({
    integrationName: 'olostep',
    api: 'createMap',
    payload: {
      data: {
        apiKey: process.env.OLOSTEP_API_KEY!,
        url: `https://example.com/search?q=${encodeURIComponent(topic)}`,
        top_n: 20,
      }
    }
  });

  // Step 2: Scrape discovered pages
  const batchResult = await mastra.callApi({
    integrationName: 'olostep',
    api: 'batchScrape',
    payload: {
      data: {
        apiKey: process.env.OLOSTEP_API_KEY!,
        batch_array: mapResult.urls.slice(0, 10).map(url => ({ url })),
        formats: ['markdown'],
      }
    }
  });

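  // Step 3: Analyze with agent.
  // batchScrape returns a batch_id rather than content; retrieve the results
  // once the batch completes (not shown) and pass them in as researchData.
  const researchData = ''; // placeholder for the retrieved markdown
  const summary = await researchAgent.generate({
    messages: [{
      role: 'user',
      content: `Based on this research data, provide a comprehensive summary of ${topic}:\n\n${researchData}`
    }]
  });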

  return summary;
}

Troubleshooting

Error: “Invalid API key”
Solutions:
  • Check API key from dashboard
  • Ensure API key is set in environment variable
  • Verify API key is active
  • Check for extra spaces in API key
Error: “API not found” or “Integration not registered”
Solutions:
  • Ensure registerApis() is called after creating integration
  • Verify integration is added to Mastra config
  • Check integration name is ‘olostep’
  • Restart Mastra server after changes
Error: Content fields are empty
Solutions:
  • Increase wait_before_scraping time
  • Check if website requires login
  • Try different format (HTML vs Markdown)
  • Verify URL is accessible
  • Check if site blocks automated access
Error: “Rate limit exceeded”
Solutions:
  • Space out requests with delays
  • Use batch processing instead of individual scrapes
  • Upgrade your Olostep plan
  • Check rate limit in dashboard
Error: Module not found or type errors
Solutions:
  • Ensure @mastra/core is installed
  • Check TypeScript version compatibility
  • Verify all dependencies are installed
  • Rebuild: npm run build

Pricing

Olostep charges based on API usage, independent of Mastra:
  • Scrapes: Pay per scrape
  • Batches: Pay per URL in batch
  • Crawls: Pay per page crawled
  • Maps: Pay per map operation
Check current pricing at olostep.com/pricing.

Support

Need help with the Mastra integration?

Documentation

Browse complete API docs

Support Email

Mastra Docs

Learn about Mastra framework

Scrapes API

Learn about the Scrapes endpoint

Batches API

Learn about the Batches endpoint

Crawls API

Learn about the Crawls endpoint

Maps API

Learn about the Maps endpoint

Zapier Integration

Automate with Zapier workflows

LangChain Integration

Build AI agents with LangChain

Get Started

Ready to build AI agents with web scraping capabilities?

Install Package

Install @olostep/mastra-tools from npm
Build intelligent AI agents that can search, extract, and structure web data with Olostep and Mastra!