The Olostep Mastra integration brings web data extraction to Mastra.ai agents. Olostep is a web search, scraping, and crawling API for searching, extracting, and structuring web data. With it you can build AI agents that autonomously search, scrape, analyze, and structure data from websites.

Features

The integration provides four APIs for automated web data extraction:

Scrape Website

Extract content from any single URL in multiple formats (Markdown, HTML, JSON, text)

Batch Scrape URLs

Process up to 100,000 URLs in parallel. Perfect for large-scale data extraction

Create Crawl

Autonomously discover and scrape entire websites by following links

Create Map

Extract all URLs from a website for site structure analysis and content discovery

Installation

npm install @olostep/mastra-tools

Setup

1. Install the Package

npm install @olostep/mastra-tools @mastra/core

2. Import and Register Integration

In your Mastra configuration file:
import { Mastra } from '@mastra/core';
import { createOlostepIntegration } from '@olostep/mastra-tools';

// Create the Olostep integration
const olostep = createOlostepIntegration();

// Register APIs (this makes them available to agents)
olostep.registerApis();

// Add to your Mastra config
export const mastra = new Mastra({
  config: {
    integrations: [olostep],
    // ... other config
  },
});

3. Configure API Key

Set your Olostep API key as an environment variable:
export OLOSTEP_API_KEY=your-api-key-here
Or in your .env file:
OLOSTEP_API_KEY=your-api-key-here
Get your API key from the Olostep Dashboard.
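
A small guard at startup can catch a missing or padded key before any API call is made. The sketch below is only an illustration: `readOlostepApiKey` is a hypothetical helper, not part of @olostep/mastra-tools, and it takes the environment as an argument so it can be exercised without touching real settings.

```typescript
// Hypothetical helper (not provided by @olostep/mastra-tools).
type Env = Record<string, string | undefined>;

function readOlostepApiKey(env: Env): string {
  // Trim to catch stray spaces or newlines copied along with the key.
  const key = (env.OLOSTEP_API_KEY ?? '').trim();
  if (key.length === 0) {
    throw new Error('OLOSTEP_API_KEY is not set');
  }
  return key;
}
```

In a Node.js process this would typically be called once as `readOlostepApiKey(process.env)`, and the result passed as `apiKey` in each payload.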

Available APIs

The integration exposes 4 APIs that your Mastra agents can use:

scrapeWebsite

Extract content from a single URL. Supports multiple formats and JavaScript rendering. Use Cases:
  • Monitor specific pages for changes
  • Extract product information from e-commerce sites
  • Gather data from news articles or blog posts
  • Pull content for content aggregation
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • url_to_scrape (string, required) - Website URL to scrape (must include http:// or https://)
  • formats (array, default: ['markdown']) - Output formats: 'html', 'markdown', 'json', 'text'
  • country (string) - Country code for location-specific content (e.g., "US", "GB", "CA")
  • wait_before_scraping (number) - Wait time in milliseconds for JavaScript rendering (0-10000)
  • parser (string) - Optional parser ID for specialized extraction (e.g., "@olostep/amazon-product")
Response:
  • id - Scrape ID
  • url_to_scrape - Scraped URL
  • result.markdown_content - Markdown content
  • result.html_content - HTML content
  • result.json_content - JSON content
  • result.text_content - Text content
  • result.screenshot_hosted_url - Screenshot URL (if available)
  • result.markdown_hosted_url - Hosted markdown URL
  • object - Object type (“scrape”)
  • created - Unix timestamp
Example Usage:
// In your agent or workflow
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'scrapeWebsite',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url_to_scrape: 'https://example.com',
      formats: ['markdown'],
      country: 'US',
    }
  }
});
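
The parameter constraints listed above (URL scheme, allowed formats, the 0-10000 ms wait range) can be checked locally before a request is sent. The following is a minimal sketch under that assumption; `ScrapeData` and `validateScrapeData` are hypothetical names introduced for illustration, and the integration may perform its own validation.

```typescript
// Hypothetical local check of the documented scrapeWebsite parameters.
interface ScrapeData {
  apiKey: string;
  url_to_scrape: string;
  formats?: string[];
  country?: string;
  wait_before_scraping?: number;
  parser?: string;
}

const ALLOWED_FORMATS: readonly string[] = ['html', 'markdown', 'json', 'text'];

function validateScrapeData(data: ScrapeData): string[] {
  const problems: string[] = [];
  if (data.apiKey.trim() === '') {
    problems.push('apiKey is empty');
  }
  if (!/^https?:\/\//i.test(data.url_to_scrape)) {
    problems.push('url_to_scrape must include http:// or https://');
  }
  for (const format of data.formats ?? []) {
    if (!ALLOWED_FORMATS.includes(format)) {
      problems.push(`unknown format: ${format}`);
    }
  }
  const wait = data.wait_before_scraping;
  if (wait !== undefined && (!Number.isFinite(wait) || wait < 0 || wait > 10000)) {
    problems.push('wait_before_scraping must be between 0 and 10000 ms');
  }
  // An empty array means the payload satisfies the documented constraints.
  return problems;
}
```

Returning a list of problems rather than throwing on the first one makes it easier to report every issue in a payload at once.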

batchScrape

Process multiple URLs in parallel (up to 100,000 at once). Perfect for large-scale data extraction. Use Cases:
  • Scrape entire product catalogs
  • Extract data from multiple search results
  • Process lists of URLs from spreadsheets
  • Bulk content extraction
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • batch_array (array, required) - Array of objects with url and optional custom_id fields. Example: [{"url":"https://example.com","custom_id":"site1"}]
  • formats (array, default: ['markdown']) - Output formats for all URLs
  • country (string) - Country code for location-specific scraping
  • wait_before_scraping (number) - Wait time in milliseconds for JavaScript rendering
  • parser (string) - Optional parser ID for specialized extraction
Response:
  • batch_id - Batch ID (use this to retrieve results later)
  • status - Processing status
  • object - Object type (“batch”)
Example Usage:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'batchScrape',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      batch_array: [
        { url: 'https://example.com', custom_id: 'site1' },
        { url: 'https://test.com', custom_id: 'site2' },
      ],
      formats: ['markdown'],
    }
  }
});
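
When the URL list comes from a spreadsheet or a previous map, it usually needs deduplicating and a stable custom_id per entry before it becomes a batch_array. The sketch below shows one way to do that and to split the list so no batch exceeds the documented 100,000-URL limit. `toBatchArrays` and the `prefix-index` ID scheme are assumptions made for illustration.

```typescript
// Hypothetical helper for building batch_array values.
interface BatchItem {
  url: string;
  custom_id: string;
}

const MAX_BATCH_SIZE = 100000; // documented upper bound per batch

function toBatchArrays(
  urls: string[],
  idPrefix = 'item',
  maxPerBatch = MAX_BATCH_SIZE,
): BatchItem[][] {
  if (maxPerBatch < 1 || maxPerBatch > MAX_BATCH_SIZE) {
    throw new RangeError(`maxPerBatch must be between 1 and ${MAX_BATCH_SIZE}`);
  }
  // Remove duplicate URLs while preserving first-seen order.
  const unique = Array.from(new Set(urls));
  const batches: BatchItem[][] = [];
  for (let start = 0; start < unique.length; start += maxPerBatch) {
    const slice = unique.slice(start, start + maxPerBatch);
    batches.push(
      slice.map((url, offset) => ({ url, custom_id: `${idPrefix}-${start + offset}` })),
    );
  }
  return batches;
}
```

Each inner array can then be passed as `batch_array` in its own batchScrape call.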

createCrawl

Autonomously discover and scrape entire websites by following links. Perfect for documentation sites, blogs, and content repositories. Use Cases:
  • Crawl and archive entire documentation sites
  • Extract all blog posts from a website
  • Build knowledge bases from web content
  • Monitor website structure changes
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • start_url (string, required) - Starting URL for the crawl (must include http:// or https://)
  • max_pages (number, default: 10) - Maximum number of pages to crawl
  • follow_links (boolean) - Whether to follow links found on pages
  • formats (array, default: ['markdown']) - Format for scraped content
  • country (string) - Optional country code for location-specific crawling
  • parser (string) - Optional parser ID for specialized content extraction
Response:
  • id - Crawl ID (use this to retrieve results later)
  • object - Object type (“crawl”)
  • status - Crawl status
  • created - Unix timestamp
Example Usage:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'createCrawl',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      start_url: 'https://docs.example.com',
      max_pages: 50,
      follow_links: true,
      formats: ['markdown'],
    }
  }
});

createMap

Extract all URLs from a website for content discovery and site structure analysis. Use Cases:
  • Build sitemaps and site structure diagrams
  • Discover all pages before batch scraping
  • Find broken or missing pages
  • SEO audits and analysis
Schema Parameters:
  • apiKey (string, required) - Your Olostep API key
  • url (string, required) - Website URL to extract links from (must include http:// or https://)
  • search_query (string) - Optional search query to filter URLs (e.g., "blog")
  • top_n (number) - Limit the number of URLs returned
  • include_urls (array) - Glob patterns to include specific paths (e.g., ["/blog/**"])
  • exclude_urls (array) - Glob patterns to exclude specific paths (e.g., ["/admin/**"])
Response:
  • id - Map ID
  • object - Object type (“map”)
  • url - Website URL
  • total_urls - Total URLs found
  • urls - Array of discovered URLs
Example Usage:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'createMap',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url: 'https://example.com',
      search_query: 'blog',
      top_n: 100,
      include_urls: ['/blog/**'],
    }
  }
});
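
To get a feel for what include_urls and exclude_urls patterns select, it can help to apply similar patterns locally to a list of URLs you already have. The sketch below assumes a simplified glob dialect where `**` matches any characters and `*` matches within a single path segment; the exact matching rules Olostep applies are not stated here, so treat this as an approximation. `globToRegExp`, `pathOf`, and `filterUrls` are hypothetical helpers.

```typescript
// Simplified glob matching (assumed semantics, not Olostep's implementation).
function globToRegExp(glob: string): RegExp {
  let pattern = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      pattern += '.*'; // "**" spans path segments
      i++;
    } else if (ch === '*') {
      pattern += '[^/]*'; // "*" stays within one segment
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&'); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Strip scheme, host, query string, and fragment, leaving only the path.
function pathOf(url: string): string {
  return url.replace(/^https?:\/\/[^/]+/i, '').replace(/[?#].*$/, '') || '/';
}

function filterUrls(urls: string[], include: string[] = [], exclude: string[] = []): string[] {
  const includes = include.map(globToRegExp);
  const excludes = exclude.map(globToRegExp);
  return urls.filter((url) => {
    const path = pathOf(url);
    const included = includes.length === 0 || includes.some((re) => re.test(path));
    return included && !excludes.some((re) => re.test(path));
  });
}
```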

Using with Agents

Basic Agent Example

Create an agent that can scrape websites:
import { Agent } from '@mastra/core';
import { createOlostepIntegration } from '@olostep/mastra-tools';

const olostep = createOlostepIntegration();
olostep.registerApis();

const agent = new Agent({
  name: 'web-researcher',
  instructions: `
    You are a web research assistant. When users ask you to get information from a website,
    use the Olostep scrapeWebsite API to extract the content, then summarize it for them.
  `,
  model: 'openai/gpt-4',
});

// The agent can now use Olostep APIs through Mastra's API system

Agent Workflow Example

Build a research workflow that discovers and scrapes content:
// 1. Map a website to discover URLs
const mapResult = await mastra.callApi({
  integrationName: 'olostep',
  api: 'createMap',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url: 'https://example.com',
      include_urls: ['/blog/**'],
    }
  }
});

// 2. Batch scrape discovered URLs
const batchResult = await mastra.callApi({
  integrationName: 'olostep',
  api: 'batchScrape',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      batch_array: mapResult.urls.slice(0, 10).map(url => ({ url })),
      formats: ['markdown'],
    }
  }
});

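// 3. Process results with your agent
// Note: batchScrape is asynchronous and returns only batch_id, status, and object,
// so the content must be retrieved for batchResult.batch_id before it can be summarized.
const scrapedMarkdown = ''; // placeholder for the content retrieved for batchResult.batch_id
const summary = await agent.generate({
  messages: [{
    role: 'user',
    content: `Summarize this content: ${scrapedMarkdown}`
  }]
});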

Research Agent

Build an agent that autonomously researches topics:
Workflow:
  1. User asks: “Research AI trends”
  2. Agent uses createMap to discover relevant pages
  3. Agent uses batchScrape to extract content
  4. Agent analyzes and summarizes findings
  5. Returns structured research report
Workflow (competitor monitoring):
  1. Schedule daily monitoring
  2. Use scrapeWebsite to check competitor pages
  3. Compare with previous data
  4. Alert on significant changes
  5. Generate weekly reports
Workflow (blog content aggregation):
  1. Use createCrawl to discover all blog posts
  2. Use batchScrape to extract content
  3. Process with AI to extract key topics
  4. Store in knowledge base
  5. Generate content calendar
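
The "compare with previous data" step in the monitoring workflow above can be as simple as storing a fingerprint of each page and comparing it on the next run. The sketch below uses an FNV-1a-style hash over whitespace-normalized text; `fingerprint` and `hasChanged` are hypothetical helpers, and the hash is non-cryptographic, so rare collisions are possible.

```typescript
// Collapse whitespace so formatting-only differences do not count as changes.
function normalize(content: string): string {
  return content.replace(/\s+/g, ' ').trim();
}

// FNV-1a-style 32-bit hash over UTF-16 code units, returned as 8 hex digits.
function fingerprint(content: string): string {
  const text = normalize(content);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, '0');
}

// A page with no stored fingerprint is treated as changed (first observation).
function hasChanged(previousFingerprint: string | undefined, currentContent: string): boolean {
  return previousFingerprint !== fingerprint(currentContent);
}
```

Storing only the fingerprint keeps the comparison cheap; keep the full content as well if the alert needs to show what changed.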

E-commerce Intelligence

Monitor products and prices:
Agent Workflow:
1. Scrape product pages (scrapeWebsite)
2. Extract structured data (with parser)
3. Track price changes
4. Generate alerts
5. Update database
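
For the price-tracking and alerting steps, a threshold on the percentage change is a common starting point. The sketch below is one possible shape for that logic. `parsePrice`, `detectPriceChange`, the 5% default threshold, and the US-style price format are all assumptions; the fields a parser actually returns are not described here.

```typescript
interface PriceAlert {
  previous: number;
  current: number;
  changePct: number;
}

// Parses US-style price text such as "$1,299.00"; returns NaN if no number is found.
function parsePrice(text: string): number {
  const digits = text.replace(/[^0-9.]/g, '');
  return digits === '' ? NaN : Number(digits);
}

// Returns an alert when the price moved by at least thresholdPct, otherwise null.
function detectPriceChange(
  previous: number,
  current: number,
  thresholdPct = 5,
): PriceAlert | null {
  if (!Number.isFinite(previous) || !Number.isFinite(current) || previous <= 0) {
    return null; // no usable baseline
  }
  const changePct = ((current - previous) / previous) * 100;
  return Math.abs(changePct) >= thresholdPct ? { previous, current, changePct } : null;
}
```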

SEO Analysis

Analyze website structure and content:
Agent Workflow:
1. Map website structure (createMap)
2. Crawl important sections (createCrawl)
3. Analyze content quality
4. Identify SEO opportunities
5. Generate recommendations

Specialized Parsers

Olostep provides pre-built parsers for popular websites. Use them with the parser parameter:

Amazon Product

@olostep/amazon-product
Extract: title, price, rating, reviews, images, variants

LinkedIn Profile

@olostep/linkedin-profile
Extract: name, title, company, location, experience

LinkedIn Company

@olostep/linkedin-company
Extract: company info, employee count, industry, description

Google Search

@olostep/google-search
Extract: search results, titles, snippets, URLs

Google Maps

@olostep/google-maps
Extract: business info, reviews, ratings, location

Instagram Profile

@olostep/instagram-profile
Extract: profile info, followers, posts, bio

Using Parsers

Add the parser ID to the parser parameter:
const result = await mastra.callApi({
  integrationName: 'olostep',
  api: 'scrapeWebsite',
  payload: {
    data: {
      apiKey: process.env.OLOSTEP_API_KEY,
      url_to_scrape: 'https://www.amazon.com/dp/PRODUCT_ID',
      formats: ['json'],
      parser: '@olostep/amazon-product',
    }
  }
});
The parser automatically extracts structured data specific to that website type.
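
If an agent handles mixed URLs, it may be convenient to choose the parser ID from the URL itself. The sketch below maps a few URL shapes to the parser IDs listed above. `pickParser` is a hypothetical helper, and the host and path rules (for example `/dp/` for Amazon product pages) are assumptions about typical URLs, not rules published by Olostep.

```typescript
// Hypothetical mapping from URL shape to a parser ID from the list above.
function pickParser(url: string): string | undefined {
  const match = /^https?:\/\/([^/?#]+)([^?#]*)/i.exec(url);
  if (!match) return undefined;
  const host = match[1].toLowerCase().replace(/^www\./, '');
  const path = match[2];

  if (/^amazon\.[a-z.]+$/.test(host) && path.includes('/dp/')) {
    return '@olostep/amazon-product';
  }
  if (host === 'linkedin.com' && path.startsWith('/in/')) {
    return '@olostep/linkedin-profile';
  }
  if (host === 'linkedin.com' && path.startsWith('/company/')) {
    return '@olostep/linkedin-company';
  }
  // No known parser: scrape without the parser parameter.
  return undefined;
}
```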

Best Practices

When scraping more than 3-5 URLs, use batchScrape instead of multiple scrapeWebsite calls. Batch processing is:
  • Much faster (parallel processing)
  • More cost-effective
  • Easier to manage
  • Better for rate limits
For JavaScript-heavy sites, use the wait_before_scraping parameter:
  • Simple sites: 0-1000ms
  • Dynamic sites: 2000-3000ms
  • Heavy JavaScript: 5000-8000ms
Test with different values to find the optimal wait time.
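One way to run that test is to step through a few wait values and stop at the first one that yields enough content. The sketch below does this with a caller-supplied `scrape` function, so it makes no assumption about how the scrape itself is performed. `findWorkingWait`, the candidate list, and the 200-character threshold are illustrative choices.

```typescript
// scrape(waitMs) should run one scrape with that wait and return the extracted text.
type ScrapeFn = (waitMs: number) => Promise<string>;

// Candidate waits drawn from the ranges suggested above.
const WAIT_CANDIDATES_MS = [0, 1000, 3000, 5000, 8000];

// Returns the smallest candidate wait that produced at least minLength characters,
// or undefined if none did.
async function findWorkingWait(scrape: ScrapeFn, minLength = 200): Promise<number | undefined> {
  for (const waitMs of WAIT_CANDIDATES_MS) {
    const content = await scrape(waitMs);
    if (content.trim().length >= minLength) {
      return waitMs;
    }
  }
  return undefined;
}
```

Each candidate costs one scrape, so this is best run once per site and the result cached.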
For popular websites (Amazon, LinkedIn, Google), use pre-built parsers:
  • Get structured data automatically
  • More reliable extraction
  • No need for custom parsing
  • Maintained by Olostep
Batch, Crawl, and Map operations are asynchronous:
  • Store the returned ID (batch_id for batches, id for crawls and maps)
  • Poll for completion or use webhooks
  • Set up separate workflows for retrieval
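Polling can be sketched as a small loop around a status check. In the example below the `check` callback is supplied by the caller, because how status and results are fetched for a stored ID is not covered here. `pollUntilDone` and its options are hypothetical names.

```typescript
interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
}

// Calls check() until it reports done, waiting intervalMs between attempts.
async function pollUntilDone<T>(
  check: () => Promise<{ done: boolean; value?: T }>,
  { intervalMs, maxAttempts }: PollOptions,
): Promise<T | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await check();
    if (result.done) {
      return result.value;
    }
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Operation not finished after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stalled job from blocking a workflow indefinitely.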
Always wrap API calls in try-catch blocks:
try {
  const result = await mastra.callApi({
    integrationName: 'olostep',
    api: 'scrapeWebsite',
    payload: { data: { /* ... */ } }
  });
} catch (error) {
  // Handle authentication, rate limit, or network errors
  console.error('Scraping failed:', error.message);
}
Be mindful of rate limits:
  • Space out requests with delays
  • Use batch processing when possible
  • Monitor usage in Olostep dashboard
  • Upgrade plan if needed
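Spacing out requests is often combined with retrying after a growing delay. The sketch below shows exponential backoff around any async task. `backoffDelayMs` and `withRetry` are hypothetical helpers, the delay values are arbitrary defaults, and the code retries on every error because the shape of a rate-limit error is not specified here; in practice you would likely retry only on rate-limit or network failures.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... up to capMs.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Runs task, retrying up to `retries` times with exponential backoff between attempts.
async function withRetry<T>(task: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= retries) {
        throw error; // out of retries: surface the last error
      }
      const delay = backoffDelayMs(attempt, baseMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```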

Complete Example

Here’s a complete example of building a research agent:
import { Mastra, Agent } from '@mastra/core';
import { createOlostepIntegration } from '@olostep/mastra-tools';

// Create and register Olostep integration
const olostep = createOlostepIntegration();
olostep.registerApis();

// Initialize Mastra
export const mastra = new Mastra({
  config: {
    integrations: [olostep],
    // ... other config
  },
});

// Create research agent
const researchAgent = new Agent({
  name: 'research-assistant',
  instructions: `
    You are a research assistant that can search, extract, and structure web data.
    When users ask you to research a topic:
    1. Use Olostep's createMap to discover relevant pages
    2. Use batchScrape to extract content from multiple sources
    3. Analyze and summarize the findings
    4. Present structured research reports
  `,
  model: 'openai/gpt-4',
});

// Use the agent
async function researchTopic(topic: string) {
  // Step 1: Discover relevant pages
  const mapResult = await mastra.callApi({
    integrationName: 'olostep',
    api: 'createMap',
    payload: {
      data: {
        apiKey: process.env.OLOSTEP_API_KEY!,
        url: `https://example.com/search?q=${encodeURIComponent(topic)}`,
        top_n: 20,
      }
    }
  });

  // Step 2: Scrape discovered pages
  const batchResult = await mastra.callApi({
    integrationName: 'olostep',
    api: 'batchScrape',
    payload: {
      data: {
        apiKey: process.env.OLOSTEP_API_KEY!,
        batch_array: mapResult.urls.slice(0, 10).map(url => ({ url })),
        formats: ['markdown'],
      }
    }
  });

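  // Step 3: Analyze with agent
  // Note: batchScrape is asynchronous, so batchResult carries only batch_id, status, and object.
  // Retrieve the scraped content for batchResult.batch_id and add it to the prompt below.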
  const summary = await researchAgent.generate({
    messages: [{
      role: 'user',
      content: `Based on this research data, provide a comprehensive summary of ${topic}`
    }]
  });

  return summary;
}

Troubleshooting

Error: “Invalid API key”
Solutions:
  • Check API key from dashboard
  • Ensure API key is set in environment variable
  • Verify API key is active
  • Check for extra spaces in API key
Error: “API not found” or “Integration not registered”
Solutions:
  • Ensure registerApis() is called after creating integration
  • Verify integration is added to Mastra config
  • Check integration name is 'olostep'
  • Restart Mastra server after changes
Error: Content fields are empty
Solutions:
  • Increase wait_before_scraping time
  • Check if website requires login
  • Try different format (HTML vs Markdown)
  • Verify URL is accessible
  • Check if site blocks automated access
Error: “Rate limit exceeded”
Solutions:
  • Space out requests with delays
  • Use batch processing instead of individual scrapes
  • Upgrade your Olostep plan
  • Check rate limit in dashboard
Error: Module not found or type errors
Solutions:
  • Ensure @mastra/core is installed
  • Check TypeScript version compatibility
  • Verify all dependencies are installed
  • Rebuild: npm run build

Pricing

Olostep charges based on API usage, independent of Mastra:
  • Scrapes: Pay per scrape
  • Batches: Pay per URL in batch
  • Crawls: Pay per page crawled
  • Maps: Pay per map operation
Check current pricing at olostep.com/pricing.

Support

Need help with the Mastra integration?

Get Started

Ready to build AI agents with web scraping capabilities?

Install Package

Install @olostep/mastra-tools from npm
Build intelligent AI agents that can search, extract, and structure web data with Olostep and Mastra!