With the /v1/scrapes endpoint, you can extract LLM-friendly Markdown, HTML, text, screenshots, or structured JSON from any URL in real time.
- Outputs clean Markdown, structured data, screenshots, or HTML
- Extract JSON through Parsers or LLM extraction
- Handles dynamic content: JS-rendered sites, login flows via actions, PDFs
Scraping a URL
Use the /v1/scrapes endpoint to scrape a single URL and choose output formats.
Installation
Usage
You can use the endpoint to scrape a single URL and choose output formats. The mandatory parameters are url_to_scrape and formats.
Some other common parameters are wait_before_scraping (in milliseconds), remove_css_selectors (default, none, or an array of selectors), and country.
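As a minimal sketch, a request using the parameters above could be assembled as follows. Only the parameter names and the /v1/scrapes path come from this page. The base URL, the Bearer authorization header, the use of POST with a JSON body, and the helper names are assumptions made for illustration.

```python
import json
import urllib.request

# Hypothetical values: the real base URL and auth scheme are not given on this page.
API_BASE = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_scrape_payload(url, formats, wait_ms=None, country=None):
    """Build the JSON body for a scrape request."""
    payload = {"url_to_scrape": url, "formats": list(formats)}
    if wait_ms is not None:
        payload["wait_before_scraping"] = wait_ms  # milliseconds
    if country is not None:
        payload["country"] = country
    return payload


def build_request(payload):
    """Wrap the payload in a POST request (assumed method and headers)."""
    return urllib.request.Request(
        f"{API_BASE}/v1/scrapes",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


payload = build_scrape_payload("https://example.com", ["markdown"], wait_ms=2000)
request = build_request(payload)
# To send it, pass `request` to urllib.request.urlopen() and decode the JSON response.
```

Keeping payload construction separate from sending makes the body easy to inspect or log before any network call is made.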
Response
The API returns a scrape object in response.
The scrape object has properties such as id and result.
The result object has the following fields (depending on the formats parameter, some may be null):
- html_content: the HTML content of the page. Pass formats: ["html"] to get this.
- markdown_content: the Markdown content of the page. Pass formats: ["markdown"] to get this.
- text_content: the text content of the page. Pass formats: ["text"] to get this.
- json_content: the JSON content of the page. Pass formats: ["json"] to get this and also provide a parser or llm_extract parameter.
- screenshot_hosted_url: the hosted URL of the screenshot
- html_hosted_url: the hosted URL of the HTML content
- markdown_hosted_url: the hosted URL of the Markdown content
- json_hosted_url: the hosted URL of the JSON content
- text_hosted_url: the hosted URL of the text content
- links_on_page: the links on the page
- page_metadata: the metadata of the page
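Because fields for formats you did not request may be null, it can help to filter the result before using it. The sketch below assumes the response decodes to a dictionary with the field names listed above. The sample values, the shapes of links_on_page and page_metadata, and the helper name are invented for illustration.

```python
# Hypothetical response shaped after the field names listed above;
# the values are invented for illustration.
sample_scrape = {
    "id": "scrape_123",
    "result": {
        "markdown_content": "# Example Domain",
        "markdown_hosted_url": "https://files.example.com/scrape_123.md",
        "html_content": None,
        "text_content": None,
        "json_content": None,
        "links_on_page": ["https://www.iana.org/domains/example"],
        "page_metadata": {"title": "Example Domain"},
    },
}


def available_content(scrape):
    """Return only the *_content fields that are present and not null."""
    result = scrape.get("result") or {}
    return {
        key: value
        for key, value in result.items()
        if key.endswith("_content") and value is not None
    }


content = available_content(sample_scrape)
print(sorted(content))
```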
Scrape Formats
Choose one or more output formats via formats:
- markdown: LLM-friendly Markdown
- html: cleaned HTML
- text: plain text
- json: structured output (via parser or llm_extract)
- raw_pdf: raw PDF bytes extracted to a hosted URL
- screenshot: set via actions to capture a screenshot and return a hosted URL
Outputs are returned in result as *_content fields, along with a *_hosted_url.
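The naming pattern of the result fields can be expressed as a small lookup. This is a sketch, assuming the four formats that appear in the field list with both a content field and a hosted URL (markdown, html, text, json). The helper names and the sample result are hypothetical.

```python
# Formats that, per the documented result fields, have both a content field and a hosted URL.
CONTENT_FORMATS = ("markdown", "html", "text", "json")


def result_fields(fmt):
    """Return the (content_field, hosted_url_field) names for a format."""
    if fmt not in CONTENT_FORMATS:
        raise ValueError(f"no *_content field documented for format: {fmt}")
    return f"{fmt}_content", f"{fmt}_hosted_url"


def pick_outputs(result, formats):
    """Collect the content and hosted URL for each requested format."""
    outputs = {}
    for fmt in formats:
        content_key, url_key = result_fields(fmt)
        outputs[fmt] = {
            "content": result.get(content_key),
            "hosted_url": result.get(url_key),
        }
    return outputs


# Hypothetical result, invented for illustration.
sample_result = {
    "text_content": "Example Domain",
    "text_hosted_url": "https://files.example.com/scrape_123.txt",
}
outputs = pick_outputs(sample_result, ["text"])
```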
Extract structured data
You can extract structured JSON in two ways: using Parsers or LLM extraction.
Using a Parser (recommended for scale)
Set formats: ["json"] and provide a parser id.
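A request body for parser-based extraction might look like the sketch below. It assumes the parser id is passed directly as the value of the parser parameter; the exact shape of that value is not specified here, and the id and URL shown are placeholders.

```python
def build_parser_payload(url, parser_id):
    """Build a scrape body that requests JSON output from an existing parser."""
    return {
        "url_to_scrape": url,
        "formats": ["json"],
        # Assumption: the parser is referenced by passing its id as the `parser` value.
        "parser": parser_id,
    }


parser_payload = build_parser_payload("https://example.com/product/1", "YOUR_PARSER_ID")
```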
Using LLM extraction (schema and/or prompt)
Provide llm_extract with a JSON Schema (schema) and/or a natural language instruction (prompt). You can pass both parameters; when both are provided, schema takes precedence.
If you pass only a prompt, the LLM extracts the data based on the prompt and decides the data structure on its own.
result.json_content is returned as stringified JSON. Parse it in your code if you need an object.
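The following sketch shows one way to build an llm_extract request and then decode the stringified json_content. The schema, prompt, URL, and sample result are invented for illustration, and the nesting of schema and prompt as keys inside llm_extract is an assumption based on the description above.

```python
import json

# Illustrative JSON Schema describing the fields to extract.
product_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name"],
}

llm_payload = {
    "url_to_scrape": "https://example.com/product/1",
    "formats": ["json"],
    # Assumption: schema and prompt are keys of the llm_extract object.
    "llm_extract": {
        "schema": product_schema,
        "prompt": "Extract the product name and price.",
    },
}


def parse_json_content(result):
    """Decode result.json_content, which arrives as a JSON string."""
    raw = result.get("json_content")
    return json.loads(raw) if raw is not None else None


# Hypothetical result, invented to show the decoding step.
sample_result = {"json_content": '{"name": "Widget", "price": 9.99}'}
product = parse_json_content(sample_result)
```

Returning None when json_content is null keeps callers from hitting a decoding error when the json format was not requested.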
Interacting with the page with Actions
Perform actions before scraping to interact with dynamic sites. Supported actions:
- wait with milliseconds
- click with selector
- fill_input with selector and value
- scroll with direction and amount
Use wait before/after other actions to allow the page to load.
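A sequence of actions could be sketched as a list of small dictionaries, with waits placed around the interactions. The action names and their fields come from the list above. The type key used to name each action, the actions request parameter, the selectors, and the scroll values are assumptions made for illustration.

```python
# Assumption: each action is an object whose `type` key names the action.
def wait(milliseconds):
    return {"type": "wait", "milliseconds": milliseconds}


def click(selector):
    return {"type": "click", "selector": selector}


def fill_input(selector, value):
    return {"type": "fill_input", "selector": selector, "value": value}


def scroll(direction, amount):
    return {"type": "scroll", "direction": direction, "amount": amount}


# Hypothetical login flow: wait for the page, fill a field, submit, wait, then scroll.
actions = [
    wait(1000),
    fill_input("#email", "user@example.com"),
    click("button[type=submit]"),
    wait(2000),
    scroll("down", 500),
]

actions_payload = {
    "url_to_scrape": "https://example.com/login",
    "formats": ["markdown"],
    "actions": actions,  # assumed parameter name
}
```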
Example
markdown_content).
Use Cases
Below are a few practical ways customers use the /scrapes endpoint.
Content Analysis & Research
- Competitive Analysis: Extract product details, pricing, and features from competitor websites
- Market Research: Analyze landing pages, product descriptions, and customer testimonials
- Academic Research: Gather specific data from scientific publications or research portals
- Legal Documentation: Extract case studies, regulations, or legal precedents from official websites
E-commerce & Retail
- Dynamic Pricing Strategies: Get real-time product pricing from competing stores
- Product Information Management: Extract detailed specifications and descriptions
- Stock/Inventory Monitoring: Check product availability at other retailers
- Review Analysis: Gather consumer feedback and sentiment for specific products
Marketing & Content Creation
- Content Curation: Extract relevant articles and blog posts for newsletters
- SEO Analysis: Examine competitors’ keyword usage, meta descriptions, and page structure
- Lead Generation: Extract contact information from business directories or company pages
- Influencer Research: Gather engagement metrics and content styles from influencer profiles
- Personalized Social Media Generation: Create AI-powered social media marketing by analyzing customers' websites
Data Applications
- AI Training Data Collection: Gather specific examples for machine learning models
- Custom Knowledge Base Building: Extract documentation or instructions from software sites
- Historical Data Archives: Preserve website content at specific points in time
- Structured Data Extraction: Transform web content into formatted datasets for analysis
Monitoring & Alerts
- Regulatory Compliance Monitoring: Track changes to legal or regulatory websites
- Crisis Management: Monitor news sites for mentions of specific events or organizations
- Event Tracking: Extract details about upcoming events from venue or organizer websites
- Service Status Monitoring: Check service status pages for specific platforms or tools
Publishing & Media
- News Aggregation: Extract breaking news from official sources
- Media Monitoring: Track specific topics across news sites
- Content Verification: Extract information to fact-check claims or statements
- Multimedia Extraction: Gather embedded videos, images, or audio for media libraries
Financial Applications
- Investment Research: Extract financial statements or annual reports from company websites
- Economic Indicators: Gather economic data from government or financial institution websites
- Cryptocurrency Data: Extract real-time pricing and market cap information
- Financial News Analysis: Monitor financial news sites for specific market signals
Technical Applications
- API Documentation Extraction: Gather technical documentation for reference
- Integration Testing: Extract website elements to verify third-party integrations
- Accessibility Testing: Analyze website structure for compliance with accessibility standards
- Web Archive Creation: Capture full website content for historical preservation
Integration Scenarios
- CRM Systems: Enhance customer profiles with data from company websites or LinkedIn
- Content Management Systems: Import relevant external content
- Business Intelligence Tools: Supplement internal data with external market information
- Project Management Software: Extract specifications or requirements from client websites
- Custom Dashboards: Display extracted data alongside internal metrics