Get the Markdown of a Website
Learn how to extract content as LLM-friendly markdown from any web page.
Overview
Olostep’s scrape endpoint lets you extract content from any website. Markdown output is useful when you want to feed a page to an LLM without the surrounding HTML.
In this guide we will see how to extract markdown from a website such as https://www.nea.com/team.
Prerequisites
Before getting started, ensure you have the following:
- A valid Olostep API key. You can get one by signing up at Olostep.
- Python installed on your system
- The requests and json libraries (json ships with Python; requests is a third-party package that you can install with pip install requests)
Extracting Text from a Website
The following Python script demonstrates how to extract text and markdown content from a website using Olostep’s API.
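A minimal sketch of such a script might look like the following. The parameter names url_to_scrape, formats, and Authorization come from this guide; the endpoint URL, the Bearer token scheme, and the exact format strings "markdown" and "text" are assumptions to verify against the Olostep API reference. The helper names build_scrape_request, scrape, and format_response are hypothetical and used only for illustration.

```python
import json

# Assumed endpoint URL; confirm it in the Olostep API reference.
OLOSTEP_SCRAPE_URL = "https://api.olostep.com/v1/scrapes"


def build_scrape_request(url_to_scrape, api_key, formats=("markdown", "text")):
    """Return (headers, payload) for a scrape request (hypothetical helper)."""
    headers = {
        # Assumption: the API key is sent as a Bearer token.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        # The page to extract content from.
        "url_to_scrape": url_to_scrape,
        # The output formats to request (format strings are assumed).
        "formats": list(formats),
    }
    return headers, payload


def scrape(url_to_scrape, api_key, formats=("markdown", "text")):
    """POST the scrape request and return the decoded JSON response."""
    # Imported here so the helpers above work without the dependency installed.
    import requests  # third-party: pip install requests

    headers, payload = build_scrape_request(url_to_scrape, api_key, formats)
    response = requests.post(
        OLOSTEP_SCRAPE_URL, headers=headers, json=payload, timeout=60
    )
    # Raise an exception on 4xx/5xx instead of printing an error body.
    response.raise_for_status()
    return response.json()


def format_response(result):
    """Pretty-print a decoded JSON response for readability."""
    return json.dumps(result, indent=4)
```

To try it, call print(format_response(scrape("https://www.nea.com/team", "<YOUR_API_KEY>"))) with your own API key in place of the placeholder. Keeping the request construction separate from the network call makes the payload easy to inspect before anything is sent.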
Example Response
A successful response will look something like this:
Explanation
- url_to_scrape: specifies the website URL to extract content from.
- formats: defines the output formats (text in this case).
- Authorization: contains your API key to authenticate the request.
- The response is formatted as JSON and printed for readability.
Conclusion
Using Olostep, you can easily extract markdown content from any website. This is useful when you want to take content from a website and feed it to an LLM for data extraction and analysis. If you want to extract content at scale from the same website repeatedly (for example, for data monitoring or price tracking), we recommend using a custom parser to get the content in JSON format.