Fetch MCP Server

by MaartenSmeets

Category: Web/Scraping & Content Extraction
Tags: web scraping, browser automation, OCR, content extraction

Fetch MCP Server provides web content fetching capabilities using browser automation, OCR, and multiple extraction methods. This server enables LLMs to retrieve and process content from web pages, even those that require JavaScript rendering or use techniques that prevent simple scraping.

What is Fetch MCP Server?

Fetch MCP Server is a Model Context Protocol (MCP) server that fetches and processes web content. It combines browser automation, OCR, HTML extraction, and document parsing to retrieve content from web pages, including pages that are difficult to scrape with traditional methods.

How to use Fetch MCP Server?

The server is installed and run with Docker. Add it to your Claude configuration with the Docker run command so that Claude can start the container. The server provides a 'fetch' tool that takes a URL as input and returns the extracted content as markdown. To customize the user agent, add the --user-agent argument to the Docker run command.
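
The setup described above can be sketched in Python. This is a minimal illustration, not the project's documented configuration: the image name mcp-server-fetch, the server label "fetch", and the tool argument name "url" are assumptions, and the sketch assumes the container speaks MCP over stdio. The mcpServers layout and the tools/call message follow the general MCP conventions.

```python
import json

# Hypothetical image tag; use whatever tag you passed to `docker build -t`.
IMAGE = "mcp-server-fetch"


def claude_config(image: str = IMAGE) -> dict:
    """Build an `mcpServers` entry that starts the server through Docker."""
    return {
        "mcpServers": {
            "fetch": {  # label of your choosing (assumption)
                "command": "docker",
                # -i keeps stdin open for the stdio transport,
                # --rm removes the container when it exits.
                "args": ["run", "-i", "--rm", image],
            }
        }
    }


def fetch_request(url: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC `tools/call` message for the 'fetch' tool.

    The argument name `url` is an assumption; check the tool's schema.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch", "arguments": {"url": url}},
    }


if __name__ == "__main__":
    print(json.dumps(claude_config(), indent=2))
    print(json.dumps(fetch_request("https://example.com")))
```

The first printed object is the kind of entry that would go into the Claude configuration file; the second shows roughly what a client sends when the model calls the tool.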

Key features of Fetch MCP Server

Browser automation with undetected-chromedriver
OCR using pytesseract with layout detection
HTML extraction using requests/BeautifulSoup
Document parsing (PDF, DOCX, PPTX)
Scoring system that selects the best result across extraction methods

Use cases of Fetch MCP Server

Enabling LLMs to access and process web content
Retrieving content from JavaScript-heavy websites
Extracting text from images within web pages
Automating data collection from websites
Providing context to AI models from dynamic web sources

FAQ from Fetch MCP Server

What is the purpose of the scoring system?

The scoring system compares the results produced by the different extraction methods and selects the most reliable, highest-quality one, regardless of which method produced it. It considers content length, structure, and potential errors.
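
To make the idea concrete, here is a toy scoring function built on the three factors named above. It is a sketch under stated assumptions, not the server's actual algorithm: the Candidate type, the weights, the 5000-character cap, and the list of error markers are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical phrases that suggest a blocked or failed fetch.
ERROR_MARKERS = ("access denied", "enable javascript", "captcha", "404 not found")


@dataclass
class Candidate:
    method: str  # e.g. "browser", "ocr", "html", "document"
    text: str    # extracted content as markdown


def score(c: Candidate) -> float:
    """Toy score: reward length and structure, penalise error markers."""
    lines = [ln for ln in c.text.splitlines() if ln.strip()]
    # Length: grows with content size, capped at 1.0.
    length_score = min(len(c.text), 5000) / 5000
    # Structure: share of non-empty lines that look like headings or list items.
    structured = sum(ln.lstrip().startswith(("#", "-", "*")) for ln in lines)
    structure_score = structured / len(lines) if lines else 0.0
    # Errors: one point off for each marker found in the text.
    lowered = c.text.lower()
    penalty = sum(marker in lowered for marker in ERROR_MARKERS)
    return length_score + 0.5 * structure_score - penalty


def best(candidates: list[Candidate]) -> Candidate:
    """Return the highest-scoring candidate, whatever method produced it."""
    return max(candidates, key=score)
```

With this shape, a long, well-structured browser result beats a short "Access denied" page from plain HTML extraction, which is the behaviour the answer above describes.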

How do I customize the user-agent?

You can customize the user-agent by adding the argument --user-agent=YourUserAgent to the args list in the Docker configuration.
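
As an illustration, the helper below appends the argument to an args list. The image name is hypothetical, and the sketch assumes the image's entrypoint is the server itself: Docker passes everything after the image name to the container, so the flag must come after the image to reach the server and not be read as a docker run option.

```python
def with_user_agent(args: list[str], user_agent: str) -> list[str]:
    """Return a copy of args with --user-agent=... appended.

    Any earlier --user-agent=... entry is dropped so the flag appears once.
    """
    kept = [a for a in args if not a.startswith("--user-agent=")]
    return kept + [f"--user-agent={user_agent}"]


if __name__ == "__main__":
    # Hypothetical image name; substitute the tag of your own build.
    base_args = ["run", "-i", "--rm", "mcp-server-fetch"]
    print(with_user_agent(base_args, "MyCrawler/1.0"))
```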

What extraction methods are used?

The server uses browser automation, OCR, HTML extraction, and document parsing to retrieve content.

How do I install the server?

Docker is the recommended installation method. Build the Docker image, then run the container.

Where can I find examples of other MCP servers?

Examples of other MCP servers and implementation patterns can be found at https://github.com/modelcontextprotocol/servers