MCP Server Readability Parser

by MCP-Mirror

Category: Web Scraping & Content Extraction
Tags: Readability, Python, Markdown, Web Scraping, Content Extraction

A Python implementation of an MCP server that extracts webpage content and transforms it into clean, LLM-optimized Markdown. It removes ads, navigation, and other non-essential content so the text is easier for an LLM to process.



What is MCP Server Readability Parser?

This is a Python-based MCP server that uses the Readability algorithm to extract the main content from a webpage, clean it, and convert it to Markdown. The output is optimized for use with Large Language Models (LLMs).
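To make the extract, clean, and convert steps concrete, here is a minimal sketch that uses only the Python standard library. It is not the Readability algorithm and not this server's code: it is a much simpler stand-in that drops a fixed set of noisy tags and emits headings, paragraphs, and list items as Markdown. The names `NOISE_TAGS`, `MarkdownSketch`, and `html_to_markdown` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags stand in for "non-essential content" in this sketch.
# The real Readability algorithm scores page regions instead of using a fixed list.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "form"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = {"p", "li"} | set(HEADING_PREFIX)


class MarkdownSketch(HTMLParser):
    """Drops noisy subtrees and emits headings, paragraphs, and list items as Markdown."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # greater than 0 while inside a noisy subtree
        self.blocks = []      # finished Markdown blocks
        self.prefix = None    # Markdown prefix of the open block, or None
        self.buffer = []      # text collected for the open block

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif self.noise_depth == 0:
            if tag in HEADING_PREFIX:
                self.prefix, self.buffer = HEADING_PREFIX[tag], []
            elif tag == "p":
                self.prefix, self.buffer = "", []
            elif tag == "li":
                self.prefix, self.buffer = "- ", []

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.noise_depth = max(0, self.noise_depth - 1)
        elif self.noise_depth == 0 and self.prefix is not None and tag in BLOCK_TAGS:
            # Collapse runs of whitespace so the Markdown stays compact.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.blocks.append(self.prefix + text)
            self.prefix = None

    def handle_data(self, data):
        # Keep text only when it sits inside an open block outside any noisy subtree.
        if self.noise_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    """Return a Markdown rendering of the non-noisy blocks in the given HTML."""
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Calling `html_to_markdown` on a page with a `<nav>` menu, an `<h1>`, a paragraph, and a `<footer>` keeps only the heading and the paragraph. A production pipeline would more likely rely on a Readability port and a dedicated HTML-to-Markdown converter.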

How to use MCP Server Readability Parser?

1. Clone the repository.
2. Create and activate a virtual environment.
3. Install dependencies using `pip install -r requirements.txt`.
4. Start the server using `fastmcp run server.py`.
5. Send a POST request to the `/tools/extract_content` endpoint with a JSON payload containing the URL to parse.
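The final step, the POST request, can be sketched with the standard library as follows. The base URL `http://localhost:8000` is an assumption, since the listing does not state a host or port, and the helper names are hypothetical. The `url` payload key and the `content` response field follow the FAQ in this listing.

```python
import json
import urllib.request

# Assumption: the server is reachable at this address; adjust to your setup.
BASE_URL = "http://localhost:8000"


def build_extract_request(page_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /tools/extract_content with a JSON body."""
    body = json.dumps({"url": page_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/tools/extract_content",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_content(page_url: str, base_url: str = BASE_URL, timeout: float = 30.0) -> str:
    """Send the request and return the Markdown from the response's content field."""
    request = build_extract_request(page_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    return payload["content"]
```

With the server running, `print(extract_content("https://example.com/article"))` should print the extracted Markdown, provided the endpoint is exposed at the assumed address.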

Key features of MCP Server Readability Parser

Removes ads, navigation, footers, and other non-essential content
Converts clean HTML into well-formatted Markdown
Handles errors gracefully
Optimized for LLM processing

Use cases of MCP Server Readability Parser

Preparing web content for LLM training
Summarizing articles for research
Creating clean content for documentation
Automating content extraction from websites

FAQ from MCP Server Readability Parser

Why use this instead of just fetching the HTML?

This server extracts only relevant content using the Readability algorithm, eliminates noise like ads, popups, and navigation menus, reduces token usage, and provides consistent Markdown formatting.

What is the main tool provided by the server?

The main tool is `extract_content`, which fetches a webpage and transforms its content into clean Markdown.

What arguments does the `extract_content` tool accept?

It accepts a `url` argument, a string containing the URL of the webpage to parse.

What does the `extract_content` tool return?

It returns a JSON object with a `content` field, which contains the Markdown extracted from the webpage.
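The return shape, together with the "handles errors gracefully" feature, might look like the sketch below. Only the `content` field is taken from this listing. The `error` field, the URL validation rule, and the names `extract_content_result` and `fetch_markdown` are assumptions made for illustration, not the server's documented behavior.

```python
from typing import Callable
from urllib.parse import urlparse


def extract_content_result(url: str, fetch_markdown: Callable[[str], str]) -> dict:
    """Wrap an extraction call so callers always receive a JSON-serializable dict.

    fetch_markdown is a hypothetical callable that downloads the page at `url`
    and returns its main content as Markdown.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Assumption: failures are reported in an "error" field rather than raised.
        return {"error": f"Unsupported or malformed URL: {url!r}"}
    try:
        return {"content": fetch_markdown(url)}
    except Exception as exc:  # deliberately broad: report the failure, do not crash
        return {"error": f"Failed to extract content: {exc}"}
```

Returning a dict in both cases lets a client branch on which key is present without having to handle exceptions from the tool call.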

How do I configure this server with MCP?

Add the provided JSON configuration to your MCP settings file, specifying the command and arguments to run the server.
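The configuration itself is not reproduced in this listing. As a rough illustration only, the sketch below generates an entry in the `mcpServers` layout used by common MCP clients, reusing the `fastmcp run server.py` command from the usage steps. The server name `readability-parser`, the placeholder path, and the helper `build_mcp_config` are assumptions, and the project's own configuration may differ.

```python
import json


def build_mcp_config(server_path: str, name: str = "readability-parser") -> dict:
    """Return an MCP settings entry that launches the server with fastmcp.

    Assumption: the client reads servers from a top-level "mcpServers" object.
    """
    return {
        "mcpServers": {
            name: {
                "command": "fastmcp",
                "args": ["run", server_path],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path: replace with the location of server.py in your clone.
    print(json.dumps(build_mcp_config("/path/to/server.py"), indent=2))
```

The printed JSON can be merged into the settings file of an MCP client, after replacing the placeholder path with the real location of `server.py`.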