MCP Web Extractor

by iemong

MCP Web Extractor is a Model Context Protocol (MCP) server that extracts web content using Readability.js. It fetches web pages and extracts the main content, making it ideal for saving clean, readable versions of articles.

What is MCP Web Extractor?

MCP Web Extractor is a Model Context Protocol (MCP) server that takes a URL, fetches the page, and uses Readability.js to return a clean, readable version of the article.

How to use MCP Web Extractor?

1. Clone the repository.
2. Install dependencies using npm install.
3. Build the project using npm run build.
4. Start the server using npm start.

You can then use the client example or integrate it with Obsidian.

Key features of MCP Web Extractor

Extracts readable content from any URL
Removes ads, sidebars, and other distractions
Returns clean text along with metadata (title, excerpt, etc.)
Easy integration with Obsidian via MCP

Use cases of MCP Web Extractor

Saving clean, readable versions of articles
Creating Obsidian notes from web content
Building an Obsidian plugin for web content extraction
Extracting content for research and analysis

FAQ about MCP Web Extractor

What is Readability.js?

Readability.js is a library that extracts the main content from a web page, removing clutter like ads and navigation.

What is MCP?

MCP stands for Model Context Protocol, an open protocol for connecting applications to external tools and data sources. Here it carries requests and responses between the server and client applications such as Obsidian.

How do I integrate this with Obsidian?

The obsidian-integration.ts file provides an example of how to integrate this MCP server with Obsidian. You can use it as a starting point for creating an Obsidian plugin.
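
As a rough illustration of what such an integration might do with an extracted page, the sketch below turns a result into a Markdown note with YAML frontmatter. It is not the contents of obsidian-integration.ts; the `NoteSource` type, the `toObsidianNote` function, and the frontmatter keys are all hypothetical names chosen for this example.

```typescript
// HYPOTHETICAL shape: a subset of the fields the server is described as returning.
interface NoteSource {
  title: string;
  textContent: string;
  excerpt: string;
  siteName: string;
}

// Build a file name and Markdown body for a note (hypothetical helper).
function toObsidianNote(
  page: NoteSource,
  sourceUrl: string
): { fileName: string; body: string } {
  // Replace characters that are not allowed in file names on common platforms.
  const safeTitle = page.title.replace(/[\\/:*?"<>|]/g, "-").trim() || "Untitled";

  // JSON.stringify yields double-quoted strings, which YAML also accepts.
  const frontmatter = [
    "---",
    `title: ${JSON.stringify(page.title)}`,
    `source: ${JSON.stringify(sourceUrl)}`,
    `site: ${JSON.stringify(page.siteName)}`,
    "---",
  ].join("\n");

  const body = `${frontmatter}\n\n> ${page.excerpt}\n\n${page.textContent.trim()}\n`;
  return { fileName: `${safeTitle}.md`, body };
}
```

An actual plugin would then write `body` to `fileName` inside the vault; that step depends on the Obsidian plugin API and is left out here.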

What data does the server return?

The server returns the title, content, textContent, excerpt, and siteName of the extracted web page.
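
A client may want to describe and validate that result before using it. The sketch below is one way to do so. The field names come from the answer above, but the types are assumptions (every field is treated as a string), and `ExtractedPage` and `isExtractedPage` are names invented for this example.

```typescript
// Field names are taken from the FAQ answer; the string types are assumptions.
interface ExtractedPage {
  title: string;
  content: string; // assumed to hold the cleaned article markup
  textContent: string; // assumed to hold the plain-text version
  excerpt: string;
  siteName: string;
}

const EXPECTED_FIELDS = [
  "title",
  "content",
  "textContent",
  "excerpt",
  "siteName",
] as const;

// Runtime check that an unknown value has all five fields as strings.
function isExtractedPage(value: unknown): value is ExtractedPage {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return EXPECTED_FIELDS.every((key) => typeof record[key] === "string");
}
```

If the server can return null for a field such as siteName, the guard would need to be loosened accordingly.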

Where does the server run by default?

The server starts on http://localhost:3000 with the MCP endpoint at http://localhost:3000/mcp.
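
To make the endpoint concrete, here is a minimal sketch of how a client might prepare a tool call for it. MCP messages are JSON-RPC 2.0 and tools are invoked with the `tools/call` method; everything else is an assumption: the tool name `extract_content`, its `url` argument, and the HTTP headers (which presume an HTTP transport that accepts plain POST requests) are hypothetical and should be checked against the repository.

```typescript
const MCP_ENDPOINT = "http://localhost:3000/mcp"; // default endpoint stated above

// HYPOTHETICAL: the real tool name and argument name are not documented here.
const TOOL_NAME = "extract_content";

interface PreparedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Build (but do not send) a JSON-RPC 2.0 "tools/call" request.
function buildExtractRequest(id: number, pageUrl: string): PreparedRequest {
  const message = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: TOOL_NAME, arguments: { url: pageUrl } },
  };
  return {
    url: MCP_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        // Assumed: the server may answer with JSON or an event stream.
        Accept: "application/json, text/event-stream",
      },
      body: JSON.stringify(message),
    },
  };
}
```

Passing the result to `fetch(request.url, request.init)` would send it. MCP servers generally expect an `initialize` handshake before tool calls, so an MCP client SDK is likely the more practical route for real use.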