MCP Web Extractor

by iemong

MCP Web Extractor is a Model Context Protocol (MCP) server that fetches web pages and extracts their main content using Readability.js, which makes it well suited to saving clean, readable versions of articles.

What is MCP Web Extractor?

A Model Context Protocol (MCP) server that extracts web content from URLs using Readability.js, providing clean, readable versions of articles.

How to use MCP Web Extractor?

  1. Clone the repository.

  2. Install dependencies using npm install.

  3. Build the project using npm run build.

  4. Start the server using npm start.

You can then use the client example or integrate it with Obsidian.

Key features of MCP Web Extractor

  • Extracts readable content from a given URL

  • Removes ads, sidebars, and other distractions

  • Returns clean text along with metadata (title, excerpt, etc.)

  • Easy integration with Obsidian via MCP

Use cases of MCP Web Extractor

  • Saving clean, readable versions of articles

  • Creating Obsidian notes from web content

  • Building an Obsidian plugin for web content extraction

  • Extracting content for research and analysis

FAQ from MCP Web Extractor

What is Readability.js?

Readability.js is a library that extracts the main content from a web page, removing clutter like ads and navigation.

What is MCP?

MCP stands for Model Context Protocol, an open protocol for connecting client applications to external tools and data sources. Here it carries requests and responses between the server and client applications such as Obsidian.

How do I integrate this with Obsidian?

The obsidian-integration.ts file provides an example of how to integrate this MCP server with Obsidian. You can use it as a starting point for creating an Obsidian plugin.

What data does the server return?

The server returns the title, content, textContent, excerpt, and siteName of the extracted web page.
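As a rough illustration of how a client might consume these fields, the sketch below types the result and renders it as a Markdown note. The field names come from the answer above; the field types, the ExtractedPage name, and the toMarkdownNote helper are assumptions made for this example, not part of the project.

```typescript
// Field names follow the FAQ answer above; the types are assumptions.
interface ExtractedPage {
  title: string;
  content: string; // assumed to be the extracted article markup
  textContent: string; // assumed to be the plain-text version
  excerpt: string;
  siteName: string;
}

// Hypothetical helper: render a result as a Markdown note with YAML front matter.
// JSON.stringify is used to quote values so that colons or quotes in a title
// do not break the front matter.
function toMarkdownNote(page: ExtractedPage, sourceUrl: string): string {
  const frontMatter = [
    "---",
    `title: ${JSON.stringify(page.title)}`,
    `site: ${JSON.stringify(page.siteName)}`,
    `source: ${JSON.stringify(sourceUrl)}`,
    "---",
  ].join("\n");
  return `${frontMatter}\n\n> ${page.excerpt}\n\n${page.textContent.trim()}\n`;
}

// Example with made-up data.
const sampleNote = toMarkdownNote(
  {
    title: "Hello",
    content: "<p>Body</p>",
    textContent: " Body ",
    excerpt: "Short",
    siteName: "Example",
  },
  "https://example.com/a",
);
console.log(sampleNote);
```

Using textContent rather than content keeps the note free of HTML; a client that wants to preserve links or headings would need to convert content to Markdown instead.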

Where does the server run by default?

The server starts on http://localhost:3000 with the MCP endpoint at http://localhost:3000/mcp.
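MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method. The sketch below only builds the message a client might send to the endpoint above. The tool name "extract" and the url argument are hypothetical placeholders; the real names should be taken from the server's tool listing. A real client would normally go through an MCP SDK, which also handles the initialization handshake that this sketch leaves out.

```typescript
// Default endpoint as stated in the answer above.
const MCP_ENDPOINT = "http://localhost:3000/mcp";

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Build a tools/call request. The default tool name and the `url` argument
// are hypothetical; check the server's tool listing for the actual names.
function buildExtractRequest(
  id: number,
  url: string,
  toolName: string = "extract",
): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: { url } },
  };
}

// Print the message that would be POSTed to the endpoint.
const request = buildExtractRequest(1, "https://example.com/article");
console.log(`POST ${MCP_ENDPOINT}`);
console.log(JSON.stringify(request, null, 2));
```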