mcp-server-webcrawl

by pragmar

mcp-server-webcrawl bridges the gap between your web crawl and AI language models using the Model Context Protocol (MCP). It allows your AI client to filter and analyze web content under your direction or autonomously.

What is mcp-server-webcrawl?

mcp-server-webcrawl is a server that enables AI language models to access and analyze web content obtained from various web crawlers. It uses the Model Context Protocol (MCP) to provide a full-text search interface with filtering capabilities, allowing AI clients to search and analyze web data effectively.

How to use mcp-server-webcrawl?

  1. Install mcp-server-webcrawl using pip install mcp-server-webcrawl.

  2. Configure the MCP settings in Claude Desktop (or your preferred MCP-compatible client) by specifying the command and arguments for mcp-server-webcrawl, including the crawler type and the data source path.

  3. Adjust the configuration based on your operating system (macOS requires the absolute path to the executable).

  4. Start the server and use your AI client to query and analyze the web content.
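The configuration step above can be sketched as a short Python script that prints an `mcpServers` entry in the JSON layout Claude Desktop reads. The flag spellings `--crawler` and `--datasrc`, the server label `webcrawl`, and the example path are assumptions inferred from the argument names mentioned on this page; check the project README for the exact form before relying on them.

```python
import json


def build_config(crawler, datasrc, command="mcp-server-webcrawl"):
    """Return a Claude Desktop style MCP configuration as a dict.

    crawler: crawler type, e.g. "wget" (one of the supported crawlers)
    datasrc: crawl data directory or database file, depending on the crawler
    command: executable name, or its absolute path where the client needs one
    """
    return {
        "mcpServers": {
            # "webcrawl" is an arbitrary, hypothetical label for this server
            "webcrawl": {
                "command": command,
                # Flag spellings are assumed, not confirmed by this page
                "args": ["--crawler", crawler, "--datasrc", datasrc],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder path; substitute the location of your own crawl data
    print(json.dumps(build_config("wget", "/path/to/wget/archives/"), indent=2))
```

The printed JSON is meant to be merged into the client's MCP configuration file rather than used verbatim, since an existing file may already list other servers.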

Key features of mcp-server-webcrawl

  • Claude Desktop ready

  • Full-text search support

  • Filter by type, status, and more

  • Multi-crawler compatible

Use cases of mcp-server-webcrawl

  • Analyzing web content for research

  • Building AI-powered knowledge bases

  • Automated content summarization

  • Content filtering and classification

FAQ from mcp-server-webcrawl

What crawlers are supported?

mcp-server-webcrawl supports WARC, wget, InterroBot, Katana, and SiteOne.

How do I configure the data source?

The datasrc argument in the MCP configuration should point to the parent directory of the crawl data, or the database file depending on the crawler.

Is macOS configuration different?

Yes, macOS requires the absolute path to the mcp-server-webcrawl executable in the MCP configuration.
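One way to find that absolute path is to look the executable up on your PATH. The sketch below uses Python's standard `shutil.which`; the helper name `resolve_command` is hypothetical, and it assumes the package was installed into an environment whose scripts directory is on PATH.

```python
import os
import shutil


def resolve_command(name="mcp-server-webcrawl"):
    """Return the absolute path of `name` if found on PATH, else the bare name."""
    found = shutil.which(name)
    if found is None:
        # Not on PATH: return the name unchanged so the caller can decide what to do
        return name
    return os.path.abspath(found)


if __name__ == "__main__":
    # Paste the printed value into the "command" field of the MCP configuration
    print(resolve_command())
```

If the script prints only the bare name, the executable is not on the PATH of the shell you ran it from, which usually means a different Python environment is active.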

Does it support ChatGPT?

ChatGPT support is coming soon.

What is Model Context Protocol (MCP)?

Model Context Protocol (MCP) is an open protocol that standardizes how AI applications connect to external data sources and tools. Here, it gives the AI client searchable access to your web crawl data.