mcp-server-webcrawl

by pragmar

mcp-server-webcrawl bridges the gap between your web crawls and AI language models using the Model Context Protocol (MCP). It lets your AI client filter and analyze crawled web content, either under your direction or autonomously.

What is mcp-server-webcrawl?

mcp-server-webcrawl is a server that enables AI language models to access and analyze web content obtained from various web crawlers using the Model Context Protocol (MCP). It provides a full-text search interface and filtering capabilities for efficient content retrieval.

How to use mcp-server-webcrawl?

Install mcp-server-webcrawl with pip install mcp_server_webcrawl. Then configure MCP in Claude Desktop: edit the configuration file and specify the command and arguments for your chosen web crawler (wget, WARC, InterroBot, Katana, or SiteOne). The --datasrc argument should point to the directory containing the crawled data.
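As a rough illustration of the configuration step, the sketch below builds one server entry and prints it as JSON. It assumes the usual Claude Desktop layout with a top-level "mcpServers" key. The lowercase crawler identifiers, the "webcrawl" label, and the data path are assumptions made for illustration, so check the project documentation for the exact values.

```python
import json

# Assumed identifiers: the crawler names listed above, lowercased.
SUPPORTED_CRAWLERS = {"wget", "warc", "interrobot", "katana", "siteone"}


def build_server_entry(crawler: str, datasrc: str,
                       command: str = "mcp-server-webcrawl") -> dict:
    """Return one MCP server entry using the --crawler and --datasrc options."""
    if crawler not in SUPPORTED_CRAWLERS:
        raise ValueError(f"unsupported crawler: {crawler!r}")
    return {
        "command": command,
        "args": ["--crawler", crawler, "--datasrc", datasrc],
    }


# "webcrawl" is an arbitrary label; the path is a placeholder, not a real location.
config = {
    "mcpServers": {
        "webcrawl": build_server_entry("wget", "/path/to/wget/archives/"),
    }
}

print(json.dumps(config, indent=2))
```

The printed JSON is what would go into the configuration file, with the placeholder path replaced by your own crawl directory.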

Key features of mcp-server-webcrawl

  • Claude Desktop ready

  • Full-text search support

  • Filter by type, status, and more

  • Multi-crawler compatible

Use cases of mcp-server-webcrawl

  • AI-powered content analysis

  • Intelligent web search

  • Automated information extraction

  • Contextualized responses from LLMs

FAQ from mcp-server-webcrawl

What crawlers are supported?

WARC, wget, InterroBot, Katana, and SiteOne are currently supported.

What is the Model Context Protocol (MCP)?

MCP is a protocol that enables communication between AI language models and external data sources, such as web crawls.

How do I configure the server for a specific crawler?

You need to edit the MCP configuration file in Claude Desktop and specify the correct command-line arguments, including the --crawler and --datasrc options.
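If the configuration file already lists other MCP servers, the new entry should be added alongside them rather than replacing them. The following is a minimal sketch of that merge, assuming the common "mcpServers" layout. The helper name add_server, the "webcrawl" and "other-server" labels, and the paths are all hypothetical.

```python
import copy


def add_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with entry registered under mcpServers[name]."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


# A pre-existing config with an unrelated (made-up) server.
existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}

# Entry for a WARC crawl; the crawler value and path are placeholders.
entry = {
    "command": "mcp-server-webcrawl",
    "args": ["--crawler", "warc", "--datasrc", "/path/to/warc/archives/"],
}

updated = add_server(existing, "webcrawl", entry)
print(sorted(updated["mcpServers"]))
```

Working on a copy keeps the original configuration intact until the merged result has been checked and written back.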

Where do I find the MCP configuration file?

From the Claude Desktop menu, navigate to File > Settings > Developer. Click Edit Config to locate the configuration file.
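For scripted setups, the file can also be located without the menu. The sketch below uses the locations commonly documented for Claude Desktop on macOS and Windows. Treat these paths as assumptions and prefer the Edit Config button if they differ on your machine. The function name is hypothetical.

```python
import os
import sys
from pathlib import Path

CONFIG_NAME = "claude_desktop_config.json"


def claude_desktop_config_path(platform: str = sys.platform) -> Path:
    """Return the assumed location of the Claude Desktop config file."""
    if platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / CONFIG_NAME
    if platform.startswith("win"):
        # Fall back to the conventional roaming profile if APPDATA is unset.
        appdata = os.environ.get("APPDATA", str(Path.home() / "AppData" / "Roaming"))
        return Path(appdata) / "Claude" / CONFIG_NAME
    raise NotImplementedError(f"no known config location for platform {platform!r}")


# Explicit platform so the example behaves the same wherever it runs.
print(claude_desktop_config_path("darwin"))
```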

Is ChatGPT supported?

ChatGPT support is coming soon.