MCP Website Downloader
by angrysky56
MCP Website Downloader is a simple MCP server designed for downloading documentation websites and preparing them for Retrieval-Augmented Generation (RAG) indexing. It aims to download complete documentation sites and organize assets for use with RAG systems.
What is MCP Website Downloader?
MCP Website Downloader is an MCP server that downloads documentation websites and prepares them for RAG indexing. It downloads website content, organizes assets, and creates an index for RAG systems.
How to use MCP Website Downloader?
- Fork and download the repository.
- Install dependencies using `uv venv`, `./venv/Scripts/activate`, and `pip install -e .`.
- Configure the server in your `claude_desktop_config.json` file with the appropriate paths (see the example config below).
- (Optional) Start the server using `python -m mcp_windows_website_downloader.server --library docs_library`.
- Use through Claude Desktop or other MCP clients by calling the `download` tool with a URL.
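The repository's README has the authoritative configuration; as a rough sketch (the server name, command, and library path below are placeholders to adapt to your setup), a `claude_desktop_config.json` entry could look like:

```json
{
  "mcpServers": {
    "website-downloader": {
      "command": "python",
      "args": [
        "-m",
        "mcp_windows_website_downloader.server",
        "--library",
        "docs_library"
      ]
    }
  }
}
```

If you installed the package into a virtual environment, point `command` at that environment's Python interpreter so the module can be found.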
Key features of MCP Website Downloader
Downloads complete documentation sites
Maintains link structure and navigation (partially)
Downloads and organizes assets (CSS, JS, images)
Creates an index for RAG systems
Simple single-purpose MCP interface
Use cases of MCP Website Downloader
Preparing documentation websites for RAG-based question answering systems
Creating local copies of documentation for offline access
Building custom knowledge bases from online documentation
Automating the process of extracting information from websites for AI applications
FAQ about MCP Website Downloader
What is the purpose of the rag_index.json file?
The `rag_index.json` file contains metadata about the downloaded website, including the URL, domain, number of pages, and path to the downloaded site. This information can be used by RAG systems to index and retrieve relevant content.
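As a rough illustration of the fields listed above (the actual key names and layout may differ from the server's real output), an index entry might look like:

```json
{
  "url": "https://docs.example.com/",
  "domain": "docs.example.com",
  "pages": 42,
  "path": "docs_library/docs.example.com"
}
```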
What kind of error handling does the server have?
The server handles common issues such as invalid URLs, network errors, asset download failures, malformed HTML, deep recursion, and file system errors. It returns error responses in JSON format with a detailed error message.
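For example, a failed download might come back as a JSON error object along these lines (the field names are illustrative, not the server's exact schema):

```json
{
  "status": "error",
  "error": "Failed to download https://docs.example.com/: connection timed out"
}
```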
How does the server handle asset downloads?
The server downloads and organizes assets such as CSS, JS, and images. It attempts to maintain the original site structure and organizes assets by type.
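As a hypothetical example of what that organization could look like on disk (the folder names are an assumption, not taken from the project):

```
docs_library/
└── docs.example.com/
    ├── index.html
    ├── css/
    ├── js/
    └── images/
```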
What is the MCP architecture of the server?
The server follows a standard MCP architecture with separate modules for the server implementation (`server.py`), core downloader functionality (`core.py`), and helper utilities (`utils.py`).
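In other words, the package is organized roughly like this (exact file layout may differ):

```
mcp_windows_website_downloader/
├── server.py   # MCP server implementation (exposes the 'download' tool)
├── core.py     # core downloader functionality
└── utils.py    # helper utilities
```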
How can I contribute to the project?
You can contribute by forking the repository, creating a feature branch, making your changes, and submitting a pull request.