docs-mcp-server

by arabold

The docs-mcp-server is a Model Context Protocol (MCP) server designed to scrape, process, index, and search documentation for various software libraries and packages. It fetches content from specified URLs, splits it into meaningful chunks, generates vector embeddings, and stores the data in an SQLite database.

What is docs-mcp-server?

This project provides a Model Context Protocol (MCP) server that scrapes, processes, indexes, and searches documentation for software libraries and packages. It fetches content, splits it semantically, generates vector embeddings, and stores the data in an SQLite database for efficient hybrid search.

How to use docs-mcp-server?

Run the server with Docker or npx. In your MCP client settings, configure the appropriate command and arguments, and supply the required environment variables, such as an OpenAI API key (see the example configuration below). The server's MCP tools then let you start scraping jobs, check job status, list and cancel jobs, search indexed documentation, list indexed libraries, find the appropriate version of a library, remove indexed documents, and fetch single URLs.
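
As a sketch, an MCP client configuration for the npx route might look like the following. The package name @arabold/docs-mcp-server and the mcpServers settings layout are assumptions based on common MCP client conventions; check the project's README for the exact values.

    {
      "mcpServers": {
        "docs-mcp-server": {
          "command": "npx",
          "args": ["-y", "@arabold/docs-mcp-server@latest"],
          "env": {
            "OPENAI_API_KEY": "your-api-key"
          }
        }
      }
    }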

Key features of docs-mcp-server

  • Versatile Scraping: Fetch documentation from diverse sources.

  • Intelligent Processing: Automatically split content and generate embeddings.

  • Optimized Storage: Leverage SQLite with sqlite-vec and FTS5.

  • Powerful Hybrid Search: Combine vector similarity and full-text search.

  • Asynchronous Job Handling: Manage scraping and indexing tasks efficiently.

  • Simple Deployment: Get up and running quickly using Docker or npx.

Use cases of docs-mcp-server

  • Providing documentation search for AI assistants.

  • Indexing and searching documentation for internal software libraries.

  • Creating a searchable archive of documentation for different versions of a library.

  • Fetching and converting single URLs to Markdown for use in other applications.

FAQ from docs-mcp-server

What embedding models are supported?

The server supports OpenAI, Google Gemini, Azure OpenAI, AWS Bedrock, and Ollama embedding models. You need to configure the appropriate environment variables for each provider.

How do I configure the embedding model?

Use the DOCS_MCP_EMBEDDING_MODEL environment variable to specify the provider and model name. You also need to set the required API keys or credentials for the chosen provider.
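For illustration, configuration for two providers might look like this. Aside from DOCS_MCP_EMBEDDING_MODEL, the variable names and the provider:model naming syntax are assumptions; consult the project's README for the exact values.

    # OpenAI (the model name shown is illustrative)
    export OPENAI_API_KEY="your-api-key"
    export DOCS_MCP_EMBEDDING_MODEL="text-embedding-3-small"

    # Google Gemini (variable name and "gemini:" prefix are assumptions)
    export GOOGLE_API_KEY="your-api-key"
    export DOCS_MCP_EMBEDDING_MODEL="gemini:embedding-001"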

How do I run the server?

You can run the server using Docker (recommended) or npx. Docker offers the most straightforward deployment; npx runs directly on the host and is suitable when the server needs access to local files.
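
Either route might look like the following sketch; the Docker image name ghcr.io/arabold/docs-mcp-server and the npm package name are assumptions inferred from the GitHub repository.

    # Docker (recommended): -i keeps stdin open for the MCP stdio transport
    docker run -i --rm -e OPENAI_API_KEY="your-api-key" ghcr.io/arabold/docs-mcp-server:latest

    # npx: runs directly on the host, so it can reach local files
    OPENAI_API_KEY="your-api-key" npx -y @arabold/docs-mcp-server@latest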

How do I persist the indexed documentation?

When using Docker, mount a Docker named volume or a host directory to the /data directory inside the container. This ensures that the database is persisted even if the container is stopped or removed.
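
For example (with the image name assumed as above), either of these keeps the SQLite database outside the container:

    # Docker named volume
    docker run -i --rm -v docs-mcp-data:/data -e OPENAI_API_KEY="your-api-key" ghcr.io/arabold/docs-mcp-server:latest

    # Host directory bind mount
    docker run -i --rm -v /path/on/host:/data -e OPENAI_API_KEY="your-api-key" ghcr.io/arabold/docs-mcp-server:latest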

How do I use the CLI?

Use the docs-cli command via Docker or npx, depending on how you are running the server. The CLI provides commands for scraping, searching, and managing the documentation index.
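
A rough sketch of a CLI session follows; the subcommand names and arguments are assumptions rather than the confirmed interface, and the library name and URL are placeholders.

    # Index a library's documentation, then search it
    docs-cli scrape react https://react.dev/reference/react
    docs-cli search react "how to clean up an effect"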