PDF Reader MCP Server

by sylphlab

Tags: AI/Development Tools, PDF, AI Agents, Text Extraction, Metadata, Docker

Empower your AI agents with the ability to securely read and extract information from PDF files. This tool provides a single, flexible solution for extracting text, metadata, and page counts within your project context.

What is PDF Reader MCP Server?

A server that securely reads and extracts information (text, metadata, page count) from PDF files for use by AI agents within a defined project context. It is built for integration with MCP (Model Context Protocol) hosts.

How to use PDF Reader MCP Server?

The server can be installed via npm or Docker. Configure your MCP host to launch it by providing the command and arguments (for example, npx with the package name, or a docker run invocation). Then send MCP requests with tool_name set to read_pdf and with arguments that specify the PDF source(s) and the desired extraction options (pages, metadata, and so on).
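The host configuration step can be sketched in TypeScript as below. This is a minimal illustration, not the project's documented setup: the McpServerEntry shape, the mcpServers key, the use of cwd as the project root, the /app mount target, and the package name passed in the example are all assumptions to check against your MCP host's documentation and the project's README.

```typescript
// Hypothetical shape of one server entry in an MCP host configuration.
// Real hosts differ; check your host's documentation for the exact schema.
interface McpServerEntry {
  command: string;
  args: string[];
  cwd?: string; // assumed to act as the project root for relative PDF paths
}

// Launch via npx. The package name is supplied by the caller.
function npxEntry(packageName: string, projectRoot: string): McpServerEntry {
  return { command: "npx", args: [packageName], cwd: projectRoot };
}

// Launch via Docker, mounting the project root into the container.
// The mount target "/app" is an assumption about the image's working directory.
function dockerEntry(image: string, projectRoot: string): McpServerEntry {
  return {
    command: "docker",
    args: ["run", "-i", "--rm", "-v", `${projectRoot}:/app`, image],
  };
}

const config = {
  mcpServers: {
    // Assumed package name; confirm it in the project's README.
    "pdf-reader": npxEntry("@sylphlab/pdf-reader-mcp", "/path/to/project"),
  },
};

console.log(JSON.stringify(config, null, 2));
```

Keeping the launch command in a small helper makes it easy to switch between the npx and Docker variants without touching the rest of the host configuration.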

Key features of PDF Reader MCP Server

Secure file access confined to the project root directory.
Handles both local relative paths and public URLs.
Consolidated read_pdf tool for multiple extraction needs.
Structured JSON output for easy parsing by AI agents.
Easy integration within MCP environments via npx or Docker.
Uses pdfjs-dist for reliable parsing and Zod for input validation.

Use cases of PDF Reader MCP Server

Enabling AI agents to automatically extract key information from reports.
Building knowledge bases from PDF documents.
Automating data entry from PDF forms.
Analyzing PDF content for sentiment or topic modeling.

FAQ from PDF Reader MCP Server

How does the server ensure security?

File access is strictly confined to the project root directory, preventing unauthorized access to other files.
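One common way to confine file access to a root directory is to resolve each requested path against the root and reject anything that ends up outside it. The sketch below illustrates that general technique with Node's path module; it is not taken from the server's source, and the function name resolveInsideRoot is hypothetical.

```typescript
import * as path from "node:path";

// Resolve a user-supplied relative path against the project root and
// reject absolute paths and "../" escapes. Hypothetical helper for illustration.
function resolveInsideRoot(projectRoot: string, userPath: string): string {
  if (path.isAbsolute(userPath)) {
    throw new Error(`Absolute paths are not allowed: ${userPath}`);
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, userPath);
  const relative = path.relative(root, resolved);
  // A relative result of ".." or one starting with "../" points outside the root;
  // an absolute result can occur on Windows when the drive differs.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes the project root: ${userPath}`);
  }
  return resolved;
}

console.log(resolveInsideRoot("/srv/project", "docs/report.pdf"));
```

This check is purely lexical. It does not follow symbolic links, so a stricter implementation would also compare real paths (for example via fs.realpath) before opening the file.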

Can I process PDFs from URLs?

Yes, the server supports processing PDFs from both local paths and public URLs.
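A client that accepts either kind of input might normalize it before building a request, as in the sketch below. The field names path and url, the PdfSource type, and the toSource helper are assumptions made for illustration; the listing above does not specify how a source is encoded in the request.

```typescript
// Assumed encoding of a PDF source: either a public URL or a project-relative path.
type PdfSource = { url: string } | { path: string };

// Hypothetical helper: treat only http(s) inputs as URLs,
// and everything else as a path relative to the project root.
function toSource(input: string): PdfSource {
  if (/^https?:\/\//i.test(input)) {
    new URL(input); // throws a TypeError if the URL is malformed
    return { url: input };
  }
  return { path: input };
}

console.log(toSource("https://example.com/files/report.pdf"));
console.log(toSource("docs/report.pdf"));
```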

What kind of metadata can I extract?

You can extract metadata such as author, title, creation date, and other document properties.

How do I specify which pages to extract text from?

Use the pages array in the MCP request arguments to specify the desired page numbers.

What if I want to extract the entire text from the PDF?

Omit the pages argument from the MCP request and set include_full_text to true.
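The two request styles from the last two answers can be sketched as small builder functions. The names tool_name, read_pdf, pages, and include_full_text come from the text above; the sources array, the path field, the placement of pages on each source entry, and the 1-based page numbering are assumptions that should be checked against the project's README.

```typescript
// Assumed request shape; only tool_name, pages, and include_full_text
// are named in the listing, the rest is illustrative.
interface ReadPdfSource {
  path?: string;
  url?: string;
  pages?: number[]; // assumed to be 1-based page numbers
}

interface ReadPdfRequest {
  tool_name: "read_pdf";
  arguments: {
    sources: ReadPdfSource[];
    include_full_text?: boolean;
  };
}

// Request text from specific pages of one local PDF.
function pagesRequest(path: string, pages: number[]): ReadPdfRequest {
  if (pages.length === 0 || pages.some((p) => !Number.isInteger(p) || p < 1)) {
    throw new Error("pages must be a non-empty list of positive integers");
  }
  return { tool_name: "read_pdf", arguments: { sources: [{ path, pages }] } };
}

// Request the full text: no pages argument, include_full_text set to true.
function fullTextRequest(path: string): ReadPdfRequest {
  return {
    tool_name: "read_pdf",
    arguments: { sources: [{ path }], include_full_text: true },
  };
}

console.log(JSON.stringify(pagesRequest("docs/report.pdf", [1, 3]), null, 2));
console.log(JSON.stringify(fullTextRequest("docs/report.pdf"), null, 2));
```

Validating the page list on the client side is optional, since the server validates its input, but it surfaces mistakes before a request is sent.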