PDF Reader MCP Server
by sylphlab
Empower your AI agents with the ability to securely read and extract information from PDF files. This tool provides a single, flexible solution for extracting text, metadata, and page counts within your project context.
What is PDF Reader MCP Server?
A server that lets AI agents securely read and extract information (text, metadata, page count) from PDF files within a defined project context. It is designed for integration with MCP (Model Context Protocol) environments.
How to use PDF Reader MCP Server?
The server can be installed via npm or Docker. Configure your MCP host to launch the server with the appropriate command and arguments (e.g., using npx or running a Docker container). Then send MCP requests with tool_name set to read_pdf and arguments specifying the PDF source(s) and the desired extraction options (pages, metadata, etc.).
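As a rough illustration of the request described above, the following TypeScript sketch assembles a read_pdf payload. Only tool_name, read_pdf, pages, and include_full_text are named on this page; the other field names (sources, path, url, include_metadata, include_page_count) and the helper buildReadPdfRequest are assumptions made for the example and should be checked against the server's own documentation.

```typescript
// Hypothetical request shape. Field names other than tool_name, pages and
// include_full_text are assumptions, not confirmed by this page.
interface PdfSource {
  path?: string; // local path, relative to the project root
  url?: string; // public URL
  pages?: number[]; // optional page numbers to extract
}

interface ReadPdfRequest {
  tool_name: "read_pdf";
  arguments: {
    sources: PdfSource[];
    include_full_text?: boolean;
    include_metadata?: boolean;
    include_page_count?: boolean;
  };
}

type ReadPdfOptions = Omit<ReadPdfRequest["arguments"], "sources">;

// Illustrative helper: builds the payload and rejects sources that do not
// name exactly one of `path` or `url`.
function buildReadPdfRequest(
  sources: PdfSource[],
  options: ReadPdfOptions = {}
): ReadPdfRequest {
  for (const source of sources) {
    const hasPath = source.path !== undefined;
    const hasUrl = source.url !== undefined;
    if (hasPath === hasUrl) {
      throw new Error("Each source needs exactly one of 'path' or 'url'.");
    }
  }
  return { tool_name: "read_pdf", arguments: { sources, ...options } };
}

// Example: metadata and page count for one local file.
const exampleRequest = buildReadPdfRequest(
  [{ path: "docs/report.pdf" }],
  { include_metadata: true, include_page_count: true }
);
console.log(JSON.stringify(exampleRequest, null, 2));
```

How the payload is actually delivered depends on the MCP host, so the sketch stops at building the object.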
Key features of PDF Reader MCP Server
Secure file access confined to the project root directory.
Handles both local relative paths and public URLs.
Consolidated read_pdf tool for multiple extraction needs.
Structured JSON output for easy parsing by AI agents.
Easy integration within MCP environments via npx or Docker.
Uses pdfjs-dist for reliable parsing and Zod for input validation.
Use cases of PDF Reader MCP Server
Enabling AI agents to automatically extract key information from reports.
Building knowledge bases from PDF documents.
Automating data entry from PDF forms.
Analyzing PDF content for sentiment or topic modeling.
FAQ from PDF Reader MCP Server
How does the server ensure security?
File access is strictly confined to the project root directory, preventing unauthorized access to other files.
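To make the idea of confinement concrete, here is a minimal TypeScript sketch of one way a relative path could be checked before it is opened. It is not the server's actual implementation: the function name staysInsideRoot is hypothetical, the check handles POSIX-style separators only, and it does not account for symbolic links.

```typescript
// Illustrative only: returns true when a relative, POSIX-style path cannot
// climb above the directory it is resolved against.
function staysInsideRoot(relativePath: string): boolean {
  // Absolute paths are rejected outright.
  if (relativePath.startsWith("/")) return false;

  let depth = 0; // how many directories below the root we currently are
  for (const segment of relativePath.split("/")) {
    if (segment === "" || segment === ".") continue; // no movement
    if (segment === "..") {
      depth -= 1;
      if (depth < 0) return false; // climbed above the root
    } else {
      depth += 1;
    }
  }
  return true;
}

console.log(staysInsideRoot("docs/report.pdf"));
console.log(staysInsideRoot("../secrets/report.pdf"));
```

In a Node.js program the same decision is usually made with the built-in path module (path.resolve followed by path.relative); the hand-rolled loop is used here only to keep the example dependency-free.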
Can I process PDFs from URLs?
Yes, the server supports processing PDFs from both local paths and public URLs.
What kind of metadata can I extract?
You can extract metadata such as author, title, creation date, and other document properties.
How do I specify which pages to extract text from?
Use the 'pages' array in the MCP request arguments to specify the desired page numbers.
What if I want to extract the entire text from the PDF?
Omit the 'pages' argument from the MCP request and set 'include_full_text' to true.
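The two cases above (selected pages versus the whole document) can be sketched as one small TypeScript helper. The names 'pages' and 'include_full_text' come from this FAQ; the 'sources' and 'path' fields, the placement of 'pages' inside a source entry, and the helper argumentsFor are assumptions made for illustration.

```typescript
// Hypothetical argument shape; only `pages` and `include_full_text` are
// named in the FAQ above.
type ReadPdfArguments = {
  sources: { path: string; pages?: number[] }[];
  include_full_text?: boolean;
};

// Illustrative helper: request specific pages when any are given,
// otherwise omit `pages` and ask for the full text.
function argumentsFor(path: string, pages?: number[]): ReadPdfArguments {
  if (pages !== undefined && pages.length > 0) {
    return { sources: [{ path, pages }] };
  }
  return { sources: [{ path }], include_full_text: true };
}

// Selected pages only.
console.log(JSON.stringify(argumentsFor("docs/report.pdf", [1, 3])));
// Entire document.
console.log(JSON.stringify(argumentsFor("docs/report.pdf")));
```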