PDF Reader MCP Server

by sylphlab

Lets AI agents securely read and extract information from PDF files within your project context through a single, flexible tool. It can extract text, metadata, and page counts.

What is PDF Reader MCP Server?

The PDF Reader MCP Server is a Model Context Protocol (MCP) server that lets AI agents securely read and extract information from PDF files. It exposes a single, flexible tool for extracting text, metadata, and page counts.

How to use PDF Reader MCP Server?

The server can be installed via npm or Docker. To configure it, point your MCP host at either the npm package or the Docker image and specify the matching command and arguments. Once configured, send MCP requests with the tool name 'read_pdf' and arguments that specify the PDF source, the desired pages, and the information to include.
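
As a rough illustration of such a request, the sketch below builds a JSON-RPC `tools/call` message of the kind MCP hosts send to a server. The tool name 'read_pdf' and the 'pages' array come from this page; the other argument names (`sources`, `path`, `url`, `include_full_text`, `include_metadata`, `include_page_count`) are assumptions made for illustration and should be checked against the server's own documentation.

```typescript
// Hypothetical argument shape: only `pages` is named on this page,
// every other field name here is an assumption for illustration.
interface ReadPdfArguments {
  sources: Array<{ path?: string; url?: string; pages?: number[] }>;
  include_full_text?: boolean;
  include_metadata?: boolean;
  include_page_count?: boolean;
}

// JSON-RPC 2.0 envelope used by MCP for tool invocations.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ReadPdfArguments };
}

// Wrap the arguments in a tools/call request for the 'read_pdf' tool.
function buildReadPdfRequest(id: number, args: ReadPdfArguments): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_pdf", arguments: args },
  };
}

// Example: ask for pages 1 and 2 of a local file, plus metadata and page count.
const request = buildReadPdfRequest(1, {
  sources: [{ path: "documents/report.pdf", pages: [1, 2] }],
  include_metadata: true,
  include_page_count: true,
});

console.log(JSON.stringify(request, null, 2));
```

In practice an MCP host or client library usually builds this envelope for you, so only the `arguments` object needs to be supplied.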

Key features of PDF Reader MCP Server

Secure file access confined to the project root directory.
Handles both local relative paths and public URLs.
Consolidated tool for multiple extraction needs (full text, specific pages, metadata, page count).
Structured JSON output for easy parsing by agents.
Easy integration within MCP environments via npm or Docker.
Robust PDF parsing with pdfjs-dist, plus input validation with Zod.

Use cases of PDF Reader MCP Server

Extracting text content from invoices for automated processing.
Retrieving metadata from research papers for knowledge base creation.
Analyzing specific pages of legal documents for relevant clauses.
Counting the number of pages in a document for billing purposes.

FAQ about PDF Reader MCP Server

How does the server ensure security?

The server confines file access strictly to the project root directory, preventing access to unauthorized files.
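
One common way to implement this kind of confinement is to resolve each requested path against the root and reject anything that ends up outside it. The sketch below shows that general technique using Node's `path` module; it is not taken from the server's source, and the function name `resolveInsideRoot` is hypothetical.

```typescript
import * as path from "node:path";

// Hypothetical helper: resolve a user-supplied relative path against the
// project root and refuse anything that escapes it.
function resolveInsideRoot(projectRoot: string, userPath: string): string {
  if (path.isAbsolute(userPath)) {
    throw new Error("Absolute paths are not allowed");
  }
  const root = path.resolve(projectRoot);
  const resolved = path.resolve(root, userPath);
  const relative = path.relative(root, resolved);

  // A relative path that climbs out of the root starts with a ".." segment.
  // On Windows, a path on another drive comes back as an absolute path.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);
  if (escapes) {
    throw new Error(`Path escapes the project root: ${userPath}`);
  }
  return resolved;
}

// Example: a path inside the root resolves, a traversal attempt is rejected.
console.log(resolveInsideRoot("/project", "docs/report.pdf"));
try {
  resolveInsideRoot("/project", "../secret.pdf");
} catch (err) {
  console.log((err as Error).message);
}
```

This check is purely lexical. It does not follow symbolic links, so a stricter implementation would also compare real paths (for example via `fs.realpath`) before opening the file.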

Can I extract data from PDFs hosted online?

Yes, the server can handle both local relative paths and public URLs for PDF sources.

What kind of metadata can I extract?

You can extract metadata such as author, title, creation date, and other document properties.

How do I specify which pages to extract text from?

You can use the 'pages' array in the MCP request arguments to specify the page numbers you want to extract.
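
Before sending a request, a client may want to tidy the page list it passes in 'pages'. The helper below is a minimal sketch that assumes page numbers are 1-based positive integers, which this page does not state explicitly; the name `normalizePages` is hypothetical.

```typescript
// Hypothetical client-side helper. Assumes 1-based page numbers.
// Rejects invalid entries, removes duplicates, and sorts ascending.
function normalizePages(pages: number[]): number[] {
  for (const page of pages) {
    if (!Number.isInteger(page) || page < 1) {
      throw new Error(`Invalid page number: ${page}`);
    }
  }
  return Array.from(new Set(pages)).sort((a, b) => a - b);
}

// Example: duplicates are dropped and the order is normalized.
const pages = normalizePages([3, 1, 3, 2]);
console.log(pages);
```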

What is the output format of the extracted data?

The server returns data in a predictable JSON format, making it easy for AI agents to parse and use.
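
An agent-side consumer might handle that JSON along the following lines. MCP tool results carry their payload as content items, and this sketch assumes the server places its JSON in a text item. The result fields shown (`results`, `source`, `success`, `data`, `num_pages`, `metadata`, `full_text`, `error`) are assumptions for illustration, not a documented schema.

```typescript
// Hypothetical result shape: every field name below is an assumption.
interface PdfSourceResult {
  source: string;
  success: boolean;
  data?: {
    num_pages?: number;
    metadata?: Record<string, unknown>;
    full_text?: string;
  };
  error?: string;
}

// Text content item as carried in an MCP tool result.
interface TextContent {
  type: "text";
  text: string;
}

// Parse the first text item of a tool result into per-source results.
function parseReadPdfResult(content: TextContent[]): PdfSourceResult[] {
  const first = content.find((item) => item.type === "text");
  if (!first) {
    throw new Error("No text content in tool result");
  }
  const parsed: unknown = JSON.parse(first.text);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Expected a JSON object");
  }
  const results = (parsed as { results?: unknown }).results;
  if (!Array.isArray(results)) {
    throw new Error("Expected a 'results' array");
  }
  return results as PdfSourceResult[];
}

// Example with a hand-written payload standing in for a real response.
const sampleContent: TextContent[] = [
  {
    type: "text",
    text: JSON.stringify({
      results: [{ source: "docs/report.pdf", success: true, data: { num_pages: 12 } }],
    }),
  },
];

for (const result of parseReadPdfResult(sampleContent)) {
  console.log(result.source, result.success ? result.data?.num_pages : result.error);
}
```

The cast to `PdfSourceResult[]` trusts the payload. A stricter consumer would validate each entry, for instance with a schema library such as Zod.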