parquet_mcp_server

by DeepSpringAI

An MCP server for manipulating and analyzing Parquet files, designed to work with Claude Desktop. It generates text embeddings, analyzes Parquet files, and integrates with DuckDB and PostgreSQL.

What is parquet_mcp_server?

The parquet_mcp_server is an MCP (Model Context Protocol) server that provides tools for manipulating and analyzing Parquet files. It integrates with Claude Desktop and can generate text embeddings, analyze Parquet file metadata, convert Parquet files to DuckDB databases, convert Parquet files to PostgreSQL tables with pgvector support, and process Markdown files into structured chunks.

How to use parquet_mcp_server?

To use the server, install it via Smithery or by cloning the repository and setting up the environment. Configure Claude Desktop to use the server by adding it to the claude_desktop_config.json file. Then, use the available tools by sending appropriate prompts to the agent, specifying the required parameters for each tool.

Key features of parquet_mcp_server

  • Text Embedding Generation using Ollama models

  • Parquet File Analysis (schema, row count, file size)

  • DuckDB Integration for efficient querying

  • PostgreSQL Integration with pgvector support for vector similarity search

  • Markdown Processing to chunk text with metadata
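To make the Markdown processing feature more concrete, here is a minimal sketch of heading-based chunking with metadata. It is not the server's actual implementation: the `Chunk` fields and the `chunk_markdown` function are hypothetical names chosen for illustration, and the splitting rule (one chunk per heading) is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    heading: str  # nearest heading above the text ("" if none)
    level: int    # heading depth: 1 for "#", 2 for "##", 0 if none
    text: str     # body text that belongs to that heading


# Matches ATX-style headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def _emit(chunks: list, heading: str, level: int, body: list) -> None:
    """Append a chunk if the collected body contains any text."""
    text = "\n".join(body).strip()
    if text:
        chunks.append(Chunk(heading, level, text))


def chunk_markdown(source: str) -> list:
    """Split Markdown into one chunk per heading (hypothetical rule)."""
    chunks = []
    heading, level, body = "", 0, []
    for line in source.splitlines():
        match = HEADING.match(line)
        if match:
            # A new heading closes the chunk collected so far.
            _emit(chunks, heading, level, body)
            heading, level, body = match.group(2).strip(), len(match.group(1)), []
        else:
            body.append(line)
    _emit(chunks, heading, level, body)
    return chunks
```

This sketch treats any line starting with `#` as a heading, so a `#` comment inside a fenced code block would be split incorrectly. A production chunker would need to track fences.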

Use cases of parquet_mcp_server

  • Data scientists working with large Parquet datasets

  • Applications requiring vector embeddings for text data

  • Projects needing to analyze or convert Parquet files

  • Workflows that benefit from DuckDB's fast querying capabilities

  • Applications requiring vector similarity search with PostgreSQL and pgvector

FAQ from parquet_mcp_server

How do I install the Parquet MCP Server?

You can install it via Smithery using the command npx -y @smithery/cli install @DeepSpringAI/parquet_mcp_server --client claude or by cloning the repository and following the installation instructions in the README.

What environment variables are required?

You need to create a .env file with variables such as EMBEDDING_URL, OLLAMA_URL, EMBEDDING_MODEL, POSTGRES_DB, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_HOST, and POSTGRES_PORT.
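Before starting the server, it can help to confirm that the .env file defines every variable. The sketch below is one way to do that with the standard library only. The variable list is copied from the answer above and may not be exhaustive, and `parse_env` and `missing_variables` are hypothetical helper names, not part of the server.

```python
# Variable names taken from the FAQ answer; the real list may be longer.
REQUIRED = [
    "EMBEDDING_URL", "OLLAMA_URL", "EMBEDDING_MODEL",
    "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD",
    "POSTGRES_HOST", "POSTGRES_PORT",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_variables(text: str) -> list:
    """Return the required names that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED if not values.get(name)]
```

For example, `missing_variables(open(".env").read())` returns an empty list when every variable is set.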

How do I configure Claude Desktop to use this server?

Add the server configuration to your claude_desktop_config.json file, specifying the command and arguments to run the server.
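Claude Desktop reads server definitions from the `mcpServers` object in claude_desktop_config.json, where each entry has a `command` and an `args` list. The sketch below merges a new entry into an existing configuration without touching other servers. The entry name, command, and arguments shown are placeholders, not the project's documented values; take the real ones from the README.

```python
import json


def add_server(config: dict, name: str, command: str, args: list) -> dict:
    """Return a copy of config with one more entry under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command, "args": args}
    updated["mcpServers"] = servers
    return updated


# Placeholder values for illustration only.
example = add_server(
    {},
    name="parquet-mcp-server",
    command="uv",
    args=["--directory", "/path/to/parquet_mcp_server", "run", "main.py"],
)
print(json.dumps(example, indent=2))
```

Building the entry from the existing file contents, rather than overwriting the file, keeps any other MCP servers you have already configured.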

What are the available tools?

The server provides tools for generating text embeddings from Parquet files, getting Parquet file information, converting Parquet files to DuckDB, converting Parquet files to PostgreSQL, and processing Markdown files.

What do I do if embeddings are not generated?

Check that the Ollama server is running and accessible, that the specified model is available, and that the text column exists in your input Parquet file.
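The first two checks can be scripted. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally installed models, and tests whether a given model is among them. The helper names are hypothetical, the `fetch` parameter exists only so the function can be exercised without a live server, and the base URL and model name in the usage note are examples rather than values from this project.

```python
import json
import urllib.request


def _http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read()


def list_ollama_models(base_url: str, fetch=_http_get) -> list:
    """Return the names of models reported by Ollama's /api/tags."""
    raw = fetch(base_url.rstrip("/") + "/api/tags")
    return [model["name"] for model in json.loads(raw).get("models", [])]


def model_available(model: str, installed: list) -> bool:
    """Check for a model; an untagged name is treated as ":latest"."""
    wanted = model if ":" in model else model + ":latest"
    return wanted in installed
```

Calling `list_ollama_models("http://localhost:11434")` raises an exception if the server is unreachable, which answers the first check. Passing the result to `model_available("nomic-embed-text", ...)` answers the second.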