kv-extractor-mcp-server

by KunihiroS

Tags: Data/Text Analysis, key-value extraction, GPT-4, pydantic, text processing, MCP server

This MCP server extracts key-value pairs from arbitrary, noisy, or unstructured text using LLMs (GPT-4.1-mini) and pydantic-ai. It ensures type safety and supports multiple output formats (JSON, YAML, TOML).

What is kv-extractor-mcp-server?

kv-extractor-mcp-server is an MCP server designed to extract key-value pairs from unstructured text. It uses an LLM (GPT-4.1-mini) together with pydantic-ai to produce type-safe, structured output in JSON, YAML, or TOML. It identifies and extracts relevant key-value pairs automatically, without requiring pre-defined keys, which makes it well suited to diverse and unpredictable data.

How to use kv-extractor-mcp-server?

Install the server via Smithery, or manually by cloning the repository. You need Python 3.9+ and an OpenAI API key, which is configured in settings.json. Start the server with python server.py, explicitly specifying the log output mode and, if logging is enabled, the absolute log file path via command-line arguments. Then call the provided tools (/extract_json, /extract_yaml, /extract_toml) with input text to extract key-value pairs in the desired format.
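The startup step can be sketched as a small helper that assembles the launch command. This is a minimal illustration, not part of the project: the helper name server_command is hypothetical, it assumes the command is run from the cloned repository so that server.py resolves, and the only flags it uses are the --log and --logfile options described in the FAQ.

```python
import shlex
import sys


def server_command(logfile=None):
    """Build the argv list for starting the server (hypothetical helper).

    Assumes the working directory is the cloned repository, so that
    "server.py" resolves. Logging is off unless a log file is given.
    """
    cmd = [sys.executable, "server.py"]
    if logfile is None:
        cmd.append("--log=off")
    else:
        # The listing requires an absolute path when logging is enabled.
        cmd += ["--log=on", f"--logfile={logfile}"]
    return cmd


# Show the command line that would be run; pass the list to
# subprocess.run(...) to actually start the server.
print(shlex.join(server_command("/var/log/kv-extractor.log")))
```

Building the command as a list rather than a single string avoids shell quoting problems if the log path contains spaces.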

Key features of kv-extractor-mcp-server

Automatic Key Discovery
Superior Robustness for Complex Inputs
Advanced Multi-Lingual Preprocessing (Japanese, English, Chinese)
Iterative Refinement and Typing
Guaranteed Type Safety and Schema Adherence
Consistent and Predictable Output

Use cases of kv-extractor-mcp-server

Extracting data from invoices
Processing customer feedback
Analyzing log files
Structuring information from research papers

FAQ about kv-extractor-mcp-server

What languages are supported?

Japanese, English, and Chinese (Simplified/Traditional) are fully supported using spaCy NER. Input in any other language results in an error.

What models are used for extraction?

GPT-4.1-mini is used for key-value extraction, type annotation, and type evaluation.

What output formats are supported?

JSON, YAML, and TOML are supported. Note that the TOML output has limitations: nested structures and arrays of objects are represented as JSON strings rather than as native TOML tables.
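To make the TOML limitation concrete, the sketch below shows one way such a conversion could behave. It is an assumption about the general technique, not the server's actual code: the function name prepare_for_toml and the sample record are hypothetical.

```python
import json


def prepare_for_toml(record: dict) -> dict:
    """Serialize nested values to JSON strings before TOML output.

    Hypothetical illustration: dicts and lists that contain dicts are
    replaced by their JSON text; scalars and flat lists pass through.
    """
    flat = {}
    for key, value in record.items():
        is_nested = isinstance(value, dict)
        is_object_array = isinstance(value, list) and any(
            isinstance(item, dict) for item in value
        )
        if is_nested or is_object_array:
            flat[key] = json.dumps(value, ensure_ascii=False)
        else:
            flat[key] = value
    return flat


sample = {
    "customer": "Tanaka",
    "tags": ["urgent", "invoice"],
    "address": {"city": "Tokyo"},
    "items": [{"sku": 1}],
}
print(prepare_for_toml(sample))
```

A consumer of the TOML output would then need to call json.loads on those string fields to recover the nested data.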

Is perfect extraction guaranteed?

No. Extraction relies on pydantic-ai and LLMs, so results can be incomplete or incorrect. The server is nevertheless designed to be robust and to handle errors gracefully.

How do I configure logging?

You must explicitly specify the log output mode and (if enabled) the absolute log file path via command-line arguments. Use --log=off to disable logging or --log=on --logfile=/absolute/path/to/logfile.log to enable logging.
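The rule above (a mode is always required, and an absolute path is required when logging is on) can be expressed with argparse as follows. This is a sketch of how such validation might look, not the server's own argument handling; the function name parse_log_args and the choice to raise ValueError are assumptions.

```python
import argparse
import os


def parse_log_args(argv):
    """Parse --log / --logfile and enforce the documented combination.

    Hypothetical validator: raises ValueError when logging is enabled
    without an absolute log file path.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--log", choices=["on", "off"], required=True)
    parser.add_argument("--logfile", default=None)
    args = parser.parse_args(argv)

    if args.log == "on":
        if not args.logfile or not os.path.isabs(args.logfile):
            raise ValueError("--log=on requires --logfile with an absolute path")
    return args


args = parse_log_args(["--log=on", "--logfile=/absolute/path/to/logfile.log"])
print(args.log, args.logfile)
```

Requiring the mode explicitly, rather than defaulting it, matches the listing's statement that the log output mode must always be specified.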