Patronus MCP Server
by patronus-ai
The Patronus MCP Server provides a standardized interface to the Patronus SDK for evaluating, optimizing, and experimenting with LLM systems.
What is Patronus MCP Server?
The Patronus MCP Server is an implementation of a Model Context Protocol (MCP) server designed to work with the Patronus SDK. It provides a standardized interface for performing LLM system evaluations, optimizations, and experiments.
How to use Patronus MCP Server?
To use the server, first clone the repository and install its dependencies. Then run the server, providing your Patronus API key either as a command-line argument or as the PATRONUS_API_KEY environment variable. You can then use the provided API endpoints to initialize Patronus, run single or batch evaluations, and run experiments with datasets. Example code snippets are provided in the README for each API endpoint.
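As a minimal sketch, assuming the server's entry point is src/patronus_mcp/server.py (not confirmed here) and the API key is supplied through the PATRONUS_API_KEY environment variable, a client built on the official MCP Python SDK could launch the server over stdio and list its tools:

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed entry point; adjust to match the repository layout.
server_params = StdioServerParameters(
    command="python",
    args=["src/patronus_mcp/server.py"],
    env={"PATRONUS_API_KEY": os.environ["PATRONUS_API_KEY"]},
)

async def main() -> None:
    # Launch the server as a subprocess and communicate over stdio.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server exposes (evaluate, batch_evaluate, run_experiment, ...).
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```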
Key features of Patronus MCP Server
Initialize Patronus with API key and project settings
Run single evaluations with configurable evaluators
Run batch evaluations with multiple evaluators
Run experiments with datasets
Use cases of Patronus MCP Server
Evaluating the performance of different LLM models
Optimizing LLM system prompts and parameters
Experimenting with different evaluation criteria
Automating LLM evaluation workflows
FAQ from Patronus MCP Server
How do I provide my Patronus API key?
You can provide your API key either as a command-line argument when running the server or as an environment variable named PATRONUS_API_KEY.
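Both options are sketched below; the --api-key flag name and entry-point path are assumptions rather than confirmed details:

```python
import os

# Option 1: set the environment variable before starting the server.
os.environ["PATRONUS_API_KEY"] = "<your-patronus-api-key>"

# Option 2 (hypothetical flag name and path): pass the key on the command line, e.g.
#   python src/patronus_mcp/server.py --api-key <your-patronus-api-key>
```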
How do I run a single evaluation?
Use the evaluate endpoint with a properly formatted EvaluationRequest object. The request should include the evaluator configuration, task input, task output, and task context.
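As an illustration, the payload could be assembled as a plain dictionary and sent with session.call_tool from the connection sketch above; the argument name, field names, and evaluator below are assumptions, not the server's confirmed schema:

```python
# Hypothetical field names; check the repository's EvaluationRequest model for the real schema.
evaluation_request = {
    "evaluator": {"name": "judge", "criteria": "patronus:is-concise"},
    "task_input": "What is the capital of France?",
    "task_output": "Paris is the capital of France.",
    "task_context": ["France is a country in Western Europe."],
}

# Inside an initialized ClientSession (the top-level argument name "request" is also an assumption):
#   result = await session.call_tool("evaluate", arguments={"request": evaluation_request})
```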
How do I run a batch evaluation?
Use the batch_evaluate endpoint with a BatchEvaluationRequest object. This allows you to run multiple evaluators on the same task input, output, and context.
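A similar hedged sketch for a batch payload, again with assumed field and evaluator names, runs several evaluators over one task:

```python
# Hypothetical field names; check the repository's BatchEvaluationRequest model for the real schema.
batch_request = {
    "evaluators": [
        {"name": "judge", "criteria": "patronus:is-concise"},
        {"name": "lynx", "criteria": "patronus:hallucination"},
    ],
    "task_input": "Summarize the attached support ticket.",
    "task_output": "The customer cannot reset their password.",
    "task_context": ["Ticket: the password reset email never arrives."],
}

# result = await session.call_tool("batch_evaluate", arguments={"request": batch_request})
```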
How do I run an experiment?
Use the run_experiment endpoint with an ExperimentRequest object. This allows you to run evaluations on a dataset using a combination of remote and custom evaluators.
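Sketched with assumed field names, an experiment request pairs a small inline dataset with one or more evaluators:

```python
# Hypothetical field names; check the repository's ExperimentRequest model for the real schema.
experiment_request = {
    "project_name": "my-project",
    "experiment_name": "prompt-comparison",
    "dataset": [
        {"task_input": "What is 2 + 2?", "task_output": "4"},
        {"task_input": "Name the largest planet.", "task_output": "Jupiter"},
    ],
    "evaluators": [{"name": "judge", "criteria": "patronus:is-concise"}],
}

# result = await session.call_tool("run_experiment", arguments={"request": experiment_request})
```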
How do I add a new feature to the server?
Define a new request model, implement a new tool function with the @mcp.tool() decorator, add corresponding tests, and update the README with the new feature description and API usage example.
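As a sketch of that pattern using FastMCP from the MCP Python SDK, the request model and word_count tool below are hypothetical and only illustrate the shape of a new feature:

```python
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("patronus-mcp-server")

# Hypothetical request model for a new feature.
class WordCountRequest(BaseModel):
    text: str

@mcp.tool()
async def word_count(request: WordCountRequest) -> dict:
    """Count the words in a piece of text (illustrative example only)."""
    return {"word_count": len(request.text.split())}

if __name__ == "__main__":
    mcp.run()
```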