Patronus MCP Server
by patronus-ai
The Patronus MCP Server provides a standardized interface to the Patronus SDK for evaluating, optimizing, and experimenting with LLM systems.
What is Patronus MCP Server?
The Patronus MCP Server is an implementation of a Model Context Protocol (MCP) server designed to work with the Patronus SDK. It provides a standardized interface for performing LLM system evaluations, optimizations, and experiments.
How to use Patronus MCP Server?
To use the server, first clone the repository and install its dependencies. Then run the server, providing your Patronus API key either as a command-line argument or as the PATRONUS_API_KEY environment variable. You can then use the provided API endpoints to initialize Patronus, run single or batch evaluations, and run experiments with datasets. Example code snippets are provided in the README for each API endpoint.
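As a minimal sketch, assuming the server's entry point is src/patronus_mcp/server.py (not confirmed here) and the API key is supplied through the PATRONUS_API_KEY environment variable, a client built on the official MCP Python SDK could launch the server over stdio and list its tools:

```python
import asyncio
import os

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed entry point; adjust to match the repository layout.
server_params = StdioServerParameters(
    command="python",
    args=["src/patronus_mcp/server.py"],
    env={"PATRONUS_API_KEY": os.environ["PATRONUS_API_KEY"]},
)

async def main() -> None:
    # Launch the server as a subprocess and communicate over stdio.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the tools the server exposes (evaluate, batch_evaluate, run_experiment, ...).
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```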
Key features of Patronus MCP Server
Initialize Patronus with API key and project settings
Run single evaluations with configurable evaluators
Run batch evaluations with multiple evaluators
Run experiments with datasets
Use cases of Patronus MCP Server
Evaluating the performance of different LLM models
Optimizing LLM system prompts and parameters
Experimenting with different evaluation criteria
Automating LLM evaluation workflows
FAQ from Patronus MCP Server
How do I provide my Patronus API key?
You can provide your API key either as a command-line argument when running the server or as an environment variable named PATRONUS_API_KEY.
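Both options are sketched below; the --api-key flag name and entry-point path are assumptions rather than confirmed details:

```python
import os

# Option 1: set the environment variable before starting the server.
os.environ["PATRONUS_API_KEY"] = "<your-patronus-api-key>"

# Option 2 (hypothetical flag name and path): pass the key on the command line, e.g.
#   python src/patronus_mcp/server.py --api-key <your-patronus-api-key>
```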
How do I run a single evaluation?
Use the evaluate endpoint with a properly formatted EvaluationRequest object. The request should include the evaluator configuration, task input, task output, and task context.
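As an illustration, the payload could be assembled as a plain dictionary and sent with session.call_tool from the connection sketch above; the argument name, field names, and evaluator below are assumptions, not the server's confirmed schema:

```python
# Hypothetical field names; check the repository's EvaluationRequest model for the real schema.
evaluation_request = {
    "evaluator": {"name": "judge", "criteria": "patronus:is-concise"},
    "task_input": "What is the capital of France?",
    "task_output": "Paris is the capital of France.",
    "task_context": ["France is a country in Western Europe."],
}

# Inside an initialized ClientSession (the top-level argument name "request" is also an assumption):
#   result = await session.call_tool("evaluate", arguments={"request": evaluation_request})
```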
How do I run a batch evaluation?
Use the batch_evaluate endpoint with a BatchEvaluationRequest object. This allows you to run multiple evaluators on the same task input, output, and context.
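A similar hedged sketch for a batch payload, again with assumed field and evaluator names, runs several evaluators over one task:

```python
# Hypothetical field names; check the repository's BatchEvaluationRequest model for the real schema.
batch_request = {
    "evaluators": [
        {"name": "judge", "criteria": "patronus:is-concise"},
        {"name": "lynx", "criteria": "patronus:hallucination"},
    ],
    "task_input": "Summarize the attached support ticket.",
    "task_output": "The customer cannot reset their password.",
    "task_context": ["Ticket: the password reset email never arrives."],
}

# result = await session.call_tool("batch_evaluate", arguments={"request": batch_request})
```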
How do I run an experiment?
Use the run_experiment endpoint with an ExperimentRequest object. This allows you to run evaluations on a dataset using a combination of remote and custom evaluators.
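Sketched with assumed field names, an experiment request pairs a small inline dataset with one or more evaluators:

```python
# Hypothetical field names; check the repository's ExperimentRequest model for the real schema.
experiment_request = {
    "project_name": "my-project",
    "experiment_name": "prompt-comparison",
    "dataset": [
        {"task_input": "What is 2 + 2?", "task_output": "4"},
        {"task_input": "Name the largest planet.", "task_output": "Jupiter"},
    ],
    "evaluators": [{"name": "judge", "criteria": "patronus:is-concise"}],
}

# result = await session.call_tool("run_experiment", arguments={"request": experiment_request})
```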
How do I add a new feature to the server?
Define a new request model, implement a new tool function with the @mcp.tool() decorator, add corresponding tests, and update the README with the new feature description and API usage example.
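As a sketch of that pattern using FastMCP from the MCP Python SDK, the request model and word_count tool below are hypothetical and only illustrate the shape of a new feature:

```python
from mcp.server.fastmcp import FastMCP
from pydantic import BaseModel

mcp = FastMCP("patronus-mcp-server")

# Hypothetical request model for a new feature.
class WordCountRequest(BaseModel):
    text: str

@mcp.tool()
async def word_count(request: WordCountRequest) -> dict:
    """Count the words in a piece of text (illustrative example only)."""
    return {"word_count": len(request.text.split())}

if __name__ == "__main__":
    mcp.run()
```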