Context Optimizer MCP

An MCP (Model Context Protocol) server that uses Redis and in-memory caching to optimize and extend context windows for large chat histories.

Features

  • Dual-Layer Caching: Combines fast in-memory LRU cache with persistent Redis storage
  • Smart Context Management: Automatically summarizes older messages to maintain context within token limits
  • Rate Limiting: Redis-based rate limiting with burst protection (see the sketch after this list)
  • API Compatibility: Drop-in replacement for Anthropic API with enhanced context handling
  • Metrics Collection: Built-in performance monitoring and logging
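
The Redis-based rate limiting mentioned above is not detailed elsewhere in this README, so the sketch below is a minimal illustration only: a fixed-window counter built on the ioredis client. The key format, limits, and window size are assumptions, not the server's actual implementation.

// Illustrative only: a fixed-window rate limiter backed by Redis.
// Key names and limits are assumptions, not this project's code.
const Redis = require('ioredis');
const redis = new Redis({ host: 'localhost', port: 6379 });

async function isAllowed(clientId, limit = 60, windowSeconds = 60) {
  const windowIndex = Math.floor(Date.now() / (windowSeconds * 1000));
  const key = `ratelimit:${clientId}:${windowIndex}`;

  const count = await redis.incr(key);        // count this request in the current window
  if (count === 1) {
    await redis.expire(key, windowSeconds);   // let the window clean itself up
  }
  return count <= limit;                      // false once the burst limit is exceeded
}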

How It Works

This MCP server acts as middleware between your application and LLM providers (currently supporting Anthropic's Claude models). It intelligently manages conversation context through these strategies:

  1. Context Window Optimization: When conversations approach the model's token limit, older messages are automatically summarized while preserving key information.

  2. Efficient Caching (see the sketch after this list):

    • In-memory LRU cache for frequently accessed conversation summaries
    • Redis for persistent, distributed storage of conversation history and summaries

  3. Transparent Processing: The server handles all context management automatically while maintaining compatibility with the standard API.
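
As a rough illustration of the dual-layer lookup described in point 2, the sketch below checks an in-memory LRU cache first and falls back to Redis. The lru-cache package, key names, and cache sizes are assumptions rather than the project's actual code.

// Illustrative sketch of a two-tier summary lookup (memory first, then Redis).
// Class, key, and size choices here are hypothetical.
const { LRUCache } = require('lru-cache');
const Redis = require('ioredis');

const memory = new LRUCache({ max: 1000 });   // fast, per-process layer
const redis = new Redis();                    // persistent, shared layer

async function getSummary(conversationId) {
  const key = `summary:${conversationId}`;

  const cached = memory.get(key);             // 1. in-memory hit: cheapest path
  if (cached) return cached;

  const stored = await redis.get(key);        // 2. Redis hit: shared across instances
  if (stored) {
    memory.set(key, stored);                  // warm the in-memory layer for next time
    return stored;
  }
  return null;                                // 3. miss: caller summarizes and stores
}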

Getting Started

Prerequisites

  • Node.js 18+
  • Redis server (local or remote)
  • Anthropic API key

Installation Options

1. Using MCP client

The easiest way to install and run this server is using the MCP client:

# Install via npx
npx mcp install degenhero/context-optimizer-mcp

# Or using uvx
uvx mcp install degenhero/context-optimizer-mcp

Make sure to set your Anthropic API key when prompted during installation.

2. Manual Installation

# Clone the repository
git clone https://github.com/degenhero/context-optimizer-mcp.git
cd context-optimizer-mcp

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env
# Edit .env with your configuration

# Start the server
npm start

3. Using Docker

# Clone the repository
git clone https://github.com/degenhero/context-optimizer-mcp.git
cd context-optimizer-mcp

# Build and start with Docker Compose
docker-compose up -d

This will start both the MCP server and a Redis instance.

Configuration

Configure the server by editing the .env file:

# Server configuration
PORT=3000

# Anthropic API key
ANTHROPIC_API_KEY=your_anthropic_api_key

# Redis configuration
REDIS_HOST=localhost
REDIS_PORT=6379
REDIS_PASSWORD=

# Caching settings
IN_MEMORY_CACHE_MAX_SIZE=1000
REDIS_CACHE_TTL=86400  # 24 hours in seconds

# Model settings
DEFAULT_MODEL=claude-3-opus-20240229
DEFAULT_MAX_TOKENS=4096
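
How the server reads these variables is not shown here; a minimal sketch, assuming the standard dotenv package and Node's process.env, would look like this.

// Illustrative only: loading the settings above with dotenv.
require('dotenv').config();

const config = {
  port: parseInt(process.env.PORT || '3000', 10),
  anthropicApiKey: process.env.ANTHROPIC_API_KEY,
  redis: {
    host: process.env.REDIS_HOST || 'localhost',
    port: parseInt(process.env.REDIS_PORT || '6379', 10),
    password: process.env.REDIS_PASSWORD || undefined,
  },
  cache: {
    maxSize: parseInt(process.env.IN_MEMORY_CACHE_MAX_SIZE || '1000', 10),
    redisTtl: parseInt(process.env.REDIS_CACHE_TTL || '86400', 10),
  },
  defaultModel: process.env.DEFAULT_MODEL || 'claude-3-opus-20240229',
  defaultMaxTokens: parseInt(process.env.DEFAULT_MAX_TOKENS || '4096', 10),
};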

API Usage

The server exposes an API endpoint that is compatible with the standard Claude API and adds context-optimization features:

// Example client usage
const response = await fetch('http://localhost:3000/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'claude-3-opus-20240229',
    messages: [
      { role: 'user', content: 'Hello!' },
      { role: 'assistant', content: 'How can I help you today?' },
      { role: 'user', content: 'Tell me about context management.' }
    ],
    max_tokens: 1000,
    // Optional MCP-specific parameters:
    conversation_id: 'unique-conversation-id', // For context tracking
    context_optimization: true, // Enable/disable optimization
  }),
});

const result = await response.json();

Additional Endpoints

  • GET /v1/token-count?text=your_text&model=model_name: Count tokens in a text string (see the example after this list)
  • GET /health: Server health check
  • GET /metrics: View server performance metrics
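
For example, the token-count endpoint can be called like this. The exact response shape is an assumption; adjust it to whatever the server actually returns.

// Count tokens for a piece of text (response field names are assumed).
const res = await fetch(
  'http://localhost:3000/v1/token-count?' +
    new URLSearchParams({
      text: 'Tell me about context management.',
      model: 'claude-3-opus-20240229',
    })
);
const data = await res.json();
console.log(data); // e.g. { token_count: ... } -- field name not confirmed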

Testing

A test script is included to demonstrate how context optimization works:

# Run the test script
npm run test:context

This will start an interactive session where you can have a conversation and see how the context gets optimized as it grows.

Advanced Features

Context Summarization

When a conversation exceeds 80% of the model's token limit, the server automatically summarizes older messages. This summarization is cached for future use.
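
A simplified sketch of that threshold check follows. The 80% figure comes from this section; the token-counting and summarization helpers, and the "keep the last 10 messages" cutoff, are hypothetical placeholders.

// Illustrative: summarize older messages once the conversation nears the limit.
const CONTEXT_USAGE_THRESHOLD = 0.8;            // 80% of the model's token limit

async function optimizeContext(messages, modelTokenLimit, { countTokens, summarize }) {
  const totalTokens = await countTokens(messages);
  if (totalTokens <= modelTokenLimit * CONTEXT_USAGE_THRESHOLD) {
    return messages;                            // still within budget: pass through unchanged
  }

  // Keep the most recent messages verbatim, fold the rest into a summary.
  const recent = messages.slice(-10);           // assumed cutoff, not the server's actual rule
  const older = messages.slice(0, -10);
  const summary = await summarize(older);       // hypothetical summarization helper

  return [
    { role: 'user', content: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}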

Conversation Continuity

By providing a consistent conversation_id in requests, the server can maintain context across multiple API calls, even if individual requests would exceed token limits.
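
For instance, two calls that share a conversation_id let the server carry context from one into the next; this simply reuses the endpoint and parameters shown under API Usage.

// Both requests use the same conversation_id, so the server can bring
// summarized context from the first call into the second.
const conversationId = 'support-ticket-1234';

async function ask(content) {
  const res = await fetch('http://localhost:3000/v1/messages', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'claude-3-opus-20240229',
      messages: [{ role: 'user', content }],
      max_tokens: 1000,
      conversation_id: conversationId,
      context_optimization: true,
    }),
  });
  return res.json();
}

await ask('Summarize our caching strategy so far.');
await ask('Now explain how Redis fits into it.'); // context from the first call is preserved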

Performance Considerations

  • In-memory cache provides the fastest access for active conversations
  • Redis enables persistence and sharing across server instances
  • Summarization operations add some latency to requests that exceed token thresholds

Documentation

Additional documentation can be found in the docs/ directory.

License

MIT