MCP Server Whisper

by arcaputo3

MCP Server Whisper provides a standardized way to process audio files through OpenAI's latest transcription and speech services. It enables AI assistants like Claude to seamlessly interact with audio processing capabilities by implementing the Model Context Protocol.

What is MCP Server Whisper?

MCP Server Whisper is a Model Context Protocol (MCP) server designed for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models. It provides a standardized interface for AI assistants to interact with audio processing capabilities.

How to use MCP Server Whisper?

To use MCP Server Whisper, first clone the repository and install its dependencies with uv sync. Next, set the required environment variables, including your OpenAI API key and the path to your audio files. Start the server with mcp dev src/mcp_server_whisper/server.py for development, or install it with mcp install src/mcp_server_whisper/server.py [--env-file .env] for use with MCP clients such as Claude Desktop. The exposed MCP tools can then be used to manage and process audio files.
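As a rough sketch of the steps above (the repository URL is inferred from the author name, and the AUDIO_FILES_PATH variable name is an assumption — check the project README for the exact environment-variable keys):

```shell
# Clone the repository and install dependencies (requires uv)
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper
uv sync

# Set the required environment variables
# (OPENAI_API_KEY is standard; AUDIO_FILES_PATH is an assumed name)
export OPENAI_API_KEY="your_openai_api_key"
export AUDIO_FILES_PATH="/path/to/your/audio/files"

# Run the server in development mode...
mcp dev src/mcp_server_whisper/server.py

# ...or install it for an MCP client such as Claude Desktop
mcp install src/mcp_server_whisper/server.py --env-file .env
```

Alternatively, the same variables can be placed in a .env file and passed via the --env-file flag shown above.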

Key features of MCP Server Whisper

  • Advanced file searching with regex patterns and metadata filtering

  • Parallel batch processing for multiple audio files

  • Format conversion between supported audio types

  • Automatic compression for oversized files

  • Multi-model transcription with support for all OpenAI audio models

  • Interactive audio chat with GPT-4o audio models

  • Enhanced transcription with specialized prompts and timestamp support

  • Text-to-speech generation with customizable voices, instructions, and speed

  • Comprehensive metadata support

  • High-performance caching

Use cases of MCP Server Whisper

  • Transcribing audio files for analysis and documentation

  • Enabling AI assistants to understand and respond to audio content

  • Converting audio files to different formats

  • Generating text-to-speech audio for various applications

  • Searching and filtering audio files based on metadata

  • Batch processing large numbers of audio files efficiently

FAQ from MCP Server Whisper

What audio formats are supported?

The server supports flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm for transcription, and mp3 and wav for chat.

How do I configure the server with Claude Desktop?

Add the provided JSON configuration to your claude_desktop_config.json file, ensuring that you replace your_openai_api_key and /path/to/your/audio/files with your actual values.
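As a sketch of what that configuration generally looks like (the server name, command, arguments, and the AUDIO_FILES_PATH key below are assumptions following the standard claude_desktop_config.json shape — use the exact JSON provided by the project):

```json
{
  "mcpServers": {
    "whisper": {
      "command": "uv",
      "args": ["run", "mcp", "run", "src/mcp_server_whisper/server.py"],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key",
        "AUDIO_FILES_PATH": "/path/to/your/audio/files"
      }
    }
  }
}
```

Replace your_openai_api_key and /path/to/your/audio/files with your actual values, then restart Claude Desktop for the server to appear.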

What OpenAI models are supported for transcription?

The server supports whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe.

How does the server handle large audio files?

Files larger than 25MB are automatically compressed to meet API limits.

What is the Model Context Protocol (MCP)?

The Model Context Protocol is a standard that defines how AI models interact with external tools and data sources, enabling seamless integration and communication.