MCP Server Whisper

by arcaputo3

MCP Server Whisper provides a standardized way to process audio files through OpenAI's latest transcription and speech services. It enables AI assistants like Claude to seamlessly interact with audio processing capabilities by implementing the Model Context Protocol.

What is MCP Server Whisper?

MCP Server Whisper is a Model Context Protocol (MCP) server designed for advanced audio transcription and processing using OpenAI's Whisper and GPT-4o models. It provides a standardized interface for AI assistants to interact with audio processing capabilities.

How to use MCP Server Whisper?

To use MCP Server Whisper, first clone the repository and install its dependencies with uv sync. Next, set the required environment variables, including your OpenAI API key and the path to your audio files. Start the server with mcp dev src/mcp_server_whisper/server.py for development, or install it with mcp install src/mcp_server_whisper/server.py [--env-file .env] for use with MCP clients such as Claude Desktop. The exposed MCP tools can then be used to manage and process audio files.
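As a rough sketch of the steps above (the repository URL is inferred from the author name, and the AUDIO_FILES_PATH variable name is an assumption — check the project README for the exact environment-variable keys):

```shell
# Clone the repository and install dependencies (requires uv)
git clone https://github.com/arcaputo3/mcp-server-whisper.git
cd mcp-server-whisper
uv sync

# Set the required environment variables
# (OPENAI_API_KEY is standard; AUDIO_FILES_PATH is an assumed name)
export OPENAI_API_KEY="your_openai_api_key"
export AUDIO_FILES_PATH="/path/to/your/audio/files"

# Run the server in development mode...
mcp dev src/mcp_server_whisper/server.py

# ...or install it for an MCP client such as Claude Desktop
mcp install src/mcp_server_whisper/server.py --env-file .env
```

Alternatively, the same variables can be placed in a .env file and passed via the --env-file flag shown above.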

Key features of MCP Server Whisper

  • Advanced file searching with regex patterns and metadata filtering

  • Parallel batch processing for multiple audio files

  • Format conversion between supported audio types

  • Automatic compression for oversized files

  • Multi-model transcription with support for all OpenAI audio models

  • Interactive audio chat with GPT-4o audio models

  • Enhanced transcription with specialized prompts and timestamp support

  • Text-to-speech generation with customizable voices, instructions, and speed

  • Comprehensive metadata support

  • High-performance caching

Use cases of MCP Server Whisper

  • Transcribing audio files for analysis and documentation

  • Enabling AI assistants to understand and respond to audio content

  • Converting audio files to different formats

  • Generating text-to-speech audio for various applications

  • Searching and filtering audio files based on metadata

  • Batch processing large numbers of audio files efficiently

FAQ from MCP Server Whisper

What audio formats are supported?

The server supports flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, and webm for transcription, and mp3 and wav for chat.

How do I configure the server with Claude Desktop?

Add the provided JSON configuration to your claude_desktop_config.json file, ensuring that you replace your_openai_api_key and /path/to/your/audio/files with your actual values.
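As a sketch of what that configuration generally looks like (the server name, command, arguments, and the AUDIO_FILES_PATH key below are assumptions following the standard claude_desktop_config.json shape — use the exact JSON provided by the project):

```json
{
  "mcpServers": {
    "whisper": {
      "command": "uv",
      "args": ["run", "mcp", "run", "src/mcp_server_whisper/server.py"],
      "env": {
        "OPENAI_API_KEY": "your_openai_api_key",
        "AUDIO_FILES_PATH": "/path/to/your/audio/files"
      }
    }
  }
}
```

Replace your_openai_api_key and /path/to/your/audio/files with your actual values, then restart Claude Desktop for the server to appear.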

What OpenAI models are supported for transcription?

The server supports whisper-1, gpt-4o-transcribe, and gpt-4o-mini-transcribe.

How does the server handle large audio files?

Files larger than 25MB are automatically compressed to meet API limits.

What is the Model Context Protocol (MCP)?

The Model Context Protocol is a standard that defines how AI models interact with external tools and data sources, enabling seamless integration and communication.