Whisper Speech Recognition MCP Server
by BigUncle
A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities. It offers batch processing, CUDA acceleration, and multiple output formats.
What is Whisper Speech Recognition MCP Server?
This is a speech recognition server built on Faster Whisper, designed to provide efficient and accurate audio transcription services. It uses the MCP (Model Context Protocol) to integrate with applications such as Claude Desktop.
How to use Whisper Speech Recognition MCP Server?
1. Install dependencies with pip install -r requirements.txt.
2. Start the server with python whisper_server.py (or start_server.bat on Windows).
3. Configure Claude Desktop (or another MCP-compatible application) to use the server by specifying the command and arguments in its configuration file; see the example configuration after this list.
4. Use the available tools (get_model_info, transcribe, and batch_transcribe) to interact with the server.
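For step 3, a minimal claude_desktop_config.json entry might look like the sketch below; the server name, interpreter, and script path are placeholders and should be adjusted to your installation.

```json
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["/path/to/whisper_server.py"]
    }
  }
}
```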
Key features of Whisper Speech Recognition MCP Server
Integrated with Faster Whisper for efficient speech recognition
Batch processing acceleration for improved transcription speed
Automatic CUDA acceleration (if available)
Support for multiple model sizes (tiny to large-v3)
Output formats include VTT subtitles, SRT, and JSON
Support for batch transcription of audio files in a folder
Model instance caching to avoid repeated loading
Dynamic batch size adjustment based on GPU memory (see the sketch after this list)
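As a rough illustration of how batch size can be adjusted from available GPU memory (the thresholds and function name here are hypothetical, not the server's actual implementation):

```python
import torch

def choose_batch_size(default: int = 8) -> int:
    """Pick a transcription batch size from free GPU memory (illustrative heuristic)."""
    if not torch.cuda.is_available():
        return 1  # CPU fallback: batching gains little and costs memory
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    free_gb = free_bytes / 1024**3
    if free_gb >= 16:
        return 32
    if free_gb >= 8:
        return 16
    return default
```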
Use cases of Whisper Speech Recognition MCP Server
Real-time transcription for meetings and conferences
Transcription of audio files for content creation
Integration with AI assistants and chatbots
Automated subtitle generation for videos
FAQ about Whisper Speech Recognition MCP Server
How do I enable CUDA acceleration?
Ensure you have a CUDA-enabled GPU and install the appropriate PyTorch version with CUDA support.
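To verify that PyTorch can see your GPU before starting the server, you can run a quick check like this:

```python
import torch

# Reports whether CUDA acceleration will be available to the server.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    print("CUDA version built against:", torch.version.cuda)
```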
What output formats are supported?
The server supports VTT, SRT, and JSON output formats.
How can I improve transcription accuracy?
Use VAD filtering, specify the correct language, and ensure a clean audio input.
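These options map onto the underlying Faster Whisper library; a minimal standalone call, with the audio path and language as placeholders, looks like this:

```python
from faster_whisper import WhisperModel

# Load a model; use device="cpu" and compute_type="int8" if no GPU is available.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# vad_filter trims silence and noise; an explicit language avoids misdetection.
segments, info = model.transcribe(
    "meeting.wav",   # placeholder audio file
    language="en",   # set to the audio's actual language
    vad_filter=True,
    beam_size=5,
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```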
How do I configure the server with Claude Desktop?
Modify the claude_desktop_config.json file to include the server's command and arguments, as shown in the example configuration in the How to use section above.
What is the purpose of model caching?
Model caching avoids repeated loading of the Whisper model, improving performance and reducing startup time.
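A minimal sketch of the idea (the cache key and helper name are illustrative, not the server's actual code): reusing one WhisperModel instance per (model size, device, compute type) combination means the weights are loaded from disk only once.

```python
from faster_whisper import WhisperModel

# Cache of loaded models, keyed by (model size, device, compute type).
_model_cache: dict[tuple[str, str, str], WhisperModel] = {}

def get_cached_model(size: str = "large-v3", device: str = "cuda",
                     compute_type: str = "float16") -> WhisperModel:
    """Return a cached model instance, loading it only on first use."""
    key = (size, device, compute_type)
    if key not in _model_cache:
        _model_cache[key] = WhisperModel(size, device=device, compute_type=compute_type)
    return _model_cache[key]
```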