
Whisper Speech Recognition MCP Server

by BigUncle

A high-performance speech recognition MCP server based on Faster Whisper, providing efficient audio transcription capabilities. It offers batch processing, CUDA acceleration, and multiple output formats.



What is Whisper Speech Recognition MCP Server?

This is a speech recognition server built on Faster Whisper, designed to provide efficient and accurate audio transcription services. It uses MCP (Model Context Protocol) to integrate with applications such as Claude Desktop.

How to use Whisper Speech Recognition MCP Server?

  1. Install dependencies: pip install -r requirements.txt

  2. Start the server: python whisper_server.py (or start_server.bat on Windows).

  3. Configure Claude Desktop (or another MCP-compatible application) to use the server by specifying the command and arguments in its configuration file.

  4. Call the available tools, get_model_info, transcribe, and batch_transcribe, to interact with the server.
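
For step 3, a configuration entry might look like the following sketch. The server name and script path here are placeholders, not values from the project; substitute your own:

```json
{
  "mcpServers": {
    "whisper": {
      "command": "python",
      "args": ["/path/to/whisper_server.py"]
    }
  }
}
```

After restarting Claude Desktop, the server's tools should appear in the tool list.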

Key features of Whisper Speech Recognition MCP Server

  • Integrated with Faster Whisper for efficient speech recognition

  • Batch processing acceleration for improved transcription speed

  • Automatic CUDA acceleration (if available)

  • Support for multiple model sizes (tiny to large-v3)

  • Output in VTT, SRT, and JSON formats

  • Support for batch transcription of audio files in a folder

  • Model instance caching to avoid repeated loading

  • Dynamic batch size adjustment based on GPU memory
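
The last feature, dynamic batch sizing, can be sketched as a simple mapping from free GPU memory to a batch size. This is an illustration of the idea, not the server's actual code, and pick_batch_size with its thresholds is hypothetical:

```python
from typing import Optional


def pick_batch_size(free_vram_gb: Optional[float]) -> int:
    """Map available VRAM (in GB) to a transcription batch size.

    None means no CUDA device was detected, so batching is disabled.
    Thresholds are illustrative; a real server would tune them.
    """
    if free_vram_gb is None:
        return 1
    if free_vram_gb >= 16:
        return 32
    if free_vram_gb >= 8:
        return 16
    if free_vram_gb >= 4:
        return 8
    return 2
```

The benefit is that one install works across machines: a 24 GB card gets large batches, while a laptop GPU degrades gracefully instead of running out of memory.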

Use cases of Whisper Speech Recognition MCP Server

  • Real-time transcription for meetings and conferences

  • Transcription of audio files for content creation

  • Integration with AI assistants and chatbots

  • Automated subtitle generation for videos

FAQ from Whisper Speech Recognition MCP Server

How do I enable CUDA acceleration?

Ensure you have a CUDA-enabled GPU and install the appropriate PyTorch version with CUDA support.
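
As a sketch of the fallback logic, a server can probe for CUDA at startup and pick a compute type accordingly. The float16/int8 pairing below is a common faster-whisper convention, not necessarily this server's exact choice:

```python
# Choose device and compute type before loading the model.
# Falls back to CPU with int8 quantization when CUDA is unavailable.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # PyTorch not installed at all
    device = "cpu"

compute_type = "float16" if device == "cuda" else "int8"
# e.g. WhisperModel("large-v3", device=device, compute_type=compute_type)
```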

What output formats are supported?

The server supports VTT, SRT, and JSON output formats.
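
The main difference between the two subtitle formats is the timestamp separator: SRT uses a comma before the milliseconds, VTT a dot. A small helper (hypothetical, not taken from the server) illustrates the conversion from seconds:

```python
def fmt_timestamp(seconds: float, srt: bool = True) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)   # whole hours
    m, rem = divmod(rem, 60_000)     # whole minutes
    s, ms = divmod(rem, 1000)        # seconds and leftover milliseconds
    sep = "," if srt else "."
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


print(fmt_timestamp(3.5))                 # 00:00:03,500  (SRT)
print(fmt_timestamp(3661.25, srt=False))  # 01:01:01.250  (VTT)
```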

How can I improve transcription accuracy?

Use VAD filtering, specify the correct language, and ensure a clean audio input.
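
faster-whisper's transcribe call accepts a language hint and a VAD filter, among other options. A sketch of settings that typically help accuracy follows; the values are illustrative and may differ from this server's defaults:

```python
# Illustrative accuracy-oriented settings for faster-whisper.
transcribe_kwargs = {
    "language": "en",    # skip language auto-detection when the language is known
    "vad_filter": True,  # voice-activity detection: drop silence and noise
    "beam_size": 5,      # wider beam search trades speed for accuracy
}
# e.g. segments, info = model.transcribe("audio.mp3", **transcribe_kwargs)
```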

How do I configure the server with Claude Desktop?

Modify the claude_desktop_config.json file to include the server's command and arguments.

What is the purpose of model caching?

Model caching avoids repeated loading of the Whisper model, improving performance and reducing startup time.
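
The caching idea can be sketched as a dictionary keyed by the load parameters. This is a simplified illustration, and the loader parameter is a stand-in for the actual model constructor:

```python
from typing import Callable

# Cache of loaded models, keyed by (model_size, device).
_model_cache: dict[tuple[str, str], object] = {}


def get_model(model_size: str, device: str,
              loader: Callable[[str, str], object]) -> object:
    """Return a cached model instance, loading it only on first request."""
    key = (model_size, device)
    if key not in _model_cache:
        _model_cache[key] = loader(model_size, device)  # expensive load
    return _model_cache[key]
```

Repeated transcription requests with the same model size and device then reuse one in-memory instance instead of reloading weights from disk each time.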