ElevenLabs Scribe MCP Server
by aromanstatue
An MCP server implementation for ElevenLabs' Scribe speech-to-text API, providing real-time transcription capabilities with advanced context management and bidirectional streaming. It supports both real-time and file-based transcription.
Features
- Real-time Transcription: Stream audio directly from your microphone and get instant transcriptions
- File-based Transcription: Upload audio files for batch processing
- MCP Protocol Support: Full implementation of the Model Control Protocol for better context management
- WebSocket Support: Real-time bidirectional communication
- Context Management: Maintain conversation context for improved transcription accuracy
- Multiple Audio Formats: Support for various audio formats with automatic conversion
- Language Detection: Automatic language detection and confidence scoring
- Event Detection: Identify speech and non-speech audio events
Installation
- Clone the repository:
git clone https://github.com/aromanstatue/MCP-Elevenlab-Scribe-ASR.git
cd MCP-Elevenlab-Scribe-ASR
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -e .
- Create a .env file with your ElevenLabs API key:
ELEVENLABS_API_KEY=your-api-key-here
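To illustrate how the key in .env can reach the running process, here is a minimal standard-library sketch. The project lists python-dotenv as a requirement, which normally does this job; the helper names parse_env_file and get_api_key are hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(env_path=".env", environ=None):
    """Return ELEVENLABS_API_KEY from the environment, else from the .env file."""
    environ = os.environ if environ is None else environ
    key = environ.get("ELEVENLABS_API_KEY")
    if not key and Path(env_path).is_file():
        key = parse_env_file(env_path).get("ELEVENLABS_API_KEY")
    if not key:
        raise RuntimeError("ELEVENLABS_API_KEY is not set")
    return key
```

Failing fast with a clear error when the key is missing tends to be easier to debug than an authentication failure on the first transcription request.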
Usage
Starting the Server
python -m elevenlabs_scribe_mcp_server.main
The server listens on port 8000 by default and falls back to the next available port if 8000 is already in use.
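The port fallback can be pictured with a short sketch like the one below. This is an assumption about the general technique (probe ports in order until one binds), not the server's actual code; find_available_port and its parameters are hypothetical names.

```python
import socket


def find_available_port(start=8000, max_tries=20, host="127.0.0.1"):
    """Return the first port in [start, start + max_tries) that can be bound."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use (or not permitted), try the next one
            return port
    raise RuntimeError(f"no free port in {start}-{start + max_tries - 1}")


if __name__ == "__main__":
    print(find_available_port())
```

A probe like this is inherently racy: another process can claim the port between the check and the server's own bind, so a client should read the port the server reports at startup rather than assume 8000.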
Using the Example Client
- File Transcription:
python examples/client_example.py --file path/to/audio.wav
- Microphone Transcription:
python examples/client_example.py --mic
API Endpoints
- REST API:
  - POST /transcribe: Upload an audio file for transcription
  - GET /health: Health check endpoint
- WebSocket API:
  - ws://localhost:8000/ws/transcribe: Real-time audio transcription
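As a rough sketch of calling the two REST endpoints with only the standard library: the paths and default port come from this README, while the form field name "file", the audio/wav content type, and the JSON response are assumptions that should be checked against main.py. The function names are hypothetical.

```python
import json
import os
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # default port stated in this README


def build_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def check_health(base_url=BASE_URL):
    """GET /health and return the HTTP status code."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return resp.status


def transcribe_file(path, base_url=BASE_URL, field_name="file"):
    """POST an audio file to /transcribe and return the decoded response.

    ASSUMPTION: the upload field is named "file" and the reply body is JSON.
    """
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            field_name, os.path.basename(path), f.read(), "audio/wav"
        )
    req = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, check_health() should return 200 if the health endpoint behaves conventionally, and transcribe_file("path/to/audio.wav") uploads the file in a single request.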
MCP Protocol
The server implements the Model Control Protocol (MCP) with the following message types:
- INIT: Initialize a new transcription session
- START: Begin audio streaming
- AUDIO: Send audio data
- TRANSCRIPTION: Receive transcription results
- ERROR: Error messages
- STOP: End audio streaming
- DONE: Complete session
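The message types above might be modeled roughly as follows. Only the seven type names come from this README; the JSON envelope with "type" and "data" keys, the base64 encoding of audio, and the client-side ordering are assumptions made for illustration, and the real wire format lives in mcp/protocol.py and mcp/types.py.

```python
import base64
import json
from enum import Enum


class MessageType(str, Enum):
    """The seven message types listed in this README."""
    INIT = "INIT"
    START = "START"
    AUDIO = "AUDIO"
    TRANSCRIPTION = "TRANSCRIPTION"
    ERROR = "ERROR"
    STOP = "STOP"
    DONE = "DONE"


def encode_message(msg_type, payload=None):
    """Serialize a message as JSON text (ASSUMED envelope: type + data)."""
    return json.dumps({"type": MessageType(msg_type).value, "data": payload or {}})


def decode_message(raw):
    """Parse JSON text into (MessageType, payload); unknown types raise ValueError."""
    obj = json.loads(raw)
    return MessageType(obj["type"]), obj.get("data", {})


def audio_message(chunk):
    """Wrap raw audio bytes in an AUDIO message (base64 is an assumption)."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return encode_message(MessageType.AUDIO, {"audio": encoded})


def client_session(chunks):
    """Yield client-side messages in the order the list suggests (assumed)."""
    yield encode_message(MessageType.INIT)
    yield encode_message(MessageType.START)
    for chunk in chunks:
        yield audio_message(chunk)
    yield encode_message(MessageType.STOP)
```

In this reading, the client sends INIT, START, AUDIO, and STOP over the WebSocket, while TRANSCRIPTION, ERROR, and DONE arrive from the server and can be parsed with decode_message.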
Development
Running Tests
pytest tests/
Project Structure
elevenlabs-scribe-mcp-server/
├── elevenlabs_scribe_mcp_server/
│   ├── __init__.py
│   ├── main.py              # FastAPI server
│   └── mcp/
│       ├── __init__.py
│       ├── protocol.py      # MCP protocol handler
│       ├── types.py         # Protocol types
│       └── elevenlabs.py    # ElevenLabs implementation
├── examples/
│   └── client_example.py    # Example client
├── tests/
│   └── test_transcribe.py   # Test suite
├── pyproject.toml           # Project metadata
└── README.md
Requirements
- Python 3.8+
- FastAPI
- Uvicorn
- PyAudio (for microphone support)
- aiohttp
- python-dotenv
- pydantic
Contributing
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
MIT License - see LICENSE file for details.
Acknowledgments
- ElevenLabs for their excellent Scribe API
- FastAPI for the modern web framework
- The Python community for the amazing tools and libraries