ElevenLabs Scribe MCP Server

A Model Control Protocol (MCP) server implementation for ElevenLabs' Scribe speech-to-text API. It provides real-time transcription with context management and bidirectional streaming.

Features

Real-time Transcription: Stream audio directly from your microphone and get instant transcriptions
File-based Transcription: Upload audio files for batch processing
MCP Protocol Support: Full implementation of the Model Control Protocol for better context management
WebSocket Support: Real-time bidirectional communication
Context Management: Maintain conversation context for improved transcription accuracy
Multiple Audio Formats: Support for various audio formats with automatic conversion
Language Detection: Automatic language detection and confidence scoring
Event Detection: Identify speech and non-speech audio events

Installation

Clone the repository:

git clone https://github.com/aromanstatue/MCP-Elevenlab-Scribe-ASR.git
cd MCP-Elevenlab-Scribe-ASR

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -e .

Create a .env file with your ElevenLabs API key:

ELEVENLABS_API_KEY=your-api-key-here
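
A missing or placeholder key is easy to overlook until the first request fails. The snippet below is a minimal sketch of a startup check. It assumes the .env values have already been loaded into the process environment (for example by python-dotenv, which is listed under Requirements); `require_api_key` is a hypothetical helper, not part of this project.

```python
import os
from typing import Mapping

PLACEHOLDER = "your-api-key-here"  # the sample value shown in the .env step


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the ElevenLabs API key from env, or fail with a clear message."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is not set; add it to your .env file"
        )
    return key
```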

Usage

Starting the Server

python -m elevenlabs_scribe_mcp_server.main

By default the server listens on port 8000. If that port is already in use, it falls back to the next available port.
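
One way such a port fallback could work is sketched below, using only the standard library. This is an illustration under assumptions, not the server's actual code: `find_available_port` and its parameters are hypothetical names.

```python
import socket


def find_available_port(start: int = 8000, max_tries: int = 20,
                        host: str = "127.0.0.1") -> int:
    """Return the first port >= start that can currently be bound on host."""
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken, try the next one
            return port  # the probe socket is closed when the with block exits
    raise RuntimeError(
        f"no free port in range {start}-{start + max_tries - 1}"
    )
```

The probe socket is closed before the port is returned, so another process could claim the port in between. A real server would typically bind once and keep that socket.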

Using the Example Client

File Transcription:

python examples/client_example.py --file path/to/audio.wav

Microphone Transcription:

python examples/client_example.py --mic

API Endpoints

REST API:

POST /transcribe: Upload an audio file for transcription
GET /health: Health check endpoint
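
The REST endpoints can be called from any HTTP client. The sketch below uses only the standard library. It assumes the upload is sent as multipart/form-data under a form field named "file" and that the response body is JSON; neither detail is stated above, so check the server code before relying on them. `build_multipart`, `transcribe_file`, and `is_healthy` are hypothetical helper names.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str, base_url: str = "http://localhost:8000"):
    """POST an audio file to /transcribe and return the decoded JSON reply."""
    audio = Path(path)
    guessed = mimetypes.guess_type(audio.name)[0] or "application/octet-stream"
    # "file" is an assumed form field name
    body, header = build_multipart("file", audio.name, audio.read_bytes(), guessed)
    request = urllib.request.Request(
        f"{base_url}/transcribe",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def is_healthy(base_url: str = "http://localhost:8000") -> bool:
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return response.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False
```

With the server running, `transcribe_file("path/to/audio.wav")` would return whatever JSON the endpoint produces.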

WebSocket API:

ws://localhost:8000/ws/transcribe: Real-time audio transcription

MCP Protocol

The server implements the Model Control Protocol (MCP) with the following message types:

INIT: Initialize a new transcription session
START: Begin audio streaming
AUDIO: Send audio data
TRANSCRIPTION: Receive transcription results
ERROR: Error messages
STOP: End audio streaming
DONE: Complete session
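
The message types above could be modelled as follows. This is a sketch under assumptions: the JSON envelope with "type" and "data" keys, the base64 encoding of audio chunks, and the "language" option are illustrative guesses rather than the wire format defined in protocol.py and types.py. Only the seven type names come from the list above.

```python
import base64
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


class MessageType(str, Enum):
    INIT = "INIT"
    START = "START"
    AUDIO = "AUDIO"
    TRANSCRIPTION = "TRANSCRIPTION"
    ERROR = "ERROR"
    STOP = "STOP"
    DONE = "DONE"


@dataclass
class Message:
    type: MessageType
    data: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        # Assumed envelope: {"type": ..., "data": {...}}
        return json.dumps({"type": self.type.value, "data": self.data})

    @classmethod
    def from_json(cls, raw: str) -> "Message":
        payload = json.loads(raw)
        # MessageType(...) raises ValueError for an unknown type name
        return cls(MessageType(payload["type"]), payload.get("data", {}))


def audio_message(chunk: bytes) -> Message:
    """Wrap raw audio bytes in an AUDIO message (base64 is an assumption)."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return Message(MessageType.AUDIO, {"audio": encoded})


# A plausible client-side sequence, following the order of the list above.
session = [
    Message(MessageType.INIT, {"language": "en"}),  # hypothetical option
    Message(MessageType.START),
    audio_message(b"\x00\x01" * 4),
    Message(MessageType.STOP),
]
for message in session:
    print(message.to_json())
```

In this reading the client sends INIT, START, AUDIO, and STOP, while the server answers with TRANSCRIPTION, ERROR, or DONE. That direction of flow is also an assumption.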

Development

Running Tests

pytest tests/

Project Structure

elevenlabs-scribe-mcp-server/
├── elevenlabs_scribe_mcp_server/
│   ├── __init__.py
│   ├── main.py              # FastAPI server
│   └── mcp/
│       ├── __init__.py
│       ├── protocol.py      # MCP protocol handler
│       ├── types.py         # Protocol types
│       └── elevenlabs.py    # ElevenLabs implementation
├── examples/
│   └── client_example.py    # Example client
├── tests/
│   └── test_transcribe.py   # Test suite
├── pyproject.toml           # Project metadata
└── README.md

Requirements

Python 3.8+
FastAPI
Uvicorn
PyAudio (for microphone support)
aiohttp
python-dotenv
pydantic

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License. See the LICENSE file for details.

Acknowledgments

ElevenLabs for their excellent Scribe API
FastAPI for the modern web framework
The Python community for the amazing tools and libraries