Speech.sh
by j3k0
Speech.sh is a command-line utility for converting text to speech using OpenAI's API. It offers multiple voice options, adjustable speech speed, and flexible API key management.
Last updated: N/A
Speech.sh
A powerful command-line utility for text-to-speech conversion using OpenAI's API.
Features
- Convert text to speech with a simple command
- Multiple voice options (onyx, alloy, echo, fable, nova, shimmer)
- Adjustable speech speed (0.25 to 4.0)
- Support for both tts-1 and tts-1-hd models
- Flexible API key management (command-line, environment variable, or file)
- Automatic caching to avoid duplicate API calls
- Robust retry mechanism for handling network issues
- Support for both ffmpeg and mplayer for audio playback
- MCP (Model Context Protocol) compatibility for integration with AI assistants
Installation
-
Clone this repository:
git clone https://github.com/j3k0/speech.sh.git cd speech.sh
-
Make the scripts executable:
chmod +x speech.sh mcp.sh launch
-
Ensure you have the required dependencies:
- curl
- jq
- Either ffmpeg or mplayer (ffmpeg preferred)
Usage
Basic usage:
./speech.sh --text "Hello, world!"
With more options:
./speech.sh --text "Hello, world!" --voice nova --speed 1.2 --model tts-1-hd
Options
-h, --help Show help message and exit
-t, --text TEXT Text to convert to speech (required)
-v, --voice VOICE Voice model to use (default: onyx)
-s, --speed SPEED Speech speed (default: 1.0)
-o, --output FILE Output file path (default: auto-generated)
-a, --api_key KEY OpenAI API key
-m, --model MODEL TTS model to use (default: tts-1)
-p, --player PLAYER Audio player to use: auto, ffmpeg, or mplayer (default: auto)
--verbose Enable verbose logging
-V, --verbose Same as --verbose
-r, --retries N Number of retry attempts for API calls (default: 3)
-T, --timeout N Timeout in seconds for API calls (default: 30)
API Key Configuration
The script accepts an OpenAI API key in three ways (in order of precedence):
- Command-line argument:
--api_key "your-api-key"
- Environment variable:
export OPENAI_API_KEY="your-api-key"
- A file named
API_KEY
in the script's directory
Advanced Features
Auto-caching
The script caches audio files by default to avoid unnecessary API calls. If you request the same text with the same voice and speed, it will reuse the previously generated audio file.
Retry Logic
The script includes sophisticated retry logic for API calls:
- Automatically retries failed API calls (default: 3 attempts)
- Implements exponential backoff for reliability
- Uses native curl retry mechanism when available
- Configurable timeout and retry values
Audio Player Options
You can choose your preferred audio player:
--player auto
: Use ffmpeg if available, fall back to mplayer (default)--player ffmpeg
: Force using ffmpeg--player mplayer
: Force using mplayer
MCP Integration
The mcp.sh
script provides Model Context Protocol compatibility, allowing the
text-to-speech functionality to be used by MCP-compatible AI assistants like Claude.
To use the MCP server:
# Start the MCP server using the launch script
./launch
For detailed instructions on using the MCP integration, see MCP_README.md.
Security Considerations
The script takes several steps to ensure security:
- Uses proper JSON handling with
jq
for parameter processing - Implements proper array-based parameter passing to prevent shell injection
- Validates needed dependencies before execution
- Uses error handling throughout the execution process
Examples
Convert text to speech with default settings:
./speech.sh --text "Hello, world!"
Use a different voice:
./speech.sh --text "Hello, world!" --voice nova
Adjust the speech speed:
./speech.sh --text "Hello, world!" --speed 1.5
Save to a specific file:
./speech.sh --text "Hello, world!" --output hello.mp3
Use environment variable for API key:
export OPENAI_API_KEY="your-api-key"
./speech.sh --text "Hello, world!"
Troubleshooting
If you encounter issues:
- Enable verbose logging with the
--verbose
flag - Check that your OpenAI API key is valid
- Verify that all dependencies are installed
- Ensure you have internet connectivity
- Check the permissions of the output directory
Contributors
- Jean-Christophe Hoelt
- Claude AI (Anthropic)
License
GPL