Engines
TTS engine implementations.
Base Engine
Base TTS class defining the common interface for all TTS backends.
- class scitex_audio._engines._base.BaseTTS(**kwargs)
Bases: ABC
Abstract base class for TTS implementations.
- abstractmethod get_voices() → List[dict]
Get available voices for this backend.
- Returns:
List of voice dictionaries with 'name' and 'id' keys.
- speak(text: str, output_path: str | None = None, play: bool = True, voice: str | None = None) → dict
Synthesize and optionally play text.
- abstractmethod synthesize(text: str, output_path: str) → Path
Synthesize text to an audio file.
- Parameters:
text – Text to convert to speech.
output_path – Path to save the audio file.
- Returns:
Path to the generated audio file.
- to_bytes(text: str, voice: str | None = None) → bytes
Synthesize text and return raw audio bytes (MP3).
Does not play audio — caller is responsible for playback. Useful for streaming audio to a browser or returning via HTTP.
- Parameters:
text – Text to convert to speech.
voice – Optional voice name/id.
- Returns:
MP3 audio bytes.
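As a minimal sketch of what a backend has to provide, the example below implements the two abstract methods documented above. It deliberately avoids importing scitex_audio: `BaseTTSSketch` is a stand-in that only mirrors the documented signatures, and `SilentTTS` is a hypothetical backend that writes silence to a WAV file. Neither name is part of the package.

```python
import tempfile
import wave
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List


class BaseTTSSketch(ABC):
    """Stand-in (assumption) mirroring the documented abstract methods."""

    @abstractmethod
    def synthesize(self, text: str, output_path: str) -> Path: ...

    @abstractmethod
    def get_voices(self) -> List[dict]: ...


class SilentTTS(BaseTTSSketch):
    """Hypothetical backend: writes 50 ms of silence per character."""

    SAMPLE_RATE = 16000  # frames per second, mono, 16-bit

    def synthesize(self, text: str, output_path: str) -> Path:
        path = Path(output_path)
        # 1/20 of a second per character, computed in integers.
        n_frames = self.SAMPLE_RATE // 20 * len(text)
        with wave.open(str(path), "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)
            wav.setframerate(self.SAMPLE_RATE)
            wav.writeframes(b"\x00\x00" * n_frames)
        return path

    def get_voices(self) -> List[dict]:
        # Same shape as documented: dictionaries with 'name' and 'id' keys.
        return [{"name": "Silence", "id": "silence"}]


def demo() -> Path:
    """Synthesize a short string into a temporary directory."""
    out = Path(tempfile.mkdtemp()) / "hello.wav"
    return SilentTTS().synthesize("hello", str(out))
```

A real subclass would inherit from `BaseTTS` instead, so that `speak()` and `to_bytes()` come from the base class.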
Google TTS
Google Text-to-Speech (gTTS) backend - Free, requires internet.
- class scitex_audio._engines._gtts_engine.GoogleTTS(lang: str = 'en', slow: bool = False, speed: float = 1.5, gtts_factory=None, **kwargs)
Bases: BaseTTS
Google Text-to-Speech backend using gTTS.
Free to use, requires internet connection. Good quality voices with multi-language support. Supports speed control via pydub (requires ffmpeg).
Install: pip install gTTS pydub
- LANGUAGES = {'ar': 'Arabic', 'de': 'German', 'en': 'English', 'es': 'Spanish', 'fr': 'French', 'hi': 'Hindi', 'it': 'Italian', 'ja': 'Japanese', 'ko': 'Korean', 'nl': 'Dutch', 'pl': 'Polish', 'pt': 'Portuguese', 'ru': 'Russian', 'sv': 'Swedish', 'tr': 'Turkish', 'vi': 'Vietnamese', 'zh-CN': 'Chinese (Simplified)', 'zh-TW': 'Chinese (Traditional)'}
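The `LANGUAGES` mapping can be used to check a language code before constructing the backend. The sketch below copies a few entries from the mapping and normalizes the code case-insensitively. `normalize_lang` is a hypothetical helper, and whether `GoogleTTS` itself validates `lang` is not stated here.

```python
# A few entries copied from the documented LANGUAGES mapping.
LANGUAGES = {
    "en": "English",
    "ja": "Japanese",
    "zh-CN": "Chinese (Simplified)",
    "zh-TW": "Chinese (Traditional)",
}


def normalize_lang(code: str) -> str:
    """Return the canonical key for a language code, ignoring case.

    Hypothetical helper; raises ValueError for codes not in LANGUAGES.
    """
    lookup = {key.lower(): key for key in LANGUAGES}
    try:
        return lookup[code.strip().lower()]
    except KeyError:
        known = ", ".join(sorted(LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; known: {known}") from None
```

The returned key could then be passed as the `lang` argument, for example `normalize_lang("zh-cn")` yields the key `zh-CN`.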
ElevenLabs
ElevenLabs TTS backend - High quality, requires API key and payment.
- class scitex_audio._engines._elevenlabs_engine.ElevenLabsTTS(api_key: str | None = None, voice: str = 'adam', model_id: str = 'eleven_multilingual_v2', stability: float = 0.5, similarity_boost: float = 0.75, speed: float = 1.0, client=None, **kwargs)
Bases: BaseTTS
ElevenLabs TTS backend.
High-quality voices but requires API key and has usage costs.
- Environment:
ELEVENLABS_API_KEY: Your ElevenLabs API key
- MAX_SPEED = 1.2
- MIN_SPEED = 0.7
- VOICES = {'adam': 'pNInz6obpgDQGcFmaJgB', 'alice': 'Xb7hH8MSUJpSbSDYk0k2', 'antoni': 'ErXwobaYiN019PkySvjV', 'bella': 'hpp4J3VqNfWAUOO0d1Us', 'brian': 'nPczCjzI2devNBz1zQrb', 'callum': 'N2lVS1w4EtoT3dr4eOWO', 'charlie': 'IKne3meq5aSn9XLyUdCD', 'chris': 'iP95p4xoKVk53GoZ742B', 'daniel': 'onwK4e9ZLuTAKqWW03F9', 'domi': 'AZnzlk1XvdvUeBnXmlld', 'elli': 'MF3mGyEYCl7XYWbV9V6O', 'eric': 'cjVigY5qzO86Huf0OWal', 'george': 'JBFqnCBsd6RMkjVDRZzb', 'harry': 'SOYHLrjzK2X1ezoPC6cr', 'jessica': 'cgSgspJ2msm6clMCkdW9', 'josh': 'TxGEqnHWrfWFTfGW9XjX', 'laura': 'FGY2WhTYpPnrIDTdsKH5', 'liam': 'TX3LPaxmHKxFdv7VOQHJ', 'lily': 'pFZP5JQG7iQjIQuC4Bku', 'matilda': 'XrExE9yKIg1WjnnlVkGX', 'rachel': '21m00Tcm4TlvDq8ikWAM', 'river': 'SAz9YHcvj6GT2YYXdXww', 'roger': 'CwhRBWXzGAHq8TQ4Fs17', 'sam': 'yoZ06aMxZJJ28mfd3POQ', 'sarah': 'EXAVITQu4vr4xnSDxMaL', 'will': 'bIHbv24MWmeRgasZH58o'}
- property client
Lazy-load ElevenLabs client.
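The `MIN_SPEED`, `MAX_SPEED`, and `VOICES` attributes suggest two small pre-processing steps: keeping `speed` inside the allowed range and turning a voice name into a voice ID. The sketch below shows one plausible way to do both. `clamp_speed` and `resolve_voice` are hypothetical helpers, and whether the class clamps out-of-range speeds or rejects them is an assumption.

```python
MIN_SPEED = 0.7
MAX_SPEED = 1.2

# Two entries copied from the documented VOICES mapping.
VOICES = {
    "adam": "pNInz6obpgDQGcFmaJgB",
    "rachel": "21m00Tcm4TlvDq8ikWAM",
}


def clamp_speed(speed: float) -> float:
    """Limit speed to the [MIN_SPEED, MAX_SPEED] range (assumed behavior)."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def resolve_voice(voice: str) -> str:
    """Map a known voice name to its ID; pass anything else through.

    Passing unknown values through lets a caller supply a raw voice ID.
    """
    return VOICES.get(voice.lower(), voice)
```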
LuxTTS
LuxTTS backend - Open-source, offline, voice-cloning TTS.
Uses the ZipVoice/LuxTTS model from HuggingFace. Supports CPU, CUDA, and MPS devices. Produces 48 kHz output; synthesis runs near realtime on CPU and 150x+ realtime on GPU.
- Install:
pip install git+https://github.com/ysharma3501/LuxTTS.git
- class scitex_audio._engines._luxtts_engine.LuxTTS(device: str | None = None, model_id: str = 'YatharthS/LuxTTS', reference_audio: str | None = None, num_steps: int = 4, speed: float = 2.0, rms: float = 0.01, t_shift: float = 0.9, return_smooth: bool = False, ref_duration: float = 5.0, trim_start: float | None = None, **kwargs)
Bases: BaseTTS
LuxTTS backend - open-source voice-cloning TTS.
High-quality 48 kHz output. Near realtime on CPU, 150x+ realtime on GPU. Requires a reference audio file for voice cloning.
Install: pip install git+https://github.com/ysharma3501/LuxTTS.git
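Since `device` defaults to `None` and the backend supports CPU, CUDA, and MPS, a caller may want to choose the device explicitly. The sketch below is one way to do that, assuming the model runs on PyTorch. `pick_device` is a hypothetical helper, and what `LuxTTS` does when `device` is `None` is not documented here.

```python
from typing import Optional


def pick_device(preferred: Optional[str] = None) -> str:
    """Return 'cuda', 'mps', or 'cpu', honoring an explicit preference."""
    if preferred is not None:
        return preferred
    try:
        import torch  # optional dependency; fall back to CPU if missing
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is absent on older PyTorch builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

The result could then be passed as the `device` argument together with `reference_audio`.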
System TTS (pyttsx3)
System TTS backend using pyttsx3 - Offline, uses system voices.
- Requirements:
pip install pyttsx3
Linux: sudo apt install espeak-ng libespeak1
Windows: Uses SAPI5 (built-in)
macOS: Uses NSSpeechSynthesizer (built-in)
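The platform requirements above correspond to the pyttsx3 driver names `espeak`, `sapi5`, and `nsss`. The sketch below maps a `sys.platform` value to the driver a caller would expect. `expected_driver` is a hypothetical helper for diagnostics and is not part of `SystemTTS`.

```python
import sys

# sys.platform prefix -> pyttsx3 driver name
DRIVERS = {
    "linux": "espeak",   # espeak / espeak-ng
    "win32": "sapi5",    # Windows SAPI5
    "darwin": "nsss",    # macOS NSSpeechSynthesizer
}


def expected_driver(platform: str = sys.platform) -> str:
    """Return the pyttsx3 driver name expected on the given platform."""
    for prefix, driver in DRIVERS.items():
        if platform.startswith(prefix):
            return driver
    raise RuntimeError(f"No known pyttsx3 driver for platform {platform!r}")
```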
- class scitex_audio._engines._pyttsx3_engine.SystemTTS(rate: int = 150, volume: float = 1.0, voice: str | None = None, engine=None, **kwargs)
Bases: BaseTTS
System TTS backend using pyttsx3.
Works offline using the system's built-in TTS engine. Quality varies by platform and available voices.
- Platforms:
Linux: espeak/espeak-ng
Windows: SAPI5
macOS: NSSpeechSynthesizer
- property engine
Lazy-load pyttsx3 engine.
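The combination of an `engine=None` constructor argument and a lazy `engine` property is a common pattern: the expensive object is created on first access, unless one was injected (for example in tests). The sketch below illustrates the pattern in isolation. `LazyEngineHolder` and its `factory` argument are hypothetical and do not reflect how `SystemTTS` is actually implemented.

```python
from typing import Any, Callable, Optional


class LazyEngineHolder:
    """Illustrates lazy loading with optional injection (hypothetical)."""

    def __init__(
        self,
        engine: Optional[Any] = None,
        factory: Callable[[], Any] = object,
    ) -> None:
        self._engine = engine      # injected instance, or None
        self._factory = factory    # called at most once, on first access

    @property
    def engine(self) -> Any:
        if self._engine is None:
            self._engine = self._factory()
        return self._engine
```

With pyttsx3 installed, `factory` could be `pyttsx3.init`, so that importing and constructing the holder does not start the speech driver until `engine` is first read.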