Engines

TTS engine implementations.

Base Engine

Base TTS class defining the common interface for all TTS backends.

class scitex_audio._engines._base.BaseTTS(**kwargs)

Bases: ABC

Abstract base class for TTS implementations.

abstractmethod get_voices() → List[dict]

Get available voices for this backend.

Returns:

List of voice dictionaries with ‘name’ and ‘id’ keys.

abstract property name: str: Return the backend name.

property requires_api_key: bool: Whether this backend requires an API key.

property requires_internet: bool: Whether this backend requires internet connection.

speak(text: str, output_path: str | None = None, play: bool = True, voice: str | None = None) → dict

Synthesize and optionally play text.

Parameters:

text – Text to speak.
output_path – Optional path to save audio.
play – Whether to play the audio.
voice – Optional voice name/id.

Returns:

Dict with keys: path (if output_path), played (bool), success (bool).

abstractmethod synthesize(text: str, output_path: str) → Path

Synthesize text to audio file.

Parameters:

text – Text to convert to speech.
output_path – Path to save the audio file.

Returns:

Path to the generated audio file.

to_bytes(text: str, voice: str | None = None) → bytes

Synthesize text and return raw audio bytes (MP3).

Does not play audio; the caller is responsible for playback. Useful for streaming audio to a browser or returning it in an HTTP response.

Parameters:

text – Text to convert to speech.
voice – Optional voice name/id.

Returns:

MP3 audio bytes.
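The interface above can be illustrated with a minimal sketch. It does not import scitex_audio: `SketchBaseTTS` is a hypothetical stand-in that mirrors only the documented method names and signatures, and `SilenceTTS` is an invented backend that writes one second of silence as a WAV file using the standard library. The temporary-file fallback in `speak` is an assumption, and the real `speak` implementation may behave differently.

```python
import tempfile
import wave
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional


class SketchBaseTTS(ABC):
    """Hypothetical stand-in mirroring the documented BaseTTS interface."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Return the backend name."""

    @abstractmethod
    def synthesize(self, text: str, output_path: str) -> Path:
        """Write audio for `text` to `output_path` and return the path."""

    @abstractmethod
    def get_voices(self) -> List[dict]:
        """Return voice dictionaries with 'name' and 'id' keys."""

    def speak(self, text: str, output_path: Optional[str] = None,
              play: bool = True, voice: Optional[str] = None) -> dict:
        # Assumption: fall back to a temporary file when no path is given.
        target = output_path or str(Path(tempfile.mkdtemp()) / "speech.wav")
        path = self.synthesize(text, target)
        # Playback is omitted in this sketch, so "played" is always False.
        result = {"success": path.exists(), "played": False}
        if output_path:
            result["path"] = str(path)
        return result


class SilenceTTS(SketchBaseTTS):
    """Invented backend: writes one second of 16-bit mono silence."""

    @property
    def name(self) -> str:
        return "silence"

    def get_voices(self) -> List[dict]:
        return [{"name": "Silence", "id": "silence"}]

    def synthesize(self, text: str, output_path: str) -> Path:
        out = Path(output_path)
        with wave.open(str(out), "wb") as wav:
            wav.setnchannels(1)       # mono
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(16000)   # 16 kHz
            wav.writeframes(b"\x00\x00" * 16000)  # one second of silence
        return out
```

A subclass only has to supply `name`, `get_voices`, and `synthesize`; `speak` is inherited and builds the result dictionary from whatever `synthesize` returns.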

class scitex_audio._engines._base.TTSBackend

Bases: object

Enum-like class for TTS backend types.

EDGE = 'edge'

ELEVENLABS = 'elevenlabs'

GTTS = 'gtts'

LUXTTS = 'luxtts'

PYTTSX3 = 'pyttsx3'

classmethod available() → List[str]: Return list of available backends.
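This reference does not say how `available()` decides which backends are usable. One plausible approach, sketched here purely as an assumption, is to check whether each backend's import module can be located. The helper `available_backends` and the `ASSUMED_MODULES` mapping are hypothetical; `luxtts` is left out because its import name is not stated here.

```python
from importlib.util import find_spec
from typing import Dict, List


def available_backends(modules: Dict[str, str]) -> List[str]:
    """Return backend names whose top-level import module can be located.

    `modules` maps a backend name to the top-level module it needs.
    """
    return [name for name, module in modules.items()
            if find_spec(module) is not None]


# Assumed import names for the documented backend constants.
ASSUMED_MODULES = {
    "edge": "edge_tts",
    "elevenlabs": "elevenlabs",
    "gtts": "gtts",
    "pyttsx3": "pyttsx3",
}
```

Calling `available_backends(ASSUMED_MODULES)` would then list only the backends whose packages are installed in the current environment.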

Google TTS

Google Text-to-Speech (gTTS) backend - Free, requires internet.

class scitex_audio._engines._gtts_engine.GoogleTTS(lang: str = 'en', slow: bool = False, speed: float = 1.5, gtts_factory=None, **kwargs)

Bases: BaseTTS

Google Text-to-Speech backend using gTTS.

Free to use, requires internet connection. Good quality voices with multi-language support. Supports speed control via pydub (requires ffmpeg).

Install: pip install gTTS pydub

LANGUAGES = {'ar': 'Arabic', 'de': 'German', 'en': 'English', 'es': 'Spanish', 'fr': 'French', 'hi': 'Hindi', 'it': 'Italian', 'ja': 'Japanese', 'ko': 'Korean', 'nl': 'Dutch', 'pl': 'Polish', 'pt': 'Portuguese', 'ru': 'Russian', 'sv': 'Swedish', 'tr': 'Turkish', 'vi': 'Vietnamese', 'zh-CN': 'Chinese (Simplified)', 'zh-TW': 'Chinese (Traditional)'}

get_voices() → List[dict]: Get available languages as ‘voices’.

property name: str: Return the backend name.

property requires_internet: bool: Whether this backend requires internet connection.

synthesize(text: str, output_path: str) → Path: Synthesize text using Google TTS with optional speed control.
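Because this backend exposes languages as voices, the `LANGUAGES` table maps naturally onto the voice-dictionary shape that `get_voices()` documents. The sketch below assumes the language code becomes the `id` and the display name becomes the `name`; the helper `languages_as_voices` is hypothetical and only a subset of the table is reproduced.

```python
from typing import Dict, List

# Subset of the LANGUAGES table documented above.
LANGUAGES: Dict[str, str] = {
    "en": "English",
    "ja": "Japanese",
    "zh-CN": "Chinese (Simplified)",
}


def languages_as_voices(languages: Dict[str, str]) -> List[dict]:
    """Convert a code -> label mapping into voice dicts, sorted by code."""
    return [{"name": label, "id": code}
            for code, label in sorted(languages.items())]
```

Under that assumption, the `id` of a chosen voice is the value that would be passed as `lang` when constructing the backend.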

ElevenLabs

ElevenLabs TTS backend - High quality, requires API key and payment.

class scitex_audio._engines._elevenlabs_engine.ElevenLabsTTS(api_key: str | None = None, voice: str = 'adam', model_id: str = 'eleven_multilingual_v2', stability: float = 0.5, similarity_boost: float = 0.75, speed: float = 1.0, client=None, **kwargs)

Bases: BaseTTS

ElevenLabs TTS backend.

High-quality voices but requires API key and has usage costs.

Environment:

ELEVENLABS_API_KEY – Your ElevenLabs API key.

MAX_SPEED = 1.2

MIN_SPEED = 0.7

VOICES = {'adam': 'pNInz6obpgDQGcFmaJgB', 'alice': 'Xb7hH8MSUJpSbSDYk0k2', 'antoni': 'ErXwobaYiN019PkySvjV', 'bella': 'hpp4J3VqNfWAUOO0d1Us', 'brian': 'nPczCjzI2devNBz1zQrb', 'callum': 'N2lVS1w4EtoT3dr4eOWO', 'charlie': 'IKne3meq5aSn9XLyUdCD', 'chris': 'iP95p4xoKVk53GoZ742B', 'daniel': 'onwK4e9ZLuTAKqWW03F9', 'domi': 'AZnzlk1XvdvUeBnXmlld', 'elli': 'MF3mGyEYCl7XYWbV9V6O', 'eric': 'cjVigY5qzO86Huf0OWal', 'george': 'JBFqnCBsd6RMkjVDRZzb', 'harry': 'SOYHLrjzK2X1ezoPC6cr', 'jessica': 'cgSgspJ2msm6clMCkdW9', 'josh': 'TxGEqnHWrfWFTfGW9XjX', 'laura': 'FGY2WhTYpPnrIDTdsKH5', 'liam': 'TX3LPaxmHKxFdv7VOQHJ', 'lily': 'pFZP5JQG7iQjIQuC4Bku', 'matilda': 'XrExE9yKIg1WjnnlVkGX', 'rachel': '21m00Tcm4TlvDq8ikWAM', 'river': 'SAz9YHcvj6GT2YYXdXww', 'roger': 'CwhRBWXzGAHq8TQ4Fs17', 'sam': 'yoZ06aMxZJJ28mfd3POQ', 'sarah': 'EXAVITQu4vr4xnSDxMaL', 'will': 'bIHbv24MWmeRgasZH58o'}

property client: Lazy-load ElevenLabs client.

get_voices() → List[dict]: Get available voices.

property name: str: Return the backend name.

property requires_api_key: bool: Whether this backend requires an API key.

property requires_internet: bool: Whether this backend requires internet connection.

synthesize(text: str, output_path: str) → Path: Synthesize text using ElevenLabs API.
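The `MIN_SPEED`, `MAX_SPEED`, and `VOICES` attributes suggest two small normalization steps before a request is made. The sketch below is an assumption about how they might be used, not the class's confirmed behavior: `clamp_speed` limits a requested speed to the documented range (the real class could raise an error instead), and `resolve_voice_id` maps a short name to its voice ID while passing unknown strings through unchanged. Both helper names are hypothetical.

```python
MIN_SPEED = 0.7
MAX_SPEED = 1.2

# Two entries copied from the VOICES table documented above.
VOICES = {
    "adam": "pNInz6obpgDQGcFmaJgB",
    "rachel": "21m00Tcm4TlvDq8ikWAM",
}


def clamp_speed(speed: float) -> float:
    """Limit speed to the [MIN_SPEED, MAX_SPEED] range."""
    return max(MIN_SPEED, min(MAX_SPEED, speed))


def resolve_voice_id(voice: str) -> str:
    """Map a known short name to its ID; treat anything else as a raw ID."""
    return VOICES.get(voice.lower(), voice)
```

Passing unknown strings through lets callers supply a voice ID that is not in the built-in table.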

LuxTTS

LuxTTS backend - Open-source, offline, voice-cloning TTS.

Uses the ZipVoice/LuxTTS model from HuggingFace. Supports CPU, CUDA, and MPS devices. 48kHz output, near-realtime on CPU, 150x+ on GPU.

Install: pip install git+https://github.com/ysharma3501/LuxTTS.git

class scitex_audio._engines._luxtts_engine.LuxTTS(device: str | None = None, model_id: str = 'YatharthS/LuxTTS', reference_audio: str | None = None, num_steps: int = 4, speed: float = 2.0, rms: float = 0.01, t_shift: float = 0.9, return_smooth: bool = False, ref_duration: float = 5.0, trim_start: float | None = None, **kwargs)

Bases: BaseTTS

LuxTTS backend - open-source voice-cloning TTS.

High-quality 48kHz output. Near-realtime on CPU, 150x+ on GPU. Requires a reference audio file for voice cloning.

Install: pip install git+https://github.com/ysharma3501/LuxTTS.git

get_voices() → List[dict]: Get available voices (reference audio files).

property name: str: Return the backend name.

property requires_internet: bool: Whether this backend requires internet connection.

speak(text: str, output_path: str | None = None, play: bool = True, voice: str | None = None) → dict: Synthesize and optionally play. Uses .wav temp files (not .mp3).

synthesize(text: str, output_path: str) → Path: Synthesize text using LuxTTS.
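The `device` parameter defaults to `None`, and the backend is described as supporting CPU, CUDA, and MPS. How the default is resolved is not documented here; the sketch below shows one common pattern, assuming the model runs on PyTorch. The function `pick_device` is hypothetical, and it falls back to `"cpu"` when PyTorch is not installed.

```python
from typing import Optional


def pick_device(requested: Optional[str] = None) -> str:
    """Return the requested device, or probe for the best available one."""
    if requested is not None:
        return requested
    try:
        import torch  # assumption: the model is PyTorch-based
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds may not expose torch.backends.mps at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

An explicit `device` argument always wins in this sketch, so a caller can still force CPU on a machine that has a GPU.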

System TTS (pyttsx3)

System TTS backend using pyttsx3 - Offline, uses system voices.

Requirements:

pip install pyttsx3
Linux: sudo apt install espeak-ng libespeak1
Windows: Uses SAPI5 (built-in)
macOS: Uses NSSpeechSynthesizer (built-in)

class scitex_audio._engines._pyttsx3_engine.SystemTTS(rate: int = 150, volume: float = 1.0, voice: str | None = None, engine=None, **kwargs)

Bases: BaseTTS

System TTS backend using pyttsx3.

Works offline using the system’s built-in TTS engine. Quality varies by platform and available voices.

Platforms:

Linux: espeak/espeak-ng
Windows: SAPI5
macOS: NSSpeechSynthesizer

property engine: Lazy-load pyttsx3 engine.

get_voices() → List[dict]: Get available system voices.

property name: str: Return the backend name.

speak_direct(text: str): Speak directly without saving to file (faster).

synthesize(text: str, output_path: str) → Path: Synthesize text using system TTS.
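The `engine` parameter suggests that a pre-built engine object can be injected. The sketch below shows what file synthesis with such an engine might look like, using the pyttsx3 engine methods `setProperty`, `save_to_file`, and `runAndWait`. The function `synthesize_with_engine` and the `RecordingEngine` test double are hypothetical and are not part of scitex_audio; whether `SystemTTS` drives the engine in exactly this order is an assumption.

```python
from pathlib import Path


def synthesize_with_engine(engine, text: str, output_path: str,
                           rate: int = 150, volume: float = 1.0) -> Path:
    """Configure the engine, queue a file render, and run it to completion."""
    engine.setProperty("rate", rate)        # speaking rate
    engine.setProperty("volume", volume)    # 0.0 to 1.0
    engine.save_to_file(text, output_path)  # queue the render
    engine.runAndWait()                     # block until the queue is processed
    return Path(output_path)


class RecordingEngine:
    """Test double that records calls instead of producing audio."""

    def __init__(self):
        self.calls = []

    def setProperty(self, name, value):
        self.calls.append(("setProperty", name, value))

    def save_to_file(self, text, filename):
        self.calls.append(("save_to_file", text, filename))

    def runAndWait(self):
        self.calls.append(("runAndWait",))
```

With pyttsx3 installed, the object returned by `pyttsx3.init()` could be passed as `engine` in place of the test double.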