scholium.tts_engine.TTSEngine

class TTSEngine(provider_name, provider_config=None, voices_dir=None, config=None, quality_preset=None, speed_override=None)

Bases: object

Manages a TTS provider and generates audio through it.

Initializes the TTS engine.

Parameters:
  • provider_name (str) – Name of TTS provider (‘piper’, ‘elevenlabs’, ‘coqui’, ‘openai’, ‘bark’, ‘f5tts’, ‘styletts2’, ‘tortoise’)

  • provider_config (Dict[str, Any]) – Configuration for the provider

  • voices_dir (str) – Directory for storing voice models and trained voices

  • config – Config object for accessing global settings

  • quality_preset (Optional[str]) – High-level quality preset: ‘fast’, ‘balanced’, or ‘best’. Overrides the matching provider config key(s).

  • speed_override (Optional[float]) – Speech rate multiplier (0.1–5.0). For providers that accept native speed (piper, openai) this is wired through the provider config; for all others it is applied as a pitch-preserving ffmpeg atempo post-process step.
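The atempo post-processing mentioned for speed_override can be sketched as follows. This is a minimal illustration, not scholium's actual implementation: the helper name atempo_chain is hypothetical, and it assumes each atempo stage is kept within 0.5–2.0, a conservative range that ffmpeg accepts, by daisy-chaining stages whose product equals the requested speed.

```python
def atempo_chain(speed: float) -> str:
    """Build an ffmpeg audio filter string for a speech rate multiplier.

    Hypothetical helper: splits the requested speed into atempo stages
    that each stay within 0.5-2.0 and multiply to the requested value.
    """
    if not 0.1 <= speed <= 5.0:
        raise ValueError("speed must be between 0.1 and 5.0")

    factors = []
    remaining = speed
    # Peel off 2.0x stages while the remaining factor is too large.
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    # Peel off 0.5x stages while the remaining factor is too small.
    while remaining < 0.5:
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)

    return ",".join(f"atempo={factor:.4f}" for factor in factors)
```

The resulting string could be passed to ffmpeg's -filter:a option, for example `ffmpeg -i in.mp3 -filter:a "atempo=2.0000,atempo=2.0000" out.mp3` for a 4x speed-up.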

Methods

  • generate_audio – Generate audio from text.

  • generate_segments – Generate audio for multiple narration segments.

generate_audio(text, voice_config, output_path)

Generate audio from text.

Parameters:
  • text (str) – Text to convert to speech

  • voice_config (Dict[str, Any]) – Voice configuration

  • output_path (str) – Path to save audio file

Return type:

str

Returns:

Path to generated audio file
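A minimal sketch of calling generate_audio, assuming an already constructed engine object. The wrapper synthesize_to_dir and the default file name are hypothetical; the keys of voice_config are not documented here, so the dict is passed through unchanged.

```python
from pathlib import Path
from typing import Any, Dict


def synthesize_to_dir(
    engine: Any,
    text: str,
    voice_config: Dict[str, Any],
    out_dir: str,
    name: str = "narration.mp3",  # hypothetical default file name
) -> str:
    """Create out_dir if needed, then synthesize text into out_dir/name."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    output_path = str(Path(out_dir) / name)
    # generate_audio returns the path to the generated audio file.
    return engine.generate_audio(text, voice_config, output_path)
```

Accepting the engine as an argument keeps the helper independent of any particular provider.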

generate_segments(segments, voice_config, output_dir, progress_callback=None, resume=False)

Generate audio for multiple narration segments.

Parameters:
  • segments (List[Dict[str, Any]]) – List of segment dicts, each containing at least: text (str), slide_number (int), and optionally min_duration, pre_delay, post_delay, fixed_duration.

  • voice_config (Dict[str, Any]) – Voice configuration passed to the TTS provider.

  • output_dir (str) – Directory where individual audio files are saved.

  • progress_callback – Optional zero-argument callable invoked after each segment is processed (useful for progress bars).

  • resume (bool) – When True, skip TTS generation for segments whose audio file already exists on disk (useful for resuming an interrupted run).

Returns:

List of enriched segment dicts, each containing all original keys plus:

{
    "audio_path": "/path/to/audio_0000.mp3",
    "audio_duration": 5.2,
    "duration": 7.2,        # includes pre/post delays
    "fixed_duration": None,  # if specified
    "min_duration": 10.0,   # if specified
    "pre_delay": 1.0,
    "post_delay": 1.0,
}

Return type:

list[dict]
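A minimal sketch of driving generate_segments with a progress callback and summing the returned durations. The wrapper narrate_slides is hypothetical and takes an already constructed engine; it relies only on the documented signature and on the "duration" key of each enriched segment.

```python
from typing import Any, Dict, List, Tuple


def narrate_slides(
    engine: Any,
    segments: List[Dict[str, Any]],
    voice_config: Dict[str, Any],
    output_dir: str,
    resume: bool = False,
) -> Tuple[List[Dict[str, Any]], float]:
    """Generate audio for all segments and return (enriched, total_seconds)."""
    done = 0

    def on_segment_done() -> None:
        # Zero-argument callback, invoked after each segment is processed.
        nonlocal done
        done += 1
        print(f"{done}/{len(segments)} segments processed")

    enriched = engine.generate_segments(
        segments,
        voice_config,
        output_dir,
        progress_callback=on_segment_done,
        resume=resume,
    )
    # "duration" includes pre/post delays, so the sum is the total running time.
    total = sum(segment["duration"] for segment in enriched)
    return enriched, total
```

Each input segment needs at least text and slide_number, for example `{"text": "Welcome.", "slide_number": 1, "pre_delay": 1.0}`. Passing resume=True on a second run skips segments whose audio file already exists.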