Advanced Configuration

Scholium works out of the box with no configuration file — all settings have sensible defaults. When you need to tune voice speed, switch providers, or control timing, config.yaml is where you do it.

What is config.yaml?

config.yaml is an optional YAML file that Scholium looks for in the current working directory when any command runs. Values in the file override the built-in defaults; settings you omit fall back to defaults automatically.

A config file is local to each project, so different lectures can use different providers, voices, and timing settings simply by keeping a config.yaml alongside the markdown source.
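The override behaviour described above can be pictured as a recursive merge. The following is a minimal sketch, not Scholium's actual loader: `merge_config` and the `DEFAULTS` subset are hypothetical names, the values are taken from examples on this page, and key-by-key merging of nested sections is an assumption.

```python
from copy import deepcopy

# Hypothetical subset of the built-in defaults (values as shown on this page).
DEFAULTS = {
    "tts_provider": "piper",
    "voice": "en_US-lessac-medium",
    "timing": {"default_pre_delay": 1.0, "default_post_delay": 2.0},
}


def merge_config(defaults, overrides):
    """Return a copy of defaults with overrides applied.

    Nested mappings are merged key by key (an assumption); any other value
    in overrides replaces the default outright.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


# A project-local file that changes the provider and one timing value only.
project = {"tts_provider": "openai", "timing": {"default_post_delay": 0.5}}
effective = merge_config(DEFAULTS, project)
print(effective)
```

Settings the project file omits (here `voice` and `default_pre_delay`) keep their default values, and `DEFAULTS` itself is left unmodified.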

Creating a config file

scholium config init

This writes a fully-annotated config.yaml to the current directory. Every supported setting is included with its default value and an explanatory comment. Edit only the lines you need to change.

Options:

Option	Description
`--path PATH`	Write to a different location (default: `config.yaml`)
`--force`	Overwrite an existing file

Viewing the current configuration

scholium config show

Prints the effective configuration — built-in defaults merged with your config.yaml and any environment variables. API keys are masked as *** so the output is safe to share or log.

Use --path PATH to inspect a config file that is not in the current directory.
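To illustrate the kind of masking that scholium config show performs, here is a small sketch. The function name `mask_secrets` and the rule "any key ending in `api_key`" are assumptions made for this example; the page only states that API keys appear as ***.

```python
def mask_secrets(config, marker="***"):
    """Return a copy of config with API-key values replaced by marker.

    Assumption: a secret is any non-empty value whose key ends in "api_key".
    """
    masked = {}
    for key, value in config.items():
        if isinstance(value, dict):
            masked[key] = mask_secrets(value, marker)
        elif key.lower().endswith("api_key") and value:
            masked[key] = marker
        else:
            masked[key] = value
    return masked


# Hypothetical effective configuration with a key picked up from the environment.
effective = {
    "tts_provider": "openai",
    "openai": {"model": "tts-1", "api_key": "sk-not-a-real-key"},
}
print(mask_secrets(effective))
```

The original mapping is not modified, so the real key stays available to the code that calls the provider.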

High-level CLI overrides

--speed and --quality on scholium generate let you adjust voice settings without editing config.yaml. They take precedence over any provider-specific values in the config file.

# 10% slower speech, highest quality model for the active provider
scholium generate lecture.md output.mp4 --speed 0.9 --quality best

--quality PRESET maps to provider-specific settings:

Provider	`fast`	`balanced`	`best`
piper	`quality: low`	`quality: medium`	`quality: high`
openai	`tts-1`	`tts-1`	`tts-1-hd`
elevenlabs	`eleven_turbo_v2_5`	`eleven_multilingual_v2`	`eleven_multilingual_v2`
bark	`small`	`small`	`large`
tortoise	`ultra_fast`	`fast`	`high_quality`
styletts2	`diffusion_steps: 3`	`diffusion_steps: 5`	`diffusion_steps: 10`
f5tts	`vocos`	`vocos`	`bigvgan`
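The preset table can be read as a lookup from (provider, preset) to a provider-specific override. The sketch below transcribes the table into a dictionary. The setting names (`quality`, `model`, `preset`, `diffusion_steps`, `vocoder`) are inferred from the per-provider sections of this page, and `quality_overrides` is a hypothetical helper, not part of Scholium's API.

```python
# Preset table transcribed from the documentation. Setting names are assumed
# from the per-provider config sections; Scholium may organise this differently.
QUALITY_PRESETS = {
    "piper": {"fast": {"quality": "low"}, "balanced": {"quality": "medium"}, "best": {"quality": "high"}},
    "openai": {"fast": {"model": "tts-1"}, "balanced": {"model": "tts-1"}, "best": {"model": "tts-1-hd"}},
    "elevenlabs": {
        "fast": {"model": "eleven_turbo_v2_5"},
        "balanced": {"model": "eleven_multilingual_v2"},
        "best": {"model": "eleven_multilingual_v2"},
    },
    "bark": {"fast": {"model": "small"}, "balanced": {"model": "small"}, "best": {"model": "large"}},
    "tortoise": {"fast": {"preset": "ultra_fast"}, "balanced": {"preset": "fast"}, "best": {"preset": "high_quality"}},
    "styletts2": {"fast": {"diffusion_steps": 3}, "balanced": {"diffusion_steps": 5}, "best": {"diffusion_steps": 10}},
    "f5tts": {"fast": {"vocoder": "vocos"}, "balanced": {"vocoder": "vocos"}, "best": {"vocoder": "bigvgan"}},
}


def quality_overrides(provider, preset):
    """Return the settings a --quality preset implies for one provider."""
    try:
        return dict(QUALITY_PRESETS[provider][preset])
    except KeyError:
        raise ValueError(f"no {preset!r} preset for provider {provider!r}") from None


print(quality_overrides("openai", "best"))
print(quality_overrides("styletts2", "fast"))
```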

--speed RATE for Piper and OpenAI is passed to the provider natively. For all other providers, Scholium applies a pitch-preserving time-stretch via ffmpeg’s atempo filter after generation — no extra dependencies needed.
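As a rough illustration of the time-stretch step, the sketch below builds an ffmpeg audio filter string for a given rate. It is not Scholium's code: `atempo_chain` is a hypothetical helper. It splits the rate into several atempo stages so that each stage stays between 0.5 and 2.0, the range in which atempo blends samples rather than skipping them; whether Scholium chains stages this way is an assumption.

```python
def atempo_chain(rate):
    """Build an ffmpeg -filter:a value that changes tempo by rate.

    Each atempo stage is kept within [0.5, 2.0]; larger or smaller rates
    are reached by chaining stages whose product equals rate.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    factors = []
    remaining = rate
    while remaining > 2.0:
        factors.append(2.0)
        remaining /= 2.0
    while remaining < 0.5:
        factors.append(0.5)
        remaining /= 0.5
    factors.append(remaining)
    return ",".join(f"atempo={factor:g}" for factor in factors)


print(atempo_chain(0.9))   # atempo=0.9
print(atempo_chain(3.0))   # atempo=2,atempo=1.5
print(atempo_chain(0.25))  # atempo=0.5,atempo=0.5
```

The resulting string can be passed to ffmpeg by hand, for example `ffmpeg -i in.wav -filter:a "atempo=0.9" out.wav`, where the file names are placeholders.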

Settings Reference

Slide backend

slide_backend: "pandoc"   # pandoc | slidev | marp

Per-lecture override: add slide-backend: marp (Pandoc-style hyphen, matching slide-level:) to a lecture file’s own YAML frontmatter, and that lecture will render with the named backend regardless of config.yaml. The --slide-backend CLI flag overrides both the frontmatter and config.yaml.
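The precedence among the three places a backend can be named is: CLI flag, then lecture frontmatter, then config.yaml. A minimal sketch of that ordering follows. `resolve_slide_backend` is a hypothetical function, and falling back to "pandoc" when nothing is set is an assumption based on the example value shown above.

```python
def resolve_slide_backend(cli_flag=None, lecture_frontmatter=None, config=None):
    """Pick the slide backend, highest-priority source first."""
    if cli_flag:
        return cli_flag  # --slide-backend wins over everything else
    frontmatter = lecture_frontmatter or {}
    if frontmatter.get("slide-backend"):
        return frontmatter["slide-backend"]  # per-lecture override (hyphenated key)
    # config.yaml uses the underscore spelling; "pandoc" is the assumed default.
    return (config or {}).get("slide_backend", "pandoc")


config = {"slide_backend": "slidev"}
print(resolve_slide_backend(config=config))
print(resolve_slide_backend(lecture_frontmatter={"slide-backend": "marp"}, config=config))
print(resolve_slide_backend("pandoc", {"slide-backend": "marp"}, config))
```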

Each backend has its own settings section below. All three accept a frontmatter: overlay that is merged into every generated deck, which is useful for title-slide metadata, language tags, and backend-specific tweaks. Three keys are portable, with the same name and the same values on all three backends: title, author, and lang (an IETF language tag). Everything else is backend-specific (Pandoc’s aspectratio/header-includes, Slidev’s colorSchema/canvasWidth, Marp’s paginate/header/footer) and is documented per backend below.

Pandoc

pandoc:
  # template: "beamer"     # Pandoc output format (default: beamer)
  # dpi: 300               # PNG rasterisation DPI (default: 300)
  frontmatter:             # merged via `--metadata-file`, overrides source .md
    aspectratio: 169       # Beamer 16:9 deck
    theme: "metropolis"    # Beamer theme
    lang: "en-AU"
    # header-includes: |
    #   \usepackage{siunitx}

The legacy top-level pandoc_template: beamer is still honoured for backward compatibility; Config._migrate_legacy() lifts it into pandoc.template automatically.
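The effect of that migration can be sketched as follows. This is an illustration only: the internals of Config._migrate_legacy() are not shown on this page, `migrate_legacy` is a stand-in name, and letting an explicit pandoc.template win over the legacy key when both are present is an assumption.

```python
def migrate_legacy(raw):
    """Move a top-level pandoc_template value into pandoc.template."""
    config = dict(raw)
    legacy = config.pop("pandoc_template", None)
    if legacy is not None:
        pandoc = dict(config.get("pandoc") or {})
        # Assumption: an explicit pandoc.template takes priority over the legacy key.
        pandoc.setdefault("template", legacy)
        config["pandoc"] = pandoc
    return config


print(migrate_legacy({"pandoc_template": "beamer"}))
print(migrate_legacy({"pandoc_template": "beamer", "pandoc": {"dpi": 300}}))
```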

Slidev

slidev:
  theme: "default"                   # "default" → built-in (no theme package needed)
  command: ["npx", "@slidev/cli"]    # how to invoke the Slidev CLI
  timeout: 600                       # seconds for PNG export
  with_clicks: false                 # export each click step as a separate PNG
  # extra_args: ["--dark"]           # forwarded verbatim to `slidev export`
  frontmatter:
    colorSchema: "dark"              # auto | light | dark
    htmlAttrs: { lang: "en-AU" }

Marp

marp:
  theme: "default"                          # default | gaia | uncover
  command: ["npx", "@marp-team/marp-cli"]
  paginate: false                           # show slide numbers
  no_sandbox: true                          # add Chrome --no-sandbox automatically
  # browser: "chrome"                       # chrome | edge | firefox | auto
  # browser_path: "/path/to/chrome"         # explicit Chromium binary
  # extra_args: ["--allow-local-files"]
  frontmatter:
    lang: "en-AU"
    header: "Lecture 3"
    footer: "Physics 101"

Run scholium slides list to confirm each backend’s dependencies are in place; scholium slides check <backend> renders a 2-slide canned deck end-to-end.

TTS provider

tts_provider: "piper"   # Which engine to use
voice: "en_US-lessac-medium"  # Default voice (meaning varies by provider)

How voice is interpreted by each provider:

Provider	Value
piper	Voice model name, e.g. `en_US-lessac-medium`
elevenlabs	Voice ID (run `scholium list-voices --provider elevenlabs`)
openai	Built-in name: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
coqui / f5tts / styletts2 / tortoise	Registered voice name from `scholium list-voices`

Piper

piper:
  quality: "medium"   # low | medium | high
  speed: 1.0          # 0.1–5.0  (1.0 = normal, 0.8 = 20% slower)

speed controls the --length-scale flag passed to the piper binary. Because Piper’s length-scale parameter is the inverse of speed (higher value = slower speech), Scholium handles the conversion automatically — just set speed as you would expect: values below 1.0 slow the voice down, values above 1.0 speed it up.
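The conversion amounts to taking a reciprocal. The sketch below assumes the mapping is exactly 1 / speed and enforces the 0.1 to 5.0 range given in the config comment; `speed_to_length_scale` is a hypothetical name, and the precise formula Scholium uses is not stated on this page.

```python
def speed_to_length_scale(speed):
    """Convert a Scholium-style speed into a Piper length-scale value.

    Assumes length_scale = 1 / speed, so speed 0.8 (20% slower) yields a
    length scale above 1.0, which makes Piper speak more slowly.
    """
    if not 0.1 <= speed <= 5.0:
        raise ValueError("speed must be between 0.1 and 5.0")
    return 1.0 / speed


for speed in (0.8, 1.0, 2.0):
    print(speed, "->", round(speed_to_length_scale(speed), 3))
```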

ElevenLabs

elevenlabs:
  model: "eleven_multilingual_v2"
  stability: 0.5         # 0.0–1.0  (higher = more consistent, lower = more expressive)
  similarity_boost: 0.75 # 0.0–1.0  (how closely to match the reference voice)

Omitting stability or similarity_boost leaves them at ElevenLabs’ own defaults. For a conversational tone try stability: 0.3; for a steady narration voice try stability: 0.7.

API key: never store your key in config.yaml. Use the ELEVENLABS_API_KEY environment variable instead — see Managing API Keys.

OpenAI TTS

openai:
  model: "tts-1"   # tts-1 | tts-1-hd
  speed: 1.0       # 0.25–4.0  (1.0 = normal)

tts-1-hd produces noticeably higher quality at roughly twice the cost per character.

API key: use the OPENAI_API_KEY environment variable.
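Reading a key from the environment rather than from config.yaml can look like the sketch below. `require_api_key` is a hypothetical helper; only the variable names ELEVENLABS_API_KEY and OPENAI_API_KEY come from this page.

```python
import os


def require_api_key(env_var, environ=None):
    """Return the API key stored in env_var, or raise a clear error.

    environ defaults to os.environ; passing a plain dict makes the
    function easy to exercise without touching the real environment.
    """
    env = os.environ if environ is None else environ
    key = env.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running scholium")
    return key


# Example with a stand-in environment and a placeholder key.
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(require_api_key("OPENAI_API_KEY", fake_env))
```

Failing early with the variable name in the message is a design choice that avoids a confusing authentication error partway through a long render.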

Coqui

coqui:
  model: "tts_models/multilingual/multi-dataset/xtts_v2"

Bark

bark:
  model: "small"   # small | large

large produces higher quality but requires significantly more VRAM and time.

F5-TTS

f5tts:
  model: "F5-TTS"    # F5-TTS | E2-TTS
  vocoder: "vocos"   # vocos | bigvgan
  model_path: "my_voice/sample.wav"   # relative to voices_dir, or absolute
  ref_text: "The text spoken in the reference clip."

model_path and ref_text are optional if you have already registered a voice with scholium train-voice.
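The "relative to voices_dir, or absolute" rule for model_path can be sketched with pathlib. `resolve_model_path` is a hypothetical helper; the default directory is the voices_dir value shown under Paths, and expanding `~` in both arguments is an assumption.

```python
from pathlib import Path


def resolve_model_path(model_path, voices_dir="~/.local/share/scholium/voices"):
    """Resolve a reference-clip path as the config comment describes."""
    candidate = Path(model_path).expanduser()
    if candidate.is_absolute():
        return candidate  # absolute paths are used as given
    return Path(voices_dir).expanduser() / candidate  # relative to voices_dir


print(resolve_model_path("my_voice/sample.wav", "/srv/voices"))
print(resolve_model_path("/data/clips/sample.wav", "/srv/voices"))
```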

StyleTTS2

styletts2:
  alpha: 0.3           # 0.0–1.0  style blend
  beta: 0.7            # 0.0–1.0  diffusion guidance strength
  diffusion_steps: 5   # 1–20  more steps = slower but higher quality
  model_path: "my_voice/sample.wav"

Tortoise

tortoise:
  preset: "fast"   # ultra_fast | fast | standard | high_quality
  kv_cache: true
  half: true       # float16 — faster on GPU, slight quality reduction
  model_path: "my_voice/sample.wav"

Video

resolution and fps are shared between slide rasterisation (the slide backends) and the final mp4 encode (ffmpeg) — one source of truth.

resolution: [1920, 1080]
fps: 30

The rest of the video pipeline lives under the video: section. Run scholium video list to see which codecs and hardware-acceleration methods are compiled into your local ffmpeg, and scholium video check to encode a 2-second test clip with the configured settings.

video:
  codec: "libx264"          # libx264 | libx265 | libvpx-vp9 | libaom-av1 | h264_nvenc | …
  preset: "medium"          # ultrafast … veryslow (x264/x265 only; ignored otherwise)
  crf: 23                   # 0=lossless, 18=visually lossless, 23=default, 28=smaller file, lower quality
  pixel_format: "yuv420p"   # yuv420p = broadest player compatibility
  audio_codec: "aac"        # aac | libopus | libmp3lame | flac | …
  audio_bitrate: "192k"
  extra_args: []            # forwarded verbatim to every ffmpeg call

Hardware encoding: switch codec to h264_nvenc (or hevc_nvenc) on an NVIDIA GPU for a 5–10× speed-up over libx264. Run scholium video list first to check that your ffmpeg build supports the encoder before you risk a long render.

Power-user escape hatch: anything ffmpeg accepts but Scholium doesn’t expose as a named knob goes in extra_args — for example extra_args: ["-movflags", "+faststart"] for web-streamed mp4s.
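To show how the video: settings could translate into an ffmpeg command line, here is a sketch limited to software encoders. `encoder_args` is a hypothetical function. The flags themselves (`-c:v`, `-preset`, `-crf`, `-pix_fmt`, `-c:a`, `-b:a`) are standard ffmpeg options, but the order and the exact way Scholium assembles them are assumptions, and hardware encoders such as h264_nvenc may need different quality flags that this sketch does not cover.

```python
# Settings copied from the video: example above.
VIDEO = {
    "codec": "libx264",
    "preset": "medium",
    "crf": 23,
    "pixel_format": "yuv420p",
    "audio_codec": "aac",
    "audio_bitrate": "192k",
    "extra_args": [],
}


def encoder_args(video):
    """Translate video settings into ffmpeg output arguments (software encoders)."""
    args = ["-c:v", video["codec"]]
    if video["codec"] in ("libx264", "libx265"):
        # preset applies to x264/x265 only, as the config comment states.
        args += ["-preset", video["preset"]]
    args += [
        "-crf", str(video["crf"]),
        "-pix_fmt", video["pixel_format"],
        "-c:a", video["audio_codec"],
        "-b:a", video["audio_bitrate"],
    ]
    # extra_args are appended verbatim, e.g. ["-movflags", "+faststart"].
    args += list(video.get("extra_args", []))
    return args


print(" ".join(encoder_args(VIDEO)))
print(" ".join(encoder_args({**VIDEO, "extra_args": ["-movflags", "+faststart"]})))
```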

Timing

timing:
  default_pre_delay: 1.0    # silence before narration (seconds)
  default_post_delay: 2.0   # silence after narration (seconds)
  min_slide_duration: 4.0   # minimum slide duration (seconds)
  silent_slide_duration: 3.0  # duration for slides without narration (e.g. TOC)

These are global defaults. Per-slide overrides use [PRE Ns] / [POST Ns] / [DUR Ns] directives in the notes block — see Timing Control.
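One plausible way these four values combine into a slide's on-screen time is sketched below. The formula (pre-delay plus narration plus post-delay, raised to the minimum if shorter) is an assumption, not documented behaviour. `slide_duration` is a hypothetical function, and the [DUR Ns] directive is not modelled.

```python
# Global defaults copied from the timing: example above.
TIMING = {
    "default_pre_delay": 1.0,
    "default_post_delay": 2.0,
    "min_slide_duration": 4.0,
    "silent_slide_duration": 3.0,
}


def slide_duration(narration_seconds, timing=TIMING, pre=None, post=None):
    """Estimate how long a slide stays on screen, in seconds.

    pre and post stand in for per-slide [PRE Ns] / [POST Ns] overrides;
    None means "use the global default".
    """
    if not narration_seconds:
        # Slides without narration use the fixed silent duration.
        return timing["silent_slide_duration"]
    pre = timing["default_pre_delay"] if pre is None else pre
    post = timing["default_post_delay"] if post is None else post
    total = pre + narration_seconds + post
    # Assumption: the minimum applies after the delays are added.
    return max(total, timing["min_slide_duration"])


print(slide_duration(10.0))                     # 13.0
print(slide_duration(0.5))                      # 4.0
print(slide_duration(None))                     # 3.0
print(slide_duration(10.0, pre=0.0, post=0.5))  # 10.5
```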

Paths

voices_dir: "~/.local/share/scholium/voices"
temp_dir: "./temp"
output_dir: "./output"
keep_temp_files: false

Set keep_temp_files: true to retain intermediate audio and image files for debugging.

Tips

Per-project config — keep a config.yaml in the same directory as your lecture markdown. Run all scholium commands from that directory and the file is picked up automatically.

Don’t commit API keys — add config.yaml to .gitignore if it contains an API key, or better yet use environment variables for keys and commit the rest of the file freely.

Check your effective settings — after editing, run scholium config show to confirm the merged result looks correct before generating a long lecture.