Emphasis detection
The app reads each song’s lyrics at analyze time and asks the local Qwen LLM to identify emphasized lines (hooks, drops, big lyrical beats). At render time, the per-style effects engine pulls back during emphasis and cranks above its style baseline outside emphasis. The visible result: emphasized lyrics land cleanly with subdued motion; connective bars feel kinetic and dramatic.
How it works
- Analyze time: the song’s lyrics (with Whisper-aligned per-line timestamps) are sent to Qwen in a single call. The LLM returns `{"emphasis_lines": [3, 4, 12, 13, 25]}`. Consecutive lines are grouped into intervals; boundaries are snapped to nearby bass-onset peaks/valleys (within ±0.5s) so emphasis windows align to musical phrasing instead of mid-syllable.
- Cached: the result lives at `<song-dir>/emphasis.json`. Subsequent renders reuse it. The cache is invalidated when you save new lyrics for the song.
- Render time: the effects composer reads `emphasis.json` and the active style’s swing config (per-style `calm_mult` / `crazy_mult`), then weaves a time-varying multiplier into the ffmpeg filter graph. Four operators react: brightness flash, pulse, chorus pulse, zoom bump. Ken Burns and overlay operators (grain / leaks / dust) stay at their style baseline due to ffmpeg filter limitations; they form the “ambient hum” that stays relatively constant.
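The grouping and snapping steps in the analyze stage can be sketched roughly as follows. This is a minimal illustration, not the app’s actual code: the function names, the `line_times` mapping, and the representation of bass onsets as a sorted list of timestamps are all assumptions made for the example.

```python
from bisect import bisect_left

def group_consecutive(line_indices):
    """Group line indices into runs of consecutive numbers."""
    runs = []
    for idx in sorted(set(line_indices)):
        if runs and idx == runs[-1][-1] + 1:
            runs[-1].append(idx)
        else:
            runs.append([idx])
    return runs

def snap(t, onsets, tolerance=0.5):
    """Return the onset nearest to t if it lies within tolerance, else t."""
    if not onsets:
        return t  # no onset data: snapping is a no-op
    i = bisect_left(onsets, t)
    candidates = onsets[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda o: abs(o - t))
    return nearest if abs(nearest - t) <= tolerance else t

def build_intervals(emphasis_lines, line_times, onsets, tolerance=0.5):
    """line_times (hypothetical shape) maps line index -> (start_sec, end_sec).

    onsets must be sorted ascending.
    """
    intervals = []
    for run in group_consecutive(emphasis_lines):
        start = line_times[run[0]][0]
        end = line_times[run[-1]][1]
        intervals.append((snap(start, onsets, tolerance),
                          snap(end, onsets, tolerance)))
    return intervals
```

With the example response above, lines 3 and 4 become one interval, 12 and 13 a second, and 25 a third; a boundary moves only when an onset falls within the ±0.5s window.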
Per-style swing
Each style declares its own swing range. Cinematic gets gentle drama; Kinetic gets max push-pull.
| Style | calm × baseline | crazy × baseline |
|---|---|---|
| Cinematic | 0.6 | 1.2 |
| Kinetic | 0.2 | 2.5 |
| Illustrated | 0.7 | 1.0 (subtle) |
| Visualizer | 0.3 | 2.0 |
The style profile defines the average vibe; the envelope swings around it.
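As a rough sketch of how the table might translate into configuration, the snippet below stores each style’s pair of multipliers and scales an operator’s baseline by one or the other. The `SWING` dictionary and `effective_intensity` function are hypothetical names for illustration; only the numbers come from the table.

```python
# (calm_mult, crazy_mult) per style, copied from the table above.
SWING = {
    "Cinematic": (0.6, 1.2),
    "Kinetic": (0.2, 2.5),
    "Illustrated": (0.7, 1.0),
    "Visualizer": (0.3, 2.0),
}

def effective_intensity(style, baseline, in_emphasis):
    """Scale an operator's baseline by the style's calm or crazy multiplier."""
    calm_mult, crazy_mult = SWING[style]
    return baseline * (calm_mult if in_emphasis else crazy_mult)
```

For example, a Kinetic operator with a baseline of 0.4 would drop to about 0.08 during emphasis and rise to about 1.0 outside it, while the same operator under Illustrated would move only between 0.28 and 0.4.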
Asymmetric ramp
Effects wind down for ~1 second before an emphasized line lands (anticipation), then hard-cut back to the crazy level as soon as the emphasis ends. This is the music-video editor’s classic “calm before the storm” pattern.
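One way to picture the asymmetric envelope is as a function of time that returns the multiplier to apply. The sketch below assumes a linear wind-down over the ramp window and sorted, non-overlapping intervals; the source does not specify the ramp curve, so the linear shape and the `envelope` name are assumptions.

```python
def envelope(t, intervals, calm_mult, crazy_mult, ramp=1.0):
    """Multiplier at time t (seconds) for sorted, non-overlapping intervals."""
    for start, end in intervals:
        if start <= t < end:
            return calm_mult  # inside an emphasis window: stay calm
        if start - ramp <= t < start:
            # Anticipation: ease from crazy down to calm (linear, assumed).
            progress = (t - (start - ramp)) / ramp
            return crazy_mult + (calm_mult - crazy_mult) * progress
    return crazy_mult  # everywhere else, including right after a window ends
```

There is no matching ramp on the way out: the instant `t` reaches an interval’s end, the function returns `crazy_mult` again, which is the hard cut described above.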
Failure modes
- LLM error or malformed response → no emphasis intervals; render uses the static style profile (current pre-emphasis behavior). Logged as `[emphasis] failed`.
- Bass-onset data missing → boundary snap is a no-op; intervals use raw lyric timestamps.
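The first failure mode could be handled by a defensive parser along these lines. This is a sketch under assumptions: `parse_emphasis_response` is a hypothetical name, and the validation rules (a JSON object with an `emphasis_lines` list of integers) are inferred from the example response rather than taken from the app’s code.

```python
import json

def parse_emphasis_response(raw):
    """Return sorted emphasized line indices, or [] if the response is unusable."""
    try:
        data = json.loads(raw)
        lines = data["emphasis_lines"]
        is_int_list = isinstance(lines, list) and all(
            isinstance(n, int) and not isinstance(n, bool) for n in lines
        )
        if not is_int_list:
            raise ValueError("emphasis_lines must be a list of integers")
        return sorted(set(lines))
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here.
        print("[emphasis] failed")
        return []
```

Returning an empty list means downstream code sees no intervals and falls back to the static style profile without a special case.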
Known limitations
- Ken Burns motion is currently static: ffmpeg’s `zoompan` filter parser splits on commas in its `z=` parameter before reaching the eval engine, so the multi-comma envelope expressions (`if(between(t,a,b),X,Y)`) cause Eval errors. The other 4 envelope-aware operators use `eq=` and `scale=` filters, which handle multi-comma expressions correctly. Ken Burns intensity-during-emphasis is on the future-enhancement list; the visible effect is mostly carried by beat-sync flashes, brightness pulse, chorus pulse, and zoom bumps.
- Overlay opacity is static: `grain_overlay`, `leaks_overlay`, and `dust_overlay` use ffmpeg’s `blend` filter, which requires a literal `all_opacity=` value. Overlays form the ambient hum that doesn’t swing with emphasis.
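To make the “multi-comma envelope expression” concrete, the sketch below builds a nested `if(between(t,a,b),X,Y)` string from a list of intervals. It is illustrative only: `envelope_expr` is a hypothetical helper, it omits the pre-emphasis ramp, and how the app quotes or escapes the commas when embedding the result in a filter graph is not described in this document.

```python
def envelope_expr(intervals, calm, crazy):
    """Build a nested ffmpeg expression: calm inside any interval, crazy elsewhere."""
    expr = f"{crazy:.3f}"
    # Wrap from the last interval inward so the first interval is tested first.
    for start, end in reversed(intervals):
        expr = f"if(between(t,{start:.3f},{end:.3f}),{calm:.3f},{expr})"
    return expr

print(envelope_expr([(9.8, 14.0), (40.0, 44.4)], 0.2, 2.5))
# if(between(t,9.800,14.000),0.200,if(between(t,40.000,44.400),0.200,2.500))
```

Every interval adds three commas to the expression, which is why a parser that splits its parameter on commas before evaluation cannot accept it.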
No UI in v1
There’s no review or override panel in v1: the LLM picks emphasis lines once at analyze time, and the result drives the next render. Diagnostic logs (`[emphasis] cached <N> intervals for <song>`) let you confirm what was selected.