
AI Lyric Generator

The AI Lyric Generator is an opt-in feature that helps you build a music brand once, then spin up scene-anchored lyrics and a ready-to-paste Suno style prompt for every new song. All generation runs locally on an LLM you download to your own machine — nothing is sent to the cloud.

Figure: Lyrics wizard inside the Create flow

How it fits into the app

The generator isn’t a separate tab — it’s baked into the Create flow. When the feature is enabled, two extra steps appear in front of the usual song-drop step:

  1. Lyrics — enter a topic, review and accept the generated scene, and the generator returns one polished song (draft → best-pick → revise → expand runs internally).
  2. Song — your chosen title, style prompt, and lyrics are shown ready to copy into Suno (or your DAW of choice). Once you have the audio file back, you drop it in and continue the normal render flow.

Disabling the feature in Settings hides these steps and does not delete any brand or model data.

Enabling the feature

  1. Open Settings → Preferences → Enable features.
  2. Toggle AI Lyric Generator on.
  3. Open the new Settings → Lyrics tab to pick a model and fill in your brand.
  4. Go to Create — you’ll start on the Lyrics step instead of the drop zone.

Model tiers

Settings → Lyrics shows four model options. The right pick depends on your machine.

Tier | Model | Size | Min RAM | Recommended RAM | Best for
Light | Qwen 2.5 7B Instruct | ~4.7 GB | 8 GB | 12 GB | Low-spec laptops, older Macs
Balanced | Qwen 2.5 14B Instruct | ~9 GB | 16 GB | 24 GB | Default for 16 GB+ machines
Best | Mistral Small Instruct 2409 | ~13 GB | 24 GB | 32 GB | 24+ GB Apple Silicon, high-VRAM PCs
Uncensored | Cydonia 22B v2q | ~13 GB | 24 GB | 32 GB | Same as Best, when you need direct language on explicit topics

The app detects your system RAM and suggests a default tier. Models that your machine may not run well are flagged but still selectable — you can override the recommendation if you know what you’re doing.
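
As a rough illustration of how a RAM-based suggestion could work, here is a minimal sketch. The RAM figures come from the tier table; the selection rule (pick the largest tier whose minimum RAM fits) and the names `Tier`, `suggest_tier`, and `is_flagged` are assumptions for illustration, not the app's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_ram_gb: int
    recommended_ram_gb: int


# RAM values copied from the tier table. Uncensored is left out of the
# suggestion ladder here because it is a content choice, not a hardware step.
TIERS = [
    Tier("Light", 8, 12),
    Tier("Balanced", 16, 24),
    Tier("Best", 24, 32),
]


def suggest_tier(system_ram_gb: float) -> str:
    """Return the largest tier whose minimum RAM fits (assumed rule)."""
    fitting = [t for t in TIERS if system_ram_gb >= t.min_ram_gb]
    # Below every minimum, fall back to the smallest tier.
    return fitting[-1].name if fitting else TIERS[0].name


def is_flagged(tier: Tier, system_ram_gb: float) -> bool:
    """Flag (but do not block) tiers whose minimum RAM exceeds the machine's."""
    return system_ram_gb < tier.min_ram_gb
```

Under this rule a 16 GB machine would be steered to Balanced, while Best would be flagged yet still selectable.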

Downloading a model

  1. Click Download on the tier you want. The file is fetched directly from HuggingFace and streamed to disk under <app data>/models/llm/.
  2. A progress bar appears on the card showing percentage complete. You can click Cancel at any time to abort the download; partial files are cleaned up automatically.
  3. When the download finishes, the card gains an Installed badge and a Delete button. Installed models stay on disk across app restarts.
  4. You can install multiple models side-by-side — useful if you want to switch between, say, the Balanced model for most songs and the Uncensored model for specific lanes.

If a download fails (e.g. network drops mid-stream), the card keeps the error message visible so you know why — just click Download again to retry.
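
The download behavior described above (stream to disk, report progress, clean up partial files on cancel or failure) can be sketched roughly as follows. This is an illustration under assumptions, not the app's code: the `.part` suffix, the `DownloadCancelled` exception, and the callback names are hypothetical, and the byte chunks would come from whatever HTTP client is in use.

```python
import os
from pathlib import Path
from typing import Callable, Iterable


class DownloadCancelled(Exception):
    """Raised when the user cancels mid-download (hypothetical name)."""


def stream_to_disk(
    chunks: Iterable[bytes],
    dest: Path,
    total_bytes: int,
    on_progress: Callable[[int], None] = lambda pct: None,
    is_cancelled: Callable[[], bool] = lambda: False,
) -> Path:
    """Write chunks to a temporary .part file, then move it into place."""
    part = dest.with_name(dest.name + ".part")
    written = 0
    try:
        with open(part, "wb") as fh:
            for chunk in chunks:
                if is_cancelled():
                    raise DownloadCancelled(dest.name)
                fh.write(chunk)
                written += len(chunk)
                if total_bytes > 0:
                    on_progress(min(100, written * 100 // total_bytes))
        os.replace(part, dest)  # only a complete file ever gets the final name
        return dest
    except BaseException:
        # Cancel, network error, or anything else: remove the partial file.
        part.unlink(missing_ok=True)
        raise
```

Writing to a temporary name first means a failed or cancelled download never leaves something that looks like an installed model.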

Deleting a model shows a Deleting… indicator on the card while the file is removed. You can re-download it later from the same card.

HuggingFace token (optional)

A HuggingFace token isn’t required for the catalog models, but pasting one in helps:

  • It bypasses the anonymous rate limit on large GGUF downloads (helpful for the 13 GB Cydonia file on flaky connections).
  • Some HuggingFace repos require accepting a license before they’ll serve the file. If a download fails with a “gated” or “forbidden” error, visit the model page (e.g. TheDrummer/Cydonia-22B-v1.3-GGUF), click Agree and access repository, then retry — with or without a token.

A HuggingFace account and access token are free; see HuggingFace’s User access tokens docs for the walkthrough. Create a read-only token at huggingface.co/settings/tokens and paste it into the HuggingFace access token field at the top of the Lyrics settings tab. It’s saved in config.json and only sent as a Bearer header on model downloads.
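
For reference, sending the token "as a Bearer header" amounts to something like the sketch below. The function name `download_headers` is hypothetical; the `Authorization: Bearer <token>` form is the standard HTTP scheme the text refers to.

```python
from typing import Dict, Optional


def download_headers(token: Optional[str]) -> Dict[str, str]:
    """Build request headers for a model download.

    With no token the request is anonymous; with one, it is sent only as a
    standard Authorization header.
    """
    token = (token or "").strip()
    return {"Authorization": f"Bearer {token}"} if token else {}
```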

Using a model you already downloaded

If you already have one of the catalog GGUFs on disk (from LM Studio, HuggingFace CLI, ~/Downloads, etc.), the card will show a Select existing button with the path. Clicking it links the file into the app’s models dir (no re-download). Searched locations: ~/models, ~/Downloads, ~/.cache/huggingface/hub, ~/.lmstudio/models, and the GPT4All app-data folder. Filename matching is case-insensitive.
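
A minimal sketch of the lookup described above, assuming a recursive scan of the listed folders and a case-insensitive filename comparison. The function names are hypothetical, and whether the app uses a hard link, a symlink, or some other mechanism is not stated in this page; the sketch tries a hard link and falls back to a symlink.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_existing_model(filename: str, roots: Iterable[Path]) -> Optional[Path]:
    """Return the first file under any root whose name matches, ignoring case."""
    wanted = filename.casefold()
    for root in roots:
        if not root.is_dir():
            continue  # skip search locations that do not exist on this machine
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if name.casefold() == wanted:
                    return Path(dirpath) / name
    return None


def link_into_models_dir(source: Path, models_dir: Path) -> Path:
    """Make the file visible in the models dir without copying or re-downloading."""
    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / source.name
    if not target.exists():
        try:
            os.link(source, target)     # hard link: no extra disk space
        except OSError:
            os.symlink(source, target)  # e.g. source lives on another filesystem
    return target
```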

Setting up your brand

Below the model picker is the Your Brand form. This is where you tell the generator who you are as an artist. The brand drives every song — genre, vocal style, attitude, what to avoid. You fill this in once and the per-song form stays minimal (just the topic).

Required fields:

  • Artist name — your stage name.
  • Genre — pick a preset lane or choose Custom… to write your own short genre description.
  • Vocal type — female, male, or duet.
  • Your Sound — a short character bio for the narrator, not a genre description. 2–4 sentences is plenty. Name the perspective (who they are, what role they play), the attitude (what they feel toward the subject), and the lens (what they notice or know that others don’t). Concrete traits beat vague adjectives — the LLM uses this to keep the narrator sounding like a consistent person across every song.
  • Explicit content — controls how direct the lyrics can get:
    • Off — never produce profanity or sexual language, even if the topic invites it. Safe for family-friendly channels.
    • Contextual (recommended) — let the topic decide. Mature topics can earn one or two strong words where it serves the scene; light topics stay clean.
    • On — allow direct language freely when it fits. Best paired with the Uncensored model tier; aligned models will still self-censor regardless.

Advanced (optional):

  • Themes you gravitate toward — emotional territories the brand rotates through. Used to keep songs from repeating the same emotional beat.
  • Never use — words, images, or tropes you want the generator to avoid.
  • Always prefer — detail types you want leaned on (proper nouns, specific times, body-level detail, etc.).
  • Artist references — artists or songs you want to sound adjacent to. Used as inspiration, not imitation.

All fields auto-save as you type. You can edit or fully replace the brand any time. For a field-by-field deep dive with examples — including the Vocal Persona / Suno-drift workflow — see Fine-tune your brand.
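
To make the shape of the brand concrete, the fields above could be modeled roughly as follows. This is an illustrative sketch only: the class name, attribute names, and validation are assumptions, not the app's stored format. The allowed vocal types and explicit-content modes, and the recommended default, are taken from the field descriptions.

```python
from dataclasses import dataclass, field
from typing import List

VOCAL_TYPES = {"female", "male", "duet"}
EXPLICIT_MODES = {"off", "contextual", "on"}


@dataclass
class Brand:
    # Required fields
    artist_name: str
    genre: str
    vocal_type: str
    your_sound: str
    explicit_content: str = "contextual"  # the recommended setting
    # Advanced (optional) fields
    themes: List[str] = field(default_factory=list)
    never_use: List[str] = field(default_factory=list)
    always_prefer: List[str] = field(default_factory=list)
    artist_references: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.vocal_type not in VOCAL_TYPES:
            raise ValueError(f"unknown vocal type: {self.vocal_type!r}")
        if self.explicit_content not in EXPLICIT_MODES:
            raise ValueError(f"unknown explicit mode: {self.explicit_content!r}")

    def missing_required(self) -> List[str]:
        """Names of required text fields that are still blank."""
        required = ("artist_name", "genre", "your_sound")
        return [name for name in required if not getattr(self, name).strip()]
```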

Using it in Create

With the feature enabled, a model installed, and a brand defined, the Create tab opens on the Lyrics step:

  1. Enter a topic for the song.
  2. The generator produces a scene (setting, POV, emotional arc). Accept it, edit any field inline, refine it with a short instruction, or regenerate for a fresh one.
  3. Click Accept & write lyrics. The generator runs the full song pipeline — outlining, drafting, picking the strongest draft, revising, and expanding if needed — and returns one polished song.
  4. The Song step shows the finished lyrics and a Suno-ready style prompt with Copy buttons. Paste them into Suno, generate the audio, download the MP3, and drop it into the next step to render the video. Not happy with the song? Click Generate another to run the pipeline again.
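
The pipeline named in step 3 can be pictured as a simple chain of stages. The sketch below is a guess at the control flow only: each stage is passed in as a plain function standing in for an LLM call, and `n_drafts` and `min_lines` are hypothetical parameters, not documented settings.

```python
from typing import Callable


def write_song(
    scene: str,
    outline: Callable[[str], str],
    draft: Callable[[str], str],
    score: Callable[[str], float],
    revise: Callable[[str], str],
    expand: Callable[[str], str],
    n_drafts: int = 3,
    min_lines: int = 16,
) -> str:
    """Outline, draft several candidates, keep the best, revise, expand if short."""
    plan = outline(scene)
    drafts = [draft(plan) for _ in range(n_drafts)]
    best = max(drafts, key=score)  # "picking the strongest draft"
    song = revise(best)
    if len(song.splitlines()) < min_lines:  # "expanding if needed"
        song = expand(song)
    return song
```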

The style prompt is now tuned to the song’s actual lyrics, not just your brand’s defaults. Your brand’s genre and vocal type stay anchored in the first 2–3 terms so every track still reads as the same artist — but tempo, instrumentation, production, and mood follow what the song is doing. A confrontational anthem and a quiet ballad from the same brand will share a vocal lane and otherwise read distinctly, so Suno gives back two genuinely different songs instead of two near-clones.
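
One way to picture the anchoring described above: brand terms always come first, and song-specific terms follow. This is a hedged sketch; `build_style_prompt`, the "<vocal type> vocals" phrasing, and the `max_terms` cap are assumptions for illustration.

```python
from typing import List


def build_style_prompt(
    genre: str, vocal_type: str, song_terms: List[str], max_terms: int = 8
) -> str:
    """Put brand anchors first, then per-song terms, skipping duplicates."""
    anchors = [genre, f"{vocal_type} vocals"]
    seen = {a.casefold() for a in anchors}
    extras = []
    for term in song_terms:
        cleaned = term.strip()
        key = cleaned.casefold()
        if key and key not in seen:
            seen.add(key)
            extras.append(cleaned)
    return ", ".join((anchors + extras)[:max_terms])
```

With the same brand, a ballad and an anthem would share the first two terms and differ in everything after them.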

If your brand’s Music Mode is set to Local audio generation (HeartMuLa), the Song step exposes an Add audio to queue button. (The Song step lives on the Song page in the sidebar, which was split out from the legacy unified Create surface.) Clicking the button adds a job to the queue and returns immediately, so you can keep working (start another song, edit lyrics, browse the Library) while the generation runs in the background.

Audio generation and video rendering share one GPU lock, so only one job runs at a time. If a render is in flight when you queue audio, the audio job waits in line, and a render queued during audio generation waits the same way.
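
The shared-lock behavior can be illustrated with a single process-wide lock that both job kinds must hold while they run. This is a simplified sketch: `gpu_lock` and `run_gpu_job` are hypothetical names, and a bare `threading.Lock` does not guarantee first-come-first-served ordering the way a real job queue would.

```python
import threading
from typing import Callable, List, Tuple

# One lock shared by every GPU-bound job, whether audio_gen or video render.
gpu_lock = threading.Lock()


def run_gpu_job(kind: str, work: Callable[[], None], log: List[Tuple[str, str]]) -> None:
    """Run work() while holding the GPU lock; other jobs block until it is free."""
    with gpu_lock:
        log.append(("start", kind))
        work()
        log.append(("end", kind))
```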

Where you see your queued job:

  • Queue tab — every audio_gen row shows a compact card with title, current stage, and progress bar. A song-render in flight uses the existing larger card; audio_gen rows are visually lighter so the two kinds aren’t confused.
  • OS notifications — fire when each take is queued (“Audio queued — …”), when it finishes (“Audio ready — <title> is in Library → Audio.”), and on failure (with the specific reason).
  • Library → Audio sub-tab — finished MP3s land here. Listen, download, send to the Video tab via the “Render video” button, or delete.

If GPU memory runs out mid-generation (common on 32 GB Macs with longer lyrics), the worker automatically retries on CPU and shows a “retrying on CPU” notification. The CPU run takes about 30 minutes instead of about 12 but always completes, so there is no need to relaunch or shorten anything.
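
The retry behavior follows a common fallback pattern, sketched below. It is an illustration only: `generate` stands in for the real audio worker, and Python's built-in `MemoryError` stands in for whatever out-of-memory error the GPU backend actually raises.

```python
from typing import Callable


def generate_with_cpu_fallback(
    generate: Callable[[str], bytes],
    notify: Callable[[str], None],
    gpu_device: str = "gpu",
) -> bytes:
    """Try the GPU first; on an out-of-memory error, notify and rerun on CPU."""
    try:
        return generate(gpu_device)
    except MemoryError:  # stand-in for the backend's real OOM exception
        notify("retrying on CPU")
        return generate("cpu")
```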

The generated file is an MP3 (not WAV) with embedded ID3v2 tags: title, artist=ai-mvg, genre=<your style prompt>, a comment field reading “Created by ai-mvg using HeartMuLa-oss-3B”, and the full lyrics in a lyrics-eng USLT frame. The lyrics frame is what the per-act lyrics view reads back during analysis — no sidecar .lyrics.txt needed.
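
Expressed as standard ID3v2 frame IDs, the tags listed above map roughly as shown below. The frame IDs (TIT2, TPE1, TCON, COMM, USLT) are the standard ones for title, artist, genre, comment, and unsynchronized lyrics; the function and the dict layout are hypothetical and do not write a file, which would need a tagging library.

```python
from typing import Any, Dict


def id3_frames(title: str, style_prompt: str, lyrics: str) -> Dict[str, Any]:
    """Describe the tags as a mapping of ID3v2 frame ID to value."""
    return {
        "TIT2": title,          # title
        "TPE1": "ai-mvg",       # artist
        "TCON": style_prompt,   # genre carries the style prompt
        "COMM": "Created by ai-mvg using HeartMuLa-oss-3B",
        "USLT": {"lang": "eng", "text": lyrics},  # full lyrics, language eng
    }
```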

Brand history keeps the generator from repeating hooks, settings, and objects across songs in the same brand.

Why local?

Lyric generation benefits from a model that can be steered with explicit creative direction — including mature or R-rated topics, depending on your brand — and that’s easier to do with a model running on your own machine than with a cloud API. It also means no per-song cost, no rate limits, and your song ideas stay on your device.

The tradeoff is a one-time model download (roughly 4.7 to 13 GB depending on the tier) and meaningful RAM and GPU requirements. The Settings flow shows which tier fits your machine before any download begins.

See also: Background generator