AI Music Generator: Beats from Text in 2026
Follow this sequence to produce a 60-second custom beat that fits a 45-word script. The walkthrough covers prompt writing, model choice, credit costs, voice layering, and final export inside Flixly.
TL;DR
Open the Music Generation page, enter a prompt of up to 120 characters, run three 60-second variants for 12 credits, and refine the middle section. Then layer a cloned voice from the Text to Speech and Voice Cloning tools before exporting the synced 1080p file.
Your client needs a 60-second promo video with custom music that fits a 45-word script about a new fitness app. You open Flixly at 9:15 a.m.
Open the dashboard and locate the tool
Sign in, or register at /auth/register if you do not have an account yet. Your credit balance sits in the top bar. Open the Music Generation page from the main dashboard menu. The page loads with a text box, a model selector, and a duration slider set to 30 seconds by default.
Write the prompt with concrete constraints
Enter a prompt of no more than 120 characters, for example: "upbeat electronic beat, 128 bpm, clean kick and snare, no vocals, 60 seconds, fitness ad energy" (95 characters). Keep the description under 150 characters for faster queue times. Add style tags such as "lo-fi house" or "trap hi-hats" only when you need them.
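The length limit is easy to check before pasting a prompt into the text box. The sketch below is a local helper, not part of Flixly: `build_prompt` and `MAX_PROMPT_CHARS` are hypothetical names, and the 150-character ceiling is taken from the guidance above.

```python
MAX_PROMPT_CHARS = 150  # ceiling suggested in the text; not an enforced API limit


def build_prompt(parts: list[str]) -> str:
    """Join non-empty prompt fragments with ', ' and enforce the length ceiling."""
    prompt = ", ".join(p.strip() for p in parts if p.strip())
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"prompt is {len(prompt)} characters; keep it under {MAX_PROMPT_CHARS}"
        )
    return prompt


parts = [
    "upbeat electronic beat",  # genre and mood
    "128 bpm",                 # tempo
    "clean kick and snare",    # drum character
    "no vocals",               # exclusion
    "60 seconds",              # target length
    "fitness ad energy",       # use case
]
prompt = build_prompt(parts)
print(len(prompt), prompt)
```

Keeping each constraint as its own list item makes it simple to add or drop a style tag and re-check the length.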
Choose duration and format
Set the duration slider to 60 seconds. Choose MP3 at 320 kbps. The system shows an estimated cost of 12 credits before you generate.
Pick a model and run the first pass
Select the 2026 audio stack. Run three variants at once. The first variant finishes in 22 seconds and returns a 60-second file. The second adds brighter synths. The third tightens the bass drop at the 35-second mark.
Verify timing and energy
Play each file against the 45-word script. Check that the kick lands on the word "launch" at second 18. Adjust the prompt by adding "emphasize second 18" and regenerate only the middle section. This costs an extra 4 credits.
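Before regenerating, it can help to see where the beat grid falls relative to the cue. The sketch below assumes a steady 128 bpm, a kick on every beat, and a grid that starts at 0 seconds. None of these is guaranteed for a generated track, and `nearest_beat` is a hypothetical helper.

```python
def nearest_beat(target_s: float, bpm: float, offset_s: float = 0.0) -> tuple[int, float]:
    """Return (beat index, beat time in seconds) of the beat closest to target_s."""
    beat_len = 60.0 / bpm                            # seconds per beat
    index = round((target_s - offset_s) / beat_len)  # closest whole beat
    return index, offset_s + index * beat_len


index, beat_time = nearest_beat(18.0, bpm=128)
drift_ms = (beat_time - 18.0) * 1000
print(f"beat {index} at {beat_time:.4f} s, drift {drift_ms:+.1f} ms")
# prints: beat 38 at 17.8125 s, drift -187.5 ms
```

Under these assumptions the closest kick lands about 0.19 seconds before the 18-second mark, so expect a small offset rather than an exact hit.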
Layer with text-to-speech if the brief calls for voice
If the script needs narration, move to the Text to Speech page. Paste the 45-word script, choose Gemini 3.1 Flash TTS, and generate a 12-second voice clip. Download both the beat and the voice clip.
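A quick pace check shows how dense the narration is. The sketch below uses a placeholder 45-word string because the actual script is not reproduced here, and `words_per_minute` is a hypothetical helper.

```python
def words_per_minute(script: str, clip_seconds: float) -> float:
    """Speaking rate implied by a script and the length of its voice clip."""
    word_count = len(script.split())
    return word_count / clip_seconds * 60.0


# Placeholder standing in for the real 45-word script.
script = " ".join(["word"] * 45)
print(words_per_minute(script, clip_seconds=12))  # 225.0
```

At 225 words per minute the read is brisk. If the result sounds rushed, a longer clip or a shorter script will bring the rate down.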
Add voice cloning for consistent talent
Open the Voice Cloning page. Upload a 30-second reference recording of the same narrator from a prior project. Generate a new line that matches the cloned timbre. The system reports a 98 percent similarity score.
Sync to video in the shorts workflow
Open the Shorts Generator. Upload the 60-second beat and the 12-second voice clip. The timeline auto-aligns the voice at second 18. Export the combined 1080p file at 24 fps.
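The alignment can be sanity-checked with simple timeline arithmetic. The sketch below models the two clips locally. `Clip`, `to_frame`, and `fits_inside` are hypothetical names and do not reflect how the Shorts Generator represents its timeline.

```python
from dataclasses import dataclass

FPS = 24  # export frame rate used in this step


@dataclass(frozen=True)
class Clip:
    name: str
    start_s: float     # position on the timeline, in seconds
    duration_s: float

    @property
    def end_s(self) -> float:
        return self.start_s + self.duration_s


def to_frame(seconds: float, fps: int = FPS) -> int:
    """Convert a timeline position in seconds to a frame index."""
    return round(seconds * fps)


def fits_inside(inner: Clip, outer: Clip) -> bool:
    """True if inner starts and ends within outer."""
    return outer.start_s <= inner.start_s and inner.end_s <= outer.end_s


beat = Clip("beat", start_s=0.0, duration_s=60.0)
voice = Clip("voice", start_s=18.0, duration_s=12.0)

print(fits_inside(voice, beat))                        # True
print(to_frame(voice.start_s), to_frame(voice.end_s))  # 432 720
print(to_frame(beat.end_s))                            # 1440
```

With the voice placed at second 18, narration runs to second 30 and leaves the second half of the beat clear.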
Export and check file specs
The final MP3 is about 2.4 MB, which is what 60 seconds at 320 kbps works out to. The video export is 48 MB. Both files carry embedded metadata with the prompt used. Store the project link for reuse.
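Export sizes can be sanity-checked from bitrate and duration. The sketch below assumes constant-bitrate audio, decimal megabytes, and no metadata or container overhead. Both helper names are hypothetical.

```python
def cbr_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Size in decimal MB of a constant-bitrate stream (bits / 8 = bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def average_bitrate_mbps(size_mb: float, duration_s: float) -> float:
    """Average bitrate in Mbit/s implied by a file size and duration."""
    return size_mb * 8 / duration_s


print(f"{cbr_size_mb(320, 60):.1f} MB")             # 2.4 MB
print(f"{average_bitrate_mbps(48, 60):.1f} Mbit/s")  # 6.4 Mbit/s
```

An audio file noticeably smaller than the computed size would suggest a lower or variable bitrate rather than 320 kbps constant.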
Repeat the workflow on the next brief
You now own a reusable 60-second beat plus voice layer that matches the fitness app script exactly. Run the same prompt template on the Music Generation page for future ads.
| Step | Action | Time | Credits | Output |
|---|---|---|---|---|
| 1 | Write 120-char prompt | 2 min | 0 | Text ready |
| 2 | Generate 3 variants | 22 s | 12 | 60 s MP3s |
| 3 | Refine middle section | 45 s | 4 | Updated beat |
| 4 | Clone voice line | 18 s | 6 | 12 s voice |
| 5 | Sync in shorts | 3 min | 8 | 1080p video |
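From brief to final export, the process took 14 minutes and 30 credits. The timed steps in the table add up to roughly six and a half minutes; the remainder goes to untimed work such as playing each variant against the script. The same sequence works for any 45- to 90-second ad or social clip.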
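The table totals can be reproduced with a few lines of arithmetic. The sketch below copies the time and credit columns from the table, with times converted to seconds. The variable names are illustrative only.

```python
# (action, time in seconds, credits), copied from the table
steps = [
    ("Write 120-char prompt", 120, 0),
    ("Generate 3 variants", 22, 12),
    ("Refine middle section", 45, 4),
    ("Clone voice line", 18, 6),
    ("Sync in shorts", 180, 8),
]

total_seconds = sum(seconds for _, seconds, _ in steps)
total_credits = sum(credits for _, _, credits in steps)
minutes, seconds = divmod(total_seconds, 60)

print(f"{total_credits} credits, {minutes} min {seconds} s of timed steps")
# prints: 30 credits, 6 min 25 s of timed steps
```

The credits match the 30 quoted. The timed steps cover 6 minutes 25 seconds, so the rest of the 14 minutes is presumably spent on work the table does not time.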



