AI Podcast Generator: Text to Podcast

Step-by-step guide to turning a text script into a finished podcast using Flixly tools. Covers Gemini 3.1 Flash TTS, voice cloning, music layers, and export checks with exact credit costs.

By Flixly Team | March 26, 2026

TL;DR

Start with a 45-minute script. Use Gemini 3.1 Flash TTS at 0.8 credits per minute for narration, clone voices for recurring characters, layer 90 bpm music at -18 dB, then export 320 kbps MP3 plus SRT captions. Verify length and bitrate before upload.

You have a 45-minute script ready and two hours until upload. The goal is a finished podcast episode with consistent voices, background music, and captions.

Set up the project in the dashboard

Open the Flixly dashboard and create a new project folder. Name it after the episode date so files stay organized. Import the full script as a plain text file. This keeps every line available for later edits without retyping.

Generate the main narration track

Select the Text to Speech tool. Choose Gemini 3.1 Flash TTS because it handles long-form narration at 0.8 credits per minute. Paste the first 10 minutes of script. Set the voice to a neutral male host preset and generate. The output arrives as a 48 kHz WAV file.

Repeat the process for the second voice. Switch to a female guest preset and process the next section. Each segment lands in the project library with automatic timestamps.
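Splitting the script into roughly 10-minute sections can be done before pasting. The sketch below is one way to do it locally; the 150 words-per-minute speaking rate and the function name `split_script` are assumptions for illustration, not Flixly settings.

```python
def split_script(text: str, minutes_per_chunk: float = 10.0,
                 words_per_minute: int = 150) -> list[str]:
    """Split a script into chunks of roughly `minutes_per_chunk` of speech.

    Breaks only on blank-line paragraph boundaries so no sentence is cut.
    words_per_minute is an assumed speaking rate; tune it to your voices.
    """
    budget = int(minutes_per_chunk * words_per_minute)  # word budget per chunk
    chunks, current, count = [], [], 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = len(para.split())
        # Close the current chunk when adding this paragraph would exceed the budget.
        if current and count + n > budget:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A single paragraph longer than the budget still becomes its own chunk, so very long paragraphs are worth breaking up by hand first.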

Check audio quality

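Play the first clip in the built-in player. Listen for breathing pauses and emphasis on numbers. If a line sounds flat, regenerate only that sentence with a 0.2 speed adjustment rather than re-rendering the whole segment. At 0.8 credits per minute, the base narration for a 45-minute show costs about 36 credits, and each regenerated sentence adds only a fraction of a credit.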
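The narration cost follows directly from the per-minute rate. A minimal sketch of the arithmetic, assuming billing is a flat rate times audio length (the function name is hypothetical):

```python
def narration_credits(minutes: float, credits_per_minute: float = 0.8) -> float:
    """Credits for a narration of the given length at a flat per-minute rate."""
    return minutes * credits_per_minute

# Full 45-minute episode at the Gemini 3.1 Flash TTS rate from this guide.
full_episode = narration_credits(45)

# Re-rendering one ten-second sentence costs a small fraction of that.
one_sentence = narration_credits(10 / 60)

print(f"Full episode: {full_episode:.1f} credits")
print(f"One regenerated sentence: {one_sentence:.2f} credits")
```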

Add voice cloning for recurring characters

When the episode needs a returning character, open Voice Cloning. Upload three clean 30-second samples from previous episodes. The model trains in under four minutes and produces a new preset named after the character.

Apply the cloned voice to any lines tagged with that name. The system matches timing automatically so the cloned track aligns with the main narration.
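To see which lines a cloned voice will pick up, it can help to group the script by speaker tag beforehand. This sketch assumes each spoken line starts with an uppercase tag followed by a colon, such as `HOST: ...`; that format is an assumption for illustration, so adapt the parsing to however your script marks speakers.

```python
from collections import defaultdict

def group_lines_by_speaker(script: str) -> dict[str, list[str]]:
    """Group script lines by an assumed 'NAME: text' speaker tag.

    Only all-uppercase tags count, so ordinary sentences that happen to
    contain a colon are not mistaken for speaker lines.
    """
    lines_by_speaker = defaultdict(list)
    for raw in script.splitlines():
        name, sep, text = raw.partition(":")
        name, text = name.strip(), text.strip()
        if sep and name.isupper() and text:
            lines_by_speaker[name].append(text)
    return dict(lines_by_speaker)
```

For example, calling it on a script with `HOST:` and `MAYA:` lines returns one list of lines per name, which makes it easy to count how much dialogue each cloned voice has to cover.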

Layer music and sound effects

Move to the Music Generation tool. Enter a prompt like "low-tempo electronic underscore, 90 bpm, no vocals". Generate three 60-second loops. Drop the chosen loop under the narration at -18 dB. Trim start and end points so the music fades exactly at scene changes.
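Two numbers in this step are worth working out: how quiet -18 dB is relative to the narration, and how many repeats of a 60-second loop a 45-minute episode needs. The sketch below uses the standard amplitude conversion for decibels; the helper names are hypothetical.

```python
import math

def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def loops_needed(episode_seconds: float, loop_seconds: float) -> int:
    """Number of loop repeats required to cover the whole episode."""
    return math.ceil(episode_seconds / loop_seconds)

gain = db_to_gain(-18)               # roughly 0.126, about one eighth of full amplitude
repeats = loops_needed(45 * 60, 60)  # a 60-second loop under a 45-minute episode

print(f"Music gain at -18 dB: {gain:.3f}")
print(f"Loop repeats needed: {repeats}")
```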

Export and verify the final file

Combine all tracks in the project editor. Add Auto Captions to create a subtitle track in SRT format. Export the full episode as a 320 kbps MP3 plus the separate WAV stems. At a constant 320 kbps, a 45-minute episode comes to about 108 MB.
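The expected MP3 size can be estimated from bitrate and duration, which gives a quick sanity check on the export. A minimal sketch, assuming constant-bitrate encoding and ignoring metadata such as ID3 tags and cover art:

```python
def mp3_size_mb(minutes: float, bitrate_kbps: int = 320) -> float:
    """Estimated size in megabytes (10^6 bytes) of a constant-bitrate MP3."""
    bits = bitrate_kbps * 1000 * minutes * 60  # total bits of audio
    return bits / 8 / 1_000_000                # bits -> bytes -> MB

print(f"{mp3_size_mb(45):.0f} MB")  # 45 minutes at 320 kbps
```

A file far smaller than this estimate usually means the export used a lower bitrate or was cut short.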

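Before publishing, upload a 30-second test clip to your host platform and confirm that it plays and that captions display correctly. Then check the full export: the duration should be about 45 minutes and the bitrate 320 kbps. Once both checks pass, the episode is ready for scheduled release.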
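Length and bitrate can also be checked locally before uploading. This sketch is not a Flixly feature: it assumes FFmpeg's `ffprobe` is installed and on the PATH, and the function names and the 2% tolerance are illustrative choices.

```python
import json
import subprocess

def probe(path: str) -> str:
    """Return ffprobe's JSON report of duration and bitrate for a media file."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration,bit_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def check_export(probe_json: str, expected_seconds: float,
                 expected_kbps: float, tolerance: float = 0.02) -> list[str]:
    """Compare the probed values with the expected ones; return any problems."""
    fmt = json.loads(probe_json)["format"]
    duration = float(fmt["duration"])      # ffprobe reports these as strings
    kbps = float(fmt["bit_rate"]) / 1000
    problems = []
    if abs(duration - expected_seconds) > expected_seconds * tolerance:
        problems.append(f"duration {duration:.0f}s, expected {expected_seconds:.0f}s")
    if abs(kbps - expected_kbps) > expected_kbps * tolerance:
        problems.append(f"bitrate {kbps:.0f} kbps, expected {expected_kbps:.0f} kbps")
    return problems
```

Typical use would be `check_export(probe("episode.mp3"), 45 * 60, 320)`, where `episode.mp3` is a placeholder file name; an empty list means both checks passed.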

Compare model options for future episodes

Model | Credits per minute | Max duration | Best for
Gemini 3.1 Flash TTS | 0.8 | 60 min | Long narration
Seedance 2.0 | 1.2 | 30 min | Dramatic delivery
Wan 2.7 | 1.0 | 45 min | Natural conversation

Use the table to pick the right model before starting the next script.
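The comparison can be turned into a quick estimate for a given episode length. The sketch below copies the rates and limits from the table; it assumes flat per-minute billing and that an episode longer than a model's maximum duration can be split into several generation passes, neither of which is confirmed here.

```python
import math

# (model, credits per minute, max minutes per generation), copied from the table
MODELS = [
    ("Gemini 3.1 Flash TTS", 0.8, 60),
    ("Seedance 2.0", 1.2, 30),
    ("Wan 2.7", 1.0, 45),
]

def episode_cost(minutes: float) -> list[tuple[str, float, int]]:
    """Return (model, total credits, passes needed) rows, cheapest first."""
    rows = []
    for name, rate, max_minutes in MODELS:
        passes = math.ceil(minutes / max_minutes)  # splits needed to fit the limit
        rows.append((name, round(minutes * rate, 2), passes))
    return sorted(rows, key=lambda row: row[1])

for name, credits, passes in episode_cost(45):
    print(f"{name}: {credits} credits in {passes} pass(es)")
```

Cost is only one axis; the "Best for" column still decides which voice style suits the script.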

Text to Speech handles the core conversion while Voice Cloning keeps character voices consistent across episodes. Add Music Generation for background layers and finish with Auto Captions for accessibility.

After the first export, open the same project again to tweak any section. The saved presets and cloned voices remain available so the next episode starts from the same point instead of from scratch.
