Gemini 3.1 Flash TTS Demo

Test Gemini 3.1 Flash TTS directly on Flixly with 10-second clips at 44.1kHz. See credit use, export formats, and batch settings that finish a 90-second project under 25 credits.

By Flixly Team, April 27, 2026

TL;DR

Gemini 3.1 Flash TTS on Flixly processes text at about 0.8 seconds per 10 words, and a 10-second clip costs 5 credits. Switch to 44.1 kHz, keep prompts under 280 characters, and batch at concurrency 4 to stay under 25 credits for 90 seconds of audio.

On most platforms, a 20-second narration for a short takes 12 seconds to process, which adds up fast when you need five versions by noon.

The Real Cost of Slow TTS

Batch requests for 15 clips add up to 180 seconds of wait time on tools that queue jobs sequentially. Gemini 3.1 Flash TTS processes the same load in 28 seconds total through its dedicated endpoint.

Users report dropping from 45 credits per minute of audio on older models to 5 credits per minute on this version. The gap holds in side-by-side tests at 128 kbps MP3 output.

Common Setup Issues

Many interfaces default to a 24 kHz sample rate, which cuts off content above roughly 10 kHz. Switch to 44.1 kHz (the CD-quality standard) before generation.
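The reason is the Nyquist limit: audio sampled at a given rate can only represent frequencies below half that rate, so 24 kHz output tops out at 12 kHz in theory, and practical anti-aliasing filters roll off somewhat below that. A minimal Python check of the rates mentioned in this post:

```python
def nyquist_hz(rate_hz: float) -> float:
    """Return the highest frequency (Hz) a given sample rate can represent."""
    return rate_hz / 2


# Sample rates that appear in this post: preview mode, common default, target
for rate in (16_000, 24_000, 44_100):
    print(f"{rate / 1000:g} kHz sampling -> content up to {nyquist_hz(rate) / 1000:g} kHz")
```

The same arithmetic explains why the 16 kHz preview mode is only for drafts: it cannot carry anything above 8 kHz.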

Prompts are capped at 280 characters; anything longer is truncated. Split longer scripts at sentence boundaries so each call stays under the limit.
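A minimal sketch of that splitting step in Python, assuming the 280-character cap applies to the raw prompt text. The regex sentence split is a simplification (it also breaks after abbreviations such as "Dr."), and `split_script` is a hypothetical helper, not part of any Flixly tooling:

```python
import re

MAX_CHARS = 280  # per-request prompt cap quoted in this post


def split_script(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split a script into chunks of at most `limit` characters, breaking at
    sentence boundaries where possible (a single word longer than `limit`
    is emitted as-is)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate  # sentence still fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the full chunk
        if len(sentence) <= limit:
            current = sentence
        else:
            # One sentence longer than the cap: fall back to word breaks
            current = ""
            for word in sentence.split():
                candidate = f"{current} {word}" if current else word
                if len(candidate) <= limit:
                    current = candidate
                else:
                    if current:
                        chunks.append(current)
                    current = word
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk becomes one generation, so a long script maps to several 280-character-safe calls.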

Running the Demo on Flixly

Select the Text to Speech tool and choose Gemini 3.1 Flash TTS from the model dropdown. Enter text, pick a preset voice, and set duration to 10 seconds for the first test.

A single generation returns a 220 kB file. Download and play it back at native speed to check intonation on numbers and acronyms.
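If you export the test clip as WAV, Python's standard-library `wave` module can confirm the sample rate and duration before you judge the audio. A minimal sketch; `describe_wav` and the file name in the comment are illustrative, and MP3 or OGG exports would need a third-party decoder instead:

```python
import wave


def describe_wav(source) -> dict:
    """Read a WAV header (path or binary file object) and report basic properties."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


# Example (hypothetical file name):
# info = describe_wav("gemini_test_clip.wav")
# if info["sample_rate_hz"] != 44_100:
#     print("Clip was not generated at 44.1 kHz")
```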

Batch Settings That Cut Time

  • Set concurrency to 4 for four parallel jobs.
  • Use 16 kHz preview mode for drafts, then generate the approved takes at the full 44.1 kHz rate.
  • Export as WAV for editing, then convert to MP3 at 128 kbps.
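If you drive generation from a script instead of the UI, the concurrency-4 setting corresponds to capping in-flight requests at four. A minimal asyncio sketch; `fake_synthesize` is a hypothetical placeholder because this post does not document a Flixly API, so swap in your own client call:

```python
import asyncio
from typing import Awaitable, Callable

Synth = Callable[[str], Awaitable[bytes]]


async def fake_synthesize(text: str) -> bytes:
    """Hypothetical stand-in for one TTS request."""
    await asyncio.sleep(0.01)  # simulate processing latency
    return text.encode("utf-8")  # placeholder bytes instead of real audio


async def run_batch(prompts: list[str], synth: Synth = fake_synthesize,
                    concurrency: int = 4) -> list[bytes]:
    """Send every prompt to `synth`, keeping at most `concurrency` requests in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def worker(prompt: str) -> bytes:
        async with semaphore:  # blocks while four requests are already running
            return await synth(prompt)

    # gather() returns results in input order, so clip i matches prompt i
    return list(await asyncio.gather(*(worker(p) for p in prompts)))


if __name__ == "__main__":
    clips = asyncio.run(run_batch([f"Line {i}." for i in range(15)]))
    print(len(clips), "clips generated")
```

A semaphore, rather than fixed groups of four, keeps all four slots busy: as soon as one clip finishes, the next prompt starts.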

These steps keep total credit use under 25 for a 90-second project.
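The credit math is easy to sanity-check. This post quotes two rates for Gemini 3.1 Flash TTS: 5 credits per minute in the comparison table and 15 credits for a 60-second 44.1 kHz MP3 in the FAQ. A minimal sketch, assuming credits scale linearly with audio length (the post does not say how partial minutes are billed):

```python
def project_credits(audio_seconds: float, credits_per_minute: float) -> float:
    """Estimate credits for a project, assuming linear per-minute billing."""
    return audio_seconds / 60 * credits_per_minute


BUDGET = 25  # credit target for a 90-second project

for label, rate in (("comparison table", 5), ("FAQ, 44.1 kHz MP3", 15)):
    cost = project_credits(90, rate)
    print(f"{label}: {cost:g} credits for 90 s, within budget: {cost <= BUDGET}")
```

Both quoted rates land under 25 credits for 90 seconds of audio (7.5 and 22.5 credits respectively).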

Edge Cases and Limits

Accents with heavy rolled R sounds show slight timing drift, about 0.3 seconds over a 60-second clip. Short phrases under 4 seconds sometimes drop the final consonant.

Sessions longer than 180 seconds trigger an automatic 30-second cooldown between batches. Plan around this when producing series content.
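To plan a series, you can estimate the cooldown overhead up front. A minimal sketch under one reading of the limit (an assumption, since the exact rule isn't documented here): audio is generated in batches of at most 180 seconds, with one 30-second cooldown between consecutive batches:

```python
import math

SESSION_CAP_S = 180  # cooldown threshold quoted in this post
COOLDOWN_S = 30      # automatic pause between batches


def cooldown_overhead(total_audio_s: float,
                      cap_s: float = SESSION_CAP_S,
                      cooldown_s: float = COOLDOWN_S) -> float:
    """Seconds of cooldown to budget for, assuming audio is generated in
    batches of at most `cap_s` seconds with one cooldown between batches."""
    batches = math.ceil(total_audio_s / cap_s)
    return max(batches - 1, 0) * cooldown_s
```

Under that assumption, ten 60-second episodes (600 s of audio) split into four batches and add 90 seconds of waiting.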

Compare the output directly against other Gemini TTS models listed on Flixly to hear the 2026 improvements in pitch stability.

Model Comparison Table

| Model | Processing time (s per 10 words) | Credits per minute of audio | Sample rate options |
|---|---|---|---|
| Gemini 3.1 Flash TTS | 0.8 | 5 | 24 / 44.1 kHz |
| Seedance 2.0 | 1.4 | 8 | 22 / 48 kHz |
| Veo 3.1 | 1.1 | 7 | 44.1 kHz only |

Next Steps

Start the demo on the Text to Speech page and run your first 10-second test clip. Adjust the voice parameters and re-run until the output matches your target tone.

FAQ

How many credits does a 60-second Gemini 3.1 Flash TTS file cost?

A 60-second file uses 15 credits at default settings with 44.1 kHz MP3 output.

Can I clone a custom voice with Gemini 3.1 Flash TTS?

Yes. Upload a 30-second reference sample through the Voice Cloning tool before selecting the model.

What file formats does the demo export?

Exports include WAV, MP3 at 128 or 320 kbps, and OGG at 48 kHz.

Does Gemini 3.1 Flash TTS support multiple speakers in one call?

No, it handles single-speaker prompts only. Use separate generations and the Lip Sync Video tool to combine tracks.

How does output quality compare to Kling 3.0 audio?

Gemini 3.1 Flash TTS scores higher on clarity for English narration, while Kling 3.0 handles music beds better at the same credit rate.
