You need voiceovers for 12 short clips from a 4-minute script, and the deadline is tonight. Start at the Text to Speech page to confirm which models are available before touching any code.

Set up your account and credits

Create an account at the sign-up page. Buy a starter pack of 500 credits. Each TTS generation of 30 seconds costs about 8 credits when using Gemini 3.1 Flash TTS. Check your balance in the dashboard before every batch run.
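For a rough budget check before you buy, the arithmetic can be sketched in Python. This assumes cost scales linearly with audio length at the quoted rate of about 8 credits per 30 seconds, which may not match actual pricing; estimate_credits is a hypothetical helper, not part of any Flixly SDK.

```python
import math

# Approximate rate quoted above for Gemini 3.1 Flash TTS.
CREDITS_PER_30_SECONDS = 8


def estimate_credits(total_seconds: float) -> int:
    """Estimate credits, ASSUMING cost scales linearly with audio length."""
    return math.ceil(total_seconds / 30 * CREDITS_PER_30_SECONDS)


# A 4-minute script is 240 seconds of audio.
needed = estimate_credits(4 * 60)
print(needed, "credits needed;", 500 - needed, "left from a 500-credit pack")
```

Under that assumption the full script fits well inside a starter pack, which leaves room for reruns with different speed or voice settings.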

Choose the right model

Flixly lists Gemini 3.1 Flash TTS, Seedance 2.0 audio tracks, and Kling 3.0 voice options. Gemini 3.1 Flash TTS handles English at 24 kHz with low latency. Test a single line first to compare output quality against your script tone.

Build the request payload

Prepare a JSON body with text, model, voice_id, speed, and format fields. Speed accepts values from 0.8 to 1.3. Format supports mp3 or wav at 16 kHz or 24 kHz. Keep the text within 3000 characters per call to stay inside credit limits.
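A minimal sketch of building and checking that body locally before spending credits. The field names and limits come from this post; the default model string is the example value from the parameter reference table, and build_payload itself is a hypothetical helper. How the sample rate is selected is not stated here, so the sketch leaves it out.

```python
import json

MAX_TEXT_CHARS = 3000
SPEED_RANGE = (0.8, 1.3)
FORMATS = {"mp3", "wav"}


def build_payload(text, model="gemini-3.1-flash-tts", voice_id=None,
                  speed=1.0, audio_format="wav"):
    """Return a request body dict, raising ValueError on out-of-range values."""
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1 to {MAX_TEXT_CHARS} characters")
    if not SPEED_RANGE[0] <= speed <= SPEED_RANGE[1]:
        raise ValueError(f"speed must be between {SPEED_RANGE[0]} and {SPEED_RANGE[1]}")
    if audio_format not in FORMATS:
        raise ValueError(f"format must be one of {sorted(FORMATS)}")
    payload = {"text": text, "model": model, "speed": speed, "format": audio_format}
    if voice_id is not None:
        # Only include voice_id when a specific voice was requested.
        payload["voice_id"] = voice_id
    return payload


print(json.dumps(build_payload("Hello world today", speed=1.1), indent=2))
```

Validating locally means a typo in speed or format fails on your machine instead of coming back as a rejected request.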

Send the API call

Use your API key from the dashboard settings. Post to the endpoint with the payload. The response returns a job_id and estimated credits. Poll the status endpoint every 4 seconds until the file URL appears.
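The submit-and-poll flow could look roughly like this, using only the Python standard library. Several details are assumptions: the Bearer authorization scheme, the file_url key in the status response, and the endpoint itself, which you pass in rather than hard-code because this post does not give the URL. fetch_status is any function you write that returns the status response as a dict.

```python
import json
import time
import urllib.request


def submit_job(endpoint, api_key, payload, opener=urllib.request.urlopen):
    """POST the payload as JSON and return the job_id from the response body."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # ASSUMED auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with opener(request) as response:
        body = json.load(response)
    return body["job_id"]


def poll_until_complete(fetch_status, job_id, interval=4.0, timeout=120.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status(job_id) every `interval` seconds until a file URL appears."""
    deadline = clock() + timeout
    while True:
        status = fetch_status(job_id)
        url = status.get("file_url")  # ASSUMED key name for the finished file
        if url:
            return url
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout} seconds")
        sleep(interval)
```

Passing opener, sleep, and clock as arguments keeps the functions easy to dry-run with fakes, so you can check the loop logic without spending credits.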

Verify and download the file

Listen to the first 10 seconds for pronunciation errors. If the speed feels off, adjust it and rerun. Download the wav file at 24 kHz for editing in your video tool. Store the file URL if you need it again, but note that it expires after 48 hours.
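Listening catches pronunciation, but a quick script can confirm the sample rate and clip length. This sketch uses Python's standard wave module and assumes the download is a plain PCM wav file; check_clip is a hypothetical helper.

```python
import wave


def inspect_wav(source):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV path or file object."""
    with wave.open(source, "rb") as wav_file:
        rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    return rate, frames / rate


def check_clip(source, expected_rate=24000):
    """Raise ValueError if the sample rate is not the expected one; return the duration."""
    rate, duration = inspect_wav(source)
    if rate != expected_rate:
        raise ValueError(f"expected {expected_rate} Hz, got {rate} Hz")
    return duration
```

Calling check_clip on a downloaded file returns its duration in seconds, which you can compare against the clip length your edit expects.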

Step-by-step integration

1. Log in and note your API key from account settings. The key stays valid for 90 days.
2. Open the Text to Speech tool page and copy the exact model string for Gemini 3.1 Flash TTS.
3. Write a test script of 45 words and count characters. This keeps the call small while checking latency.
4. Build the JSON payload with text, model, and output format set to wav. Save it as a local file for reuse.
5. Send the POST request using curl or your language of choice. Capture the job_id from the response body.
6. Poll the status URL with the job_id until the state changes to complete. Expect 6 to 12 seconds for a 20-second clip.
7. Download the audio file and play it back. Note the exact credit cost shown in the response header.
8. Repeat the call with updated speed or voice settings until the delivery matches your script.
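The counting and save-for-reuse steps above can be sketched with a few small helpers. The function names and the file name are hypothetical; the payload fields are the ones this post lists.

```python
import json
from pathlib import Path


def script_stats(text):
    """Count whitespace-separated words and total characters in a test script."""
    return {"words": len(text.split()), "characters": len(text)}


def save_payload(payload, path):
    """Write the payload to a local JSON file so later runs can reuse it."""
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_payload(path):
    """Read a previously saved payload back into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

A saved file such as tts_payload.json can also be passed to curl with its --data option pointing at the file, so the same body works from the shell and from code.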

Parameter reference table

Field	Type	Example value	Notes
text	string	"Hello world today"	Max 3000 chars per request
model	string	gemini-3.1-flash-tts	Use exact string from dashboard
speed	float	1.1	Range 0.8 to 1.3
format	string	wav	mp3 or wav, at 16 kHz or 24 kHz
voice_id	string	clone_482	Optional; set it when using a cloned voice

Combine with other tools

After TTS generation, feed the audio into Lip Sync Video for character mouth movement. The same credit balance works across tools. Link the resulting video to Shorts Generator if you need vertical crops at 9:16. Each step logs its own credit use so you can track costs per clip.

Handle errors

If the response shows insufficient credits, buy more before the next batch. If pronunciation fails on a name, add a phonetic spelling inside brackets in the text field. The system returns error code 422 for format mismatches. Fix the payload and retry the same job_id within 10 minutes.
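One way to turn those rules into code is a small decision helper. The 422 code and the 10-minute window come from this post; matching on the words "insufficient credits" in an error message and measuring the window from the time of the failure are both assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

RETRY_WINDOW = timedelta(minutes=10)


def next_action(status_code, error_message=""):
    """Map an error response to the recovery step described above."""
    if status_code == 422:
        return "fix_payload"  # format mismatch: correct the body, then retry
    if "insufficient credits" in error_message.lower():  # ASSUMED message text
        return "buy_credits"
    return "inspect_manually"


def can_retry_job(failed_at, now):
    """True while the same job_id can still be retried (ASSUMED to count from the failure)."""
    return now - failed_at <= RETRY_WINDOW


failed_at = datetime(2026, 1, 1, 12, 0)
print(next_action(422), can_retry_job(failed_at, failed_at + timedelta(minutes=9)))
```

Keeping the decision separate from the network call makes it simple to log every failure with the action you took.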

Scale to production

Store the API key in an environment variable. Split long scripts into 250-word chunks in a loop. Chunks can generate in parallel as long as you space the calls 2 seconds apart. Monitor total credits used against your monthly budget in the dashboard.
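A sketch of that production loop, under a few assumptions: the environment variable name FLIXLY_API_KEY is made up for illustration, and generate stands for whatever function you write to submit one chunk and wait for its audio. Splitting on whitespace can cut a sentence in half at a chunk boundary, so you may prefer to split on sentence ends in real use.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def api_key_from_env(name="FLIXLY_API_KEY"):  # hypothetical variable name
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"set the {name} environment variable first")
    return key


def split_into_chunks(script, words_per_chunk=250):
    """Split a script into chunks of at most words_per_chunk words."""
    words = script.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]


def run_staggered(chunks, generate, spacing=2.0, sleep=time.sleep):
    """Start generate(chunk) for each chunk, `spacing` seconds apart.

    Jobs run in worker threads, so earlier chunks keep generating while
    later ones are submitted. Results come back in chunk order.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        futures = []
        for index, chunk in enumerate(chunks):
            if index:
                sleep(spacing)  # space calls apart, skipping the wait before the first
            futures.append(pool.submit(generate, chunk))
        return [future.result() for future in futures]
```

Returning results in chunk order means you can join the audio files in sequence without tracking which job finished first.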

Related audio options

Test Voice Cloning to match an existing narrator. The clone file uploads once and then appears as a selectable voice_id in TTS calls. Pair the output with Music Generation at low volume for background tracks under 10 seconds.

You now hold a working script that turns text into timed audio files on demand. Run the same flow again at Text to Speech whenever new copy arrives.
