Seedance 2.0 Does Not Work Like Text-to-Video
Seedance 2.0 requires a reference video for motion transfer rather than text prompts alone. This guide covers exact inputs, credit costs, and when to choose it over Veo 3.1 or Kling 3.0.

TL;DR
Seedance 2.0 transfers motion from a supplied reference clip instead of generating motion from text. Upload a 4-8 second 1080p reference clip and a target image, budget 18 credits for a 5-second run, and expect 1080p, 24 fps output that matches the reference timing exactly. Pure text prompts return static frames.
Many users assume Seedance 2.0 accepts a text prompt the same way Sora 2 or Veo 3.1 does. It does not. Seedance 2.0 requires a reference video clip to transfer motion, timing, and camera moves onto new subjects.
The Reference Requirement
A reference clip, for example a 4-second 1080p source at 24 fps, supplies the motion data. The model then applies that motion to a still image or short character sequence. Without the reference file, the generation fails or returns static output.
Users reach this tool through the Reference to Video page. The interface asks for both the reference clip and a target image or character sheet.
Why Text Prompts Alone Fail
Pure text input lacks the temporal signal Seedance 2.0 needs. In tests, a short text-only prompt such as "person dancing in rain" produced only a frozen frame. Adding a 5-second reference clip of a dancer immediately yielded coherent motion at 1080p and 24 fps.
The contrast with Text to Video makes the limit clear. That tool accepts text alone because its models, including GPT-Image 2.0, generate motion from noise. Seedance 2.0 does not.
Practical Workflow on Flixly
- Upload a reference clip of 2 to 8 seconds.
- Upload target character or image.
- Select Seedance 2.0 from the model list.
- Set duration to match reference length.
- Run generation.
Each run costs 18 credits for a 5-second 1080p file. Output lands in the dashboard library as an MP4, with an embedded audio track if the reference contained sound.
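As a rough budgeting aid, the per-run cost above can be turned into simple arithmetic. This is a minimal sketch that assumes every run is the standard 5-second 1080p case at 18 credits; the article does not state pricing for other durations, and the function name `runs_affordable` is illustrative, not part of any Flixly API.

```python
# Assumption: flat 18 credits per standard 5-second 1080p run, as quoted above.
CREDITS_PER_RUN = 18


def runs_affordable(balance: int) -> tuple[int, int]:
    """Return (number of full runs, credits left over) for a credit balance."""
    if balance < 0:
        raise ValueError("credit balance cannot be negative")
    return divmod(balance, CREDITS_PER_RUN)


runs, leftover = runs_affordable(100)
print(f"{runs} runs, {leftover} credits left")
```

With a balance of 100 credits, this works out to 5 runs with 10 credits remaining.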
Supported Input Sizes
- Reference: 720p to 1080p, 2-8 seconds
- Target: 512x512 to 1536x1536 PNG or JPG
- Max output: 1080p, 24 fps, up to 8 seconds
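The limits listed above can be checked locally before uploading. The sketch below is a hypothetical pre-flight check, not a Flixly feature: the class and function names are invented for illustration, the ranges are treated as inclusive, and accepting "jpeg" as an alias for JPG is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReferenceClip:
    height: int        # vertical resolution in pixels, e.g. 1080 for 1080p
    duration_s: float  # clip length in seconds


@dataclass
class TargetImage:
    width: int   # pixels
    height: int  # pixels
    fmt: str     # file format, e.g. "png" or "jpg"


def check_inputs(ref: ReferenceClip, target: TargetImage) -> list[str]:
    """Return a list of problems; an empty list means the inputs fit the listed limits."""
    problems = []
    if not 720 <= ref.height <= 1080:
        problems.append("reference must be 720p to 1080p")
    if not 2 <= ref.duration_s <= 8:
        problems.append("reference must be 2 to 8 seconds long")
    if not (512 <= target.width <= 1536 and 512 <= target.height <= 1536):
        problems.append("target must be between 512x512 and 1536x1536")
    # Assumption: "jpeg" is treated the same as "jpg".
    if target.fmt.lower() not in {"png", "jpg", "jpeg"}:
        problems.append("target must be PNG or JPG")
    return problems


print(check_inputs(ReferenceClip(1080, 5.0), TargetImage(1024, 1024, "png")))
```

Returning a list of messages rather than a single boolean makes it easier to report every failed limit at once.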
Comparison Table
| Model | Needs Reference | Max Duration | Typical Credit Cost | Best For |
|---|---|---|---|---|
| Seedance 2.0 | Yes | 8 s | 18 | Motion transfer |
| Veo 3.1 | No | 10 s | 22 | Text-to-video |
| Kling 3.0 | Optional | 6 s | 15 | Image-to-video |
| Wan 2.7 | No | 12 s | 25 | Long text prompts |
The table shows Seedance 2.0 trades prompt flexibility for precise motion control.
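The table can also be read as a simple filter. The sketch below encodes the four rows as data and lists the models that fit a given situation, cheapest first. The figures are copied from the table; the function `candidates` and its selection rule (a model qualifies if it does not strictly need a reference you lack and its maximum duration covers the requested length) are illustrative assumptions, not a Flixly recommendation engine.

```python
# Rows copied from the comparison table above.
MODELS = {
    "Seedance 2.0": {"reference": "yes", "max_s": 8, "credits": 18},
    "Veo 3.1": {"reference": "no", "max_s": 10, "credits": 22},
    "Kling 3.0": {"reference": "optional", "max_s": 6, "credits": 15},
    "Wan 2.7": {"reference": "no", "max_s": 12, "credits": 25},
}


def candidates(has_reference: bool, duration_s: float) -> list[str]:
    """List models usable for the given inputs, sorted by typical credit cost."""
    usable = [
        name
        for name, spec in MODELS.items()
        # A model that requires a reference is ruled out when none is available.
        if (has_reference or spec["reference"] != "yes")
        and spec["max_s"] >= duration_s
    ]
    return sorted(usable, key=lambda name: MODELS[name]["credits"])


print(candidates(has_reference=False, duration_s=10))
print(candidates(has_reference=True, duration_s=8))
```

Without a reference clip, Seedance 2.0 drops out of the list entirely, which is the trade-off the table describes.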
Checking Output Quality
Correct results show the target subject following the exact limb positions and timing of the reference. Camera pans, tilts, and step counts match frame-for-frame. If the subject slides or the motion looks generic, the reference file was likely too short or too low in resolution.
When to Switch Models
Choose Image to Video or Video to Video when you have no reference clip. Those tools accept stills or existing video without motion data.
For lip-synced dialogue after motion transfer, route the Seedance 2.0 output into the Lip Sync Video tool. The handoff preserves the generated motion while adding new audio.
Final Mental Model
Treat Seedance 2.0 as a motion applicator, not a motion inventor. Supply the motion first, then the subject. Apply that rule on the Reference to Video page and results stay consistent.
Frequently Asked Questions
Does Seedance 2.0 accept text-only prompts?
No. It returns static frames or errors without a reference video clip that supplies motion data.
What file length works best with Seedance 2.0?
Clips between 4 and 8 seconds at 1080p and 24 fps give the most reliable motion transfer results.
How many credits does a Seedance 2.0 generation use?
A standard 5-second 1080p output costs 18 credits on Flixly.
Can I add dialogue after generating with Seedance 2.0?
Yes. Export the file, then run it through the Lip Sync Video tool to attach new audio while keeping the transferred motion.