Seedance 2.0 is a new AI video model from the Dreamina/CapCut side of ByteDance that's focused on one thing most video models struggle with:
consistency across shots.
Instead of only text-to-video, it supports multimodal references:
• text
• images
• video clips
• audio clips
Dreamina says you can stack up to 12 reference clips in one project (up to 9 images, 3 videos, and 3 audio), and video/audio refs can run up to 15 seconds,
so you can guide the model with real examples, not vibes.
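To make those limits concrete, here's a minimal sketch in Python of how you might sanity-check a reference stack before uploading. Everything here (the Ref type, validate_refs, the constant names) is my own illustration built from the numbers above, not Dreamina's actual API:

```python
from dataclasses import dataclass

# Hypothetical constants taken from the post's stated limits; Dreamina's
# real product may enforce these differently, and this is not their API.
MAX_TOTAL_REFS = 12                              # total clips per project
MAX_PER_TYPE = {"image": 9, "video": 3, "audio": 3}
MAX_REF_SECONDS = 15.0                           # cap for video/audio refs

@dataclass
class Ref:
    kind: str             # "image", "video", or "audio"
    seconds: float = 0.0  # duration; ignored for images

def validate_refs(refs: list[Ref]) -> None:
    """Raise ValueError if a reference stack exceeds the stated limits."""
    if len(refs) > MAX_TOTAL_REFS:
        raise ValueError(f"at most {MAX_TOTAL_REFS} reference clips per project")
    counts = {kind: 0 for kind in MAX_PER_TYPE}
    for ref in refs:
        if ref.kind not in counts:
            raise ValueError(f"unknown reference type: {ref.kind!r}")
        counts[ref.kind] += 1
        if ref.kind in ("video", "audio") and ref.seconds > MAX_REF_SECONDS:
            raise ValueError(f"{ref.kind} refs are capped at {MAX_REF_SECONDS:.0f}s")
    for kind, n in counts.items():
        if n > MAX_PER_TYPE[kind]:
            raise ValueError(f"at most {MAX_PER_TYPE[kind]} {kind} refs allowed")

# e.g. validate_refs([Ref("image"), Ref("video", 12.0), Ref("audio", 15.0)])
```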
What this unlocks for creators
• the same character staying stable across multiple shots
• smoother scene transitions and camera switches
• audio + visuals that line up like an edited sequence
Check out the video I've attached👇