Resemble AI just dropped Chatterbox Turbo and it's legitimately impressive.
What it is:
A fully open-source, MIT-licensed text-to-speech model that reportedly benchmarks ahead of ElevenLabs Turbo and Cartesia Sonic 3.
Why you should care:
- <150ms time-to-first-sound — fast enough for real-time applications, at least in English
- Voice cloning from just 5 seconds of audio, with no lengthy training datasets needed
- Paralinguistic tags — control laughs, pauses, breaths for natural human expression
- MIT license — use it commercially, fork it, do whatever you want!!
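If you want to try the cloning claim, it helps to confirm your reference clip really is at least 5 seconds long before blaming the model. Here is a minimal sketch using only Python's standard `wave` module. It assumes an uncompressed PCM WAV file. The 5-second threshold comes from the bullet above, not from any documented requirement, and the helper names are made up for illustration.

```python
import wave

# Taken from the "5 seconds of audio" claim above; not an official threshold.
MIN_SECONDS = 5.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """True if the reference clip is at least min_seconds long."""
    return clip_duration_seconds(path) >= min_seconds
```

Call `is_long_enough("my_reference.wav")` (a placeholder file name) before passing the clip to whatever cloning call you end up using.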
The model is designed to be transparent and auditable, which matters if you're building anything that needs to prove authenticity or pass compliance checks.
If you've been looking for a serious open-source alternative to the paid voice APIs, this is worth testing. The 5-second cloning alone makes it interesting for rapid prototyping.
Anyone already running this locally? Curious about real-world latency and quality compared to the benchmarks.
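For anyone who wants to post comparable numbers, here is a rough sketch of how time-to-first-sound could be measured. It deliberately does not call Chatterbox itself: `synthesize` is a hypothetical stand-in for whatever streaming function your setup exposes, assumed to yield audio chunks as bytes. `fake_synthesize` is only there so the script runs on its own.

```python
import time
from typing import Callable, Iterable, Iterator


def time_to_first_chunk_ms(
    synthesize: Callable[[str], Iterable[bytes]], text: str
) -> float:
    """Milliseconds from calling synthesize() until its first non-empty chunk."""
    start = time.perf_counter()
    for chunk in synthesize(text):
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    raise RuntimeError("synthesize() produced no audio")


def fake_synthesize(text: str) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call: waits 20 ms, then yields silence."""
    time.sleep(0.02)
    yield b"\x00" * 320
    yield b"\x00" * 320


if __name__ == "__main__":
    # Repeat the measurement and report the median to smooth out one-off spikes.
    runs = sorted(
        time_to_first_chunk_ms(fake_synthesize, "Hello there.") for _ in range(5)
    )
    print(f"median time to first chunk: {runs[len(runs) // 2]:.1f} ms")
```

Swap `fake_synthesize` for your real streaming call, and mention your hardware alongside the number. A warm-up run before timing is worth doing, since the first call often includes model loading.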
Share your ideas and results.