Hi everyone — I’m looking for advice on how to support multiple languages (at least English, Cantonese, and Mandarin) in a voice agent.
I’m currently building a healthcare voice agent demo and have been testing it with Vapi, but the Cantonese (zh-HK) speech quality is still very poor. I’ve tried several different LLMs and configurations without getting good results. Because Cantonese and Mandarin are both Chinese languages, the voice agent keeps mixing them up.
What confuses me is that I’ve tested other online demos, and their Cantonese voice agents sound very natural and fluent, so I believe this should be achievable — I just haven’t found the right approach yet.
If anyone has recommendations on:
- the best TTS / voice model for Cantonese
- the best setup / pipeline for multi-language voice agents
I’d really appreciate your guidance. Thanks in advance!