🎙️ How Can I Create My Own Voice AI Agent (Like VAPI) but for My Language?
Hey everyone 👋

I'm trying to figure out how to build my own voice AI agent, something similar to VAPI, Retell, etc., but customized for my language and its accents. We have over 40 accents, and I want to fine-tune a model that can actually understand and speak like people from each region: not just transcribe words, but really adapt to local pronunciation, tone, and vocabulary.

I have experience in full-stack development (React, Spring Boot, Go, etc.), but I've never done fine-tuning or built voice pipelines before.

So my main questions are:

- 🧠 What's the best way to start building a custom voice AI like this?
- 🗣️ How can I collect data for many accents and fine-tune on them effectively?
- ⚙️ What models, tools, or frameworks do you recommend for speech-to-text, text-to-speech, and conversation handling?
- 💾 Do I need to build my own dataset, or can I use or adapt open ones?

If anyone here has worked on custom speech AI or VAPI-like systems, I'd really appreciate some pointers 🙏

Thanks in advance!

Khaled Rouissi 🚀