Hey everyone,
I’m building a live AI phone receptionist and I’m facing an issue when the assistant has to repeat numbers back to the caller.
Problems:
• When a caller gives an 11-digit phone number → digits merge or sound unclear
• When repeating prices like £1500 → pronunciation sounds distorted
• Works fine sometimes, but is inconsistent on real phone calls
Stack: Vapi + Twilio + n8n + ElevenLabs (also tested Gemini/OpenAI)
Tried already:
– Increasing end-of-turn timeout (0.5 s → 2 s)
– Changing voices/models/LLMs
How do you normally solve this in production systems?
Is it formatting, TTS settings, buffering, or another approach?
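
To make the "formatting" option concrete, this is roughly what I have in mind: a minimal Python sketch that rewrites numbers into words before the text reaches the TTS. The function names (spell_digits, int_to_words, speak_prices) are my own hypothetical helpers, not part of Vapi, Twilio, n8n, or ElevenLabs. It only handles whole-pound amounts below one million, and I haven't verified it on real calls.

```python
import re

ONES = ["zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def spell_digits(number: str, group: int = 3) -> str:
    """Spell a phone number digit by digit, with a comma (pause) between groups."""
    digits = re.sub(r"\D", "", number)  # drop spaces, dashes, plus signs
    words = [ONES[int(d)] for d in digits]
    chunks = [" ".join(words[i:i + group]) for i in range(0, len(words), group)]
    return ", ".join(chunks)


def int_to_words(n: int) -> str:
    """Convert 0..999,999 to British-style English words."""
    if n < 0 or n >= 1_000_000:
        raise ValueError("only 0..999,999 is supported in this sketch")
    if n < 10:
        return ONES[n]
    if n < 20:
        return TEENS[n - 10]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = " and " + int_to_words(rest) if rest else ""
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    head = int_to_words(thousands) + " thousand"
    if rest == 0:
        return head
    joiner = " and " if rest < 100 else " "
    return head + joiner + int_to_words(rest)


# Matches whole-pound amounts such as "£1500" or "£1,500" (pence not handled).
PRICE_RE = re.compile(r"£(\d{1,3}(?:,\d{3})+|\d+)")


def speak_prices(text: str) -> str:
    """Replace each £ amount in the text with its spoken form."""
    def repl(match: re.Match) -> str:
        amount = int(match.group(1).replace(",", ""))
        unit = "pound" if amount == 1 else "pounds"
        return f"{int_to_words(amount)} {unit}"
    return PRICE_RE.sub(repl, text)


if __name__ == "__main__":
    print(spell_digits("07123 456789"))
    print(speak_prices("The total is £1500."))
```

The idea would be to run the LLM reply through something like this (for example in an n8n step) so the voice model never sees raw digits or currency symbols. Whether this belongs in the prompt, in a post-processing step, or in the TTS provider's own normalization settings is exactly what I'm unsure about.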