Hey guys, I've been able to smooth out barge-in and interruption handling for a lot of my clients' voice agents.
Here’s how:
(1) shorter, smarter TTS chunks and strict stop-speaking rules, and
(2) smaller, better RAG chunks so the LLM answers in shorter bursts.
Do both.
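For point (1), here's a minimal sketch of what I mean by short TTS chunks plus a strict stop-speaking rule: split replies at sentence boundaries, cap chunk length, and check an interrupt flag between chunks. `speak_chunk` is a hypothetical stand-in for your actual TTS/playback call, and the flag would be set by your VAD when the caller starts talking.

```python
# Sketch: speak in short chunks, check an interrupt flag between them
# so barge-in stops playback fast. speak_chunk() is a hypothetical
# stand-in for your TTS/playback call.
import re
import threading

interrupted = threading.Event()  # set by your VAD when the caller talks

def split_into_chunks(text: str, max_words: int = 12) -> list[str]:
    """Split on sentence boundaries, then cap each chunk's length."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks = []
    for s in sentences:
        words = s.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks

def speak(text: str, speak_chunk=print) -> list[str]:
    """Speak chunk by chunk; stop immediately once interrupted."""
    spoken = []
    for chunk in split_into_chunks(text):
        if interrupted.is_set():  # strict stop-speaking rule
            break
        speak_chunk(chunk)
        spoken.append(chunk)
    return spoken
```

The key design choice: the stop check lives between chunks, so the smaller the chunks, the faster the agent shuts up when interrupted.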
For the KB I use Trieve (BYOD): set chunk size to ~200–600 tokens with 15–20% overlap, and prefer heading-based or GPT-4o-optimized chunking for coherence.
This yields tighter citations and shorter LLM replies (less rambling = easier interruption).
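To make the chunking settings concrete, here's a hedged sketch of heading-based chunking with a token budget and overlap, done before upload (it doesn't use the Trieve API itself). Token counts are approximated with a rough words-per-token heuristic; swap in a real tokenizer for production.

```python
# Sketch: heading-based chunking with a token budget and overlap.
# Token counts approximated at ~0.75 words per token (an assumption);
# MAX_TOKENS sits inside the ~200-600 range from the post.
import re

MAX_TOKENS = 400   # within the ~200-600 token range
OVERLAP = 0.2      # ~20% overlap between adjacent chunks

def approx_tokens(text: str) -> int:
    return int(len(text.split()) / 0.75)

def chunk_markdown(doc: str) -> list[str]:
    """Split on headings first, then enforce the token budget with overlap."""
    sections = re.split(r"(?m)^(?=#)", doc)  # keep each heading with its body
    chunks = []
    for section in sections:
        words = section.split()
        if not words:
            continue
        max_words = int(MAX_TOKENS * 0.75)
        step = max(1, int(max_words * (1 - OVERLAP)))
        for i in range(0, len(words), step):
            chunks.append(" ".join(words[i:i + max_words]))
            if i + max_words >= len(words):
                break
    return chunks
```

Splitting on headings first keeps each chunk topically coherent, so retrieval pulls in one focused section instead of a blob spanning two topics.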
Hope this helps you guys🚀