We’re building a call-style UI where:
- The user can talk freely (like a real phone call)
- Silence detection determines when a “turn” ends
- Short pauses are merged into one thought
- The AI can be interrupted if the user starts talking
- Audio playback and mic capture work reliably on iOS Safari
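For concreteness, the "silence detection" we're attempting is roughly frame-level RMS energy against a threshold (a simplified sketch; the threshold constant is illustrative and would need per-device calibration, and in practice the frames would come from an `AnalyserNode` or `AudioWorklet`):

```typescript
// Classify one audio frame as speech or silence by RMS energy.
// SILENCE_RMS_THRESHOLD is illustrative — real code should calibrate
// it per microphone/environment (or use an adaptive noise floor).
const SILENCE_RMS_THRESHOLD = 0.01;

function frameRms(samples: Float32Array): number {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

function isSpeech(samples: Float32Array): boolean {
  return frameRms(samples) > SILENCE_RMS_THRESHOLD;
}
```

A fixed threshold like this is exactly where things get flaky for us (fan noise, iOS AGC), which is part of why we're asking.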
Right now we’re running into a few issues:
- Silence detection doesn’t reliably stop listening
- Turns fire too early or too late
- Transcription sometimes fails or never triggers
- iOS Safari adds extra constraints around audio unlock and playback
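To show what we mean by "turns fire too early or too late": our turn-end logic is essentially a silence "hangover" timer — a turn only ends after N consecutive ms of silence, so short pauses merge into one utterance. A minimal sketch of that state machine (constants illustrative; `speech` would come from per-frame VAD decisions):

```typescript
// Turn-end detection with a silence hangover: the turn ends only after
// HANGOVER_MS of continuous silence, so brief mid-sentence pauses are
// merged into the same turn. HANGOVER_MS is an assumption — tune it.
const HANGOVER_MS = 800;

type TurnState = { speaking: boolean; silenceSince: number | null };

function initTurn(): TurnState {
  return { speaking: false, silenceSince: null };
}

// Feed one VAD decision per frame with a timestamp (ms).
// Returns true exactly once, at the moment the turn ends.
function onFrame(state: TurnState, speech: boolean, now: number): boolean {
  if (speech) {
    state.speaking = true;
    state.silenceSince = null; // pause cancelled, keep the turn open
    return false;
  }
  if (!state.speaking) return false; // silence before any speech at all
  if (state.silenceSince === null) {
    state.silenceSince = now; // start the hangover clock
    return false;
  }
  if (now - state.silenceSince >= HANGOVER_MS) {
    state.speaking = false;
    state.silenceSince = null;
    return true; // turn ended
  }
  return false;
}
```

Tuning `HANGOVER_MS` is the early/late trade-off: too short and we cut people off mid-sentence, too long and the AI feels unresponsive.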
If you’ve solved this (or seen a solid pattern for frontend VAD + turn management in the browser), I’d love to hear:
- What approach worked for you
- Any gotchas with MediaRecorder / Web Audio API
- Whether you kept VAD / turn logic on the frontend or moved it to the backend
Appreciate any war stories or architecture advice 🙏