When voice AI stops waiting for its turn
Most voice systems still behave like polite interns. They wait for you to finish. They think. Then they respond - slightly late, slightly stiff.

Repo: https://github.com/NVIDIA/personaplex
Weights: https://huggingface.co/nvidia/personaplex-7b-v1

NVIDIA's PersonaPlex-7B quietly steps away from that pattern. Instead of chaining ASR → LLM → TTS, it runs on continuous audio tokens, listening and speaking at the same time: a dual-stream transformer generating text and audio in parallel. That design choice matters more than the model size. Real conversations aren't turn-based. They're overlapping, interruptible, full of back-channels and timing cues we barely notice - until they're missing.

What's interesting isn't just that it's open-weight and MIT-licensed. It's that persona control is zero-shot, steered by prompts rather than fine-tuning - suggesting voice behavior might finally be treated as a runtime property, not a training artifact.

Whether this feels "human" at scale will probably come down to deployment reality: latency budgets, streaming infrastructure, edge vs cloud trade-offs. But the direction is clear. The biggest limitation of voice AI may no longer be intelligence. It may be how long we force it to stay silent before speaking.
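The turn-taking cost can be sketched as a toy latency model. All numbers below are illustrative assumptions for the sake of the comparison, not measured PersonaPlex figures: the point is only that a cascaded pipeline cannot speak before the user finishes, while a full-duplex model streaming audio tokens can respond after a few frames of context.

```python
# Toy latency model: turn-based cascade vs. full-duplex streaming.
# Stage timings and frame sizes are made-up placeholders.

def pipeline_first_response(utterance_s, asr_s=0.3, llm_s=0.6, tts_s=0.4):
    """Cascaded ASR -> LLM -> TTS: each stage waits for the previous one,
    and nothing starts until the user stops talking."""
    return utterance_s + asr_s + llm_s + tts_s

def duplex_first_response(frame_s=0.08, frames_of_context=3):
    """Full-duplex model consuming audio tokens frame by frame: it can
    begin emitting speech after a few frames, even mid-utterance."""
    return frame_s * frames_of_context

user_turn = 2.0  # seconds of user speech
print(f"pipeline: first audio after {pipeline_first_response(user_turn):.2f}s")
print(f"duplex:   first audio after {duplex_first_response():.2f}s")
```

Even with generous per-stage timings, the cascade's floor is the length of the user's turn; the duplex model's floor is a few audio frames.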