Expert Advice Needed on My Pipecat Architecture
Healthcare Voice Agent Architecture Review
Hi everyone,
I'm running a production voice agent (~500-600 calls/day) built with pipecat-flows and would appreciate feedback on my architecture.
Why Self-Hosted: I tried Pipecat Cloud, but it doesn't support Talkdesk. The connection has to be a WebSocket; WebRTC is not an option.
Architecture:
Talkdesk ──WS──► Bridge Server (Azure App Service) ──WS──► Pipecat Agent (Azure VM + Docker)
• The bridge converts μ-law 8 kHz ↔ PCM 16 kHz (resampling on every chunk)
• 3 Docker containers behind an Nginx load balancer
• Each container handles ~15 concurrent calls and is limited to 3 GB RAM and 0.75 CPU
• CI/CD: GitHub Actions → Docker Hub → Azure VM pull
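For reference, the per-chunk work the bridge does can be sketched in plain Python: G.711 μ-law decoding/encoding plus a 2x sample-rate change. This is a minimal sketch, not the bridge's actual code. It assumes little-endian 16-bit PCM on the Pipecat side, and the class and function names (`Upsampler2x`, `talkdesk_to_pipecat`, `pipecat_to_talkdesk`) are hypothetical. The resampling is deliberately naive (linear interpolation up, pair averaging down); a production bridge would normally use a proper filtered resampler.

```python
import struct
from typing import List

BIAS = 0x84   # G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits in 16 bits after adding BIAS


def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), CLIP) + BIAS
    exponent = (magnitude >> 7).bit_length() - 1   # segment: highest set bit, minus 7
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


class Upsampler2x:
    """Naive 8 kHz -> 16 kHz upsampler using linear interpolation.

    Remembers the last sample of the previous chunk so the point between
    two chunks is interpolated too (the stream is assumed to start silent).
    """

    def __init__(self) -> None:
        self.prev = 0

    def process(self, samples: List[int]) -> List[int]:
        out: List[int] = []
        for s in samples:
            out.append((self.prev + s) // 2)  # interpolated in-between sample
            out.append(s)                      # original sample
            self.prev = s
        return out


def talkdesk_to_pipecat(ulaw_chunk: bytes, up: Upsampler2x) -> bytes:
    """8 kHz mu-law in -> 16 kHz little-endian PCM16 out."""
    pcm_8k = [ulaw_to_linear(b) for b in ulaw_chunk]
    pcm_16k = up.process(pcm_8k)
    return struct.pack(f"<{len(pcm_16k)}h", *pcm_16k)


def pipecat_to_talkdesk(pcm_chunk: bytes) -> bytes:
    """16 kHz little-endian PCM16 in -> 8 kHz mu-law out.

    Averages each pair of samples (a crude low-pass) while halving the rate;
    assumes each chunk holds an even number of samples.
    """
    n = len(pcm_chunk) // 2
    samples = struct.unpack(f"<{n}h", pcm_chunk[: n * 2])
    pcm_8k = [(a + b) // 2 for a, b in zip(samples[0::2], samples[1::2])]
    return bytes(linear_to_ulaw(s) for s in pcm_8k)
```

One upsampler instance per call keeps the interpolation continuous across chunk boundaries. The per-byte loops are written for clarity; if this ever moves into the Pipecat process, a vectorized or library resampler would be the more realistic choice.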
AI Stack:
• STT: Azure Speech (Italian)
• LLM: OpenAI GPT-4.1
• TTS: ElevenLabs (eleven_multilingual_v2)
• VAD: Silero
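To tell whether the extra WebSocket hop or one of the services above dominates the perceived delay, it helps to timestamp each turn at a few milestones and look at the gaps. A minimal sketch, assuming you can call `mark()` wherever each event is observed in your pipeline; `TurnTimer` and the milestone names are hypothetical, not a Pipecat API.

```python
import time
from typing import Callable, Dict, List, Tuple


class TurnTimer:
    """Records named milestones within one conversational turn and reports
    the delay between consecutive milestones (hypothetical helper)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._marks: List[Tuple[str, float]] = []

    def mark(self, name: str) -> None:
        """Record that milestone `name` happened now."""
        self._marks.append((name, self._clock()))

    def breakdown_ms(self) -> Dict[str, float]:
        """Milliseconds between each milestone and the previous one."""
        return {
            f"{prev}->{cur}": (t_cur - t_prev) * 1000.0
            for (prev, t_prev), (cur, t_cur) in zip(self._marks, self._marks[1:])
        }

    def total_ms(self) -> float:
        """Milliseconds from the first milestone to the last one."""
        if len(self._marks) < 2:
            return 0.0
        return (self._marks[-1][1] - self._marks[0][1]) * 1000.0


if __name__ == "__main__":
    # Simulated clock so the example is deterministic.
    fake_times = iter([0.0, 0.25, 0.75, 1.0])
    timer = TurnTimer(clock=lambda: next(fake_times))
    for milestone in ("user_stopped_speaking", "stt_final", "llm_first_token", "tts_first_audio"):
        timer.mark(milestone)
    print(timer.breakdown_ms())
    print(timer.total_ms())
```

`time.monotonic()` values are only comparable within one process, so the marks for a turn should all be taken on the same machine (for example, inside the Pipecat container, or in the bridge by timing when audio goes out and comes back).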
๐— ๐˜‚๐—น๐˜๐—ถ-๐—”๐—ด๐—ฒ๐—ป๐˜ ๐—ฆ๐—ฒ๐˜๐˜‚๐—ฝ (pipecat-flows):
Router Node โ†’ detects intent โ†’ routes to:
โ€ข Booking Agent (20+ step flow)
โ€ข Info Agent (RAG/knowledge base)
• [Future] Doctor Booking Agent: the caller names a specific doctor, e.g. "I want to book an appointment with Dr. Jhon for a heart checkup."
Agents can transfer to each other during a conversation.
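A router node like this is typically driven by an LLM function call. A minimal sketch of such a tool definition in the OpenAI function-calling format is shown below; the tool name `route_call`, its field names, and `handle_route_call` are hypothetical. The optional fields illustrate how the same call could capture details such as a doctor's name in addition to picking an agent.

```python
from typing import Dict, Tuple

# Hypothetical tool the Router node's LLM can call; follows the OpenAI
# function-calling shape ({"type": "function", "function": {...}}).
ROUTE_CALL_TOOL = {
    "type": "function",
    "function": {
        "name": "route_call",
        "description": "Classify the caller's request and capture any booking details already mentioned.",
        "parameters": {
            "type": "object",
            "properties": {
                "intent": {"type": "string", "enum": ["booking", "info"]},
                "doctor_name": {"type": "string", "description": "Doctor named by the caller, if any."},
                "service": {"type": "string", "description": "Requested exam or visit, e.g. X-ray."},
                "preferred_time": {"type": "string", "description": "Requested date/time as spoken."},
            },
            "required": ["intent"],
        },
    },
}

SLOT_FIELDS = ("doctor_name", "service", "preferred_time")


def handle_route_call(args: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Turn the router's tool-call arguments into (target agent, pre-filled slots)."""
    intent = args.get("intent")
    if intent not in ("booking", "info"):
        raise ValueError(f"unknown intent: {intent!r}")
    # Keep only known, non-empty booking details.
    prefilled = {k: v for k, v in args.items() if k in SLOT_FIELDS and v}
    return intent, prefilled
```

Because the extra fields are optional, "book an X-ray" and "book with Dr. Jhon at 3pm tomorrow" resolve to the same `booking` intent; only the pre-filled dictionary differs.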
๐— ๐˜† ๐—ค๐˜‚๐—ฒ๐˜€๐˜๐—ถ๐—ผ๐—ป๐˜€:
๐Ÿญ. ๐—Ÿ๐—ฎ๐˜๐—ฒ๐—ป๐—ฐ๐˜† feels high. Is the two-hop WebSocket architecture (Talkdesk โ†’ Bridge โ†’ Pipecat) causing this? Should I merge the bridge into the Pipecat container?
๐Ÿฎ. Is having a ๐˜€๐—ฒ๐—ฝ๐—ฎ๐—ฟ๐—ฎ๐˜๐—ฒ ๐—ฏ๐—ฟ๐—ถ๐—ฑ๐—ด๐—ฒ for audio conversion a common pattern, or is there a better approach?
3. Routing pattern question: I use a Router node to detect intent and route to agents, but I'm concerned this approach is too rigid.
For example, I currently route to the Booking Agent when the user says "book X-ray". But what if the user says "book with Dr. Jhon" or "book with Dr. Jhon at 3pm tomorrow"?
Should I create a separate agent for each variation? That feels wrong: they are all bookings, just with different pre-filled data.
Or should the Router extract entities (doctor name, time, service) and pass them as parameters to a single flexible agent that skips steps dynamically?
What's the best pattern in pipecat-flows for handling these variations without creating rigid, bounded flows for each request type?
4. What are you using for observability in production?
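For question 3, the second option (extract entities once, then let a single booking flow skip whatever is already known) can be expressed as slot filling: the next step is whichever required slot is still empty, rather than a fixed chain. A minimal sketch in plain Python, independent of the pipecat-flows API; the step names and `BookingState` are hypothetical. In a real flow, each step's function handler would consult something like `next_step()` when choosing which node to transition to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical ordered steps of the booking flow; each step fills one slot.
BOOKING_STEPS: List[str] = ["service", "doctor_name", "preferred_time", "patient_name", "confirmation"]


@dataclass
class BookingState:
    """Slots collected so far, whether pre-filled by the router or asked for later."""

    slots: Dict[str, str] = field(default_factory=dict)

    def fill(self, step: str, value: str) -> None:
        if step not in BOOKING_STEPS:
            raise KeyError(f"unknown booking step: {step!r}")
        self.slots[step] = value

    def next_step(self) -> Optional[str]:
        """First step whose slot is still empty, or None when all are filled."""
        for step in BOOKING_STEPS:
            if not self.slots.get(step):
                return step
        return None


if __name__ == "__main__":
    # "book with Dr. Jhon at 3pm tomorrow": two slots arrive pre-filled.
    state = BookingState(slots={"doctor_name": "Dr. Jhon", "preferred_time": "tomorrow 3pm"})
    print(state.next_step())  # service
    state.fill("service", "heart checkup")
    print(state.next_step())  # patient_name
```

With this shape, "book X-ray" and "book with Dr. Jhon at 3pm tomorrow" run through one flow and differ only in which steps are skipped. Pre-filled values would still need validating (for example, that the doctor exists and the time is free) before the flow treats them as settled.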
Any feedback appreciated. Thanks!
Mohammad Mussab
Open Source Voice AI Community
skool.com/open-source-voice-ai-community-6088