Expert Advice Needed on my Pipecat Architecture · Open Source Voice AI Community

Mohammad Mussab

Dec '25 (edited) • Pipecat

𝗛𝗲𝗮𝗹𝘁𝗵𝗰𝗮𝗿𝗲 𝗩𝗼𝗶𝗰𝗲 𝗔𝗴𝗲𝗻𝘁 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗥𝗲𝘃𝗶𝗲𝘄

Hi everyone,

Running a production voice agent (~500-600 calls/day) with 𝗽𝗶𝗽𝗲𝗰𝗮𝘁-𝗳𝗹𝗼𝘄𝘀. Would appreciate feedback on my architecture.

𝗪𝗵𝘆 𝗦𝗲𝗹𝗳-𝗛𝗼𝘀𝘁𝗲𝗱: I tried Pipecat Cloud, but Talkdesk is not supported. The Talkdesk integration has to go over WebSocket, so I cannot use WebRTC.

𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲:

Talkdesk ──WS──► Bridge Server (Azure App Service) ──WS──► Pipecat Agent (Azure VM + Docker)

• Bridge converts μ-law 8kHz ↔ PCM 16kHz (resampling on every chunk)

• 3 Docker containers behind Nginx load balancer

• Each container handles ~15 concurrent calls (per-container limits: 3 GB RAM, 0.75 CPU)

• CI/CD: GitHub Actions → Docker Hub → Azure VM pull
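For reference, the inbound half of the bridge conversion (μ-law 8 kHz → 16-bit PCM 16 kHz) can be sketched roughly as below. This is a simplified illustration, not the production bridge code: the class name `UlawUpsampler` is made up for the example, and the 2x upsampling uses plain linear interpolation where a real bridge would more likely use a proper resampling filter. The decoder follows the standard G.711 μ-law expansion. The one detail worth noting is that the last sample is carried across chunks, so resampling chunk by chunk produces the same audio as resampling the whole stream at once.

```python
import struct


def _decode_ulaw_byte(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


# Precompute all 256 values once so per-chunk work is a table lookup.
ULAW_TABLE = [_decode_ulaw_byte(b) for b in range(256)]


class UlawUpsampler:
    """Decode mu-law 8 kHz chunks to 16-bit little-endian PCM at 16 kHz.

    Hypothetical helper: 2x upsampling by linear interpolation, with the
    previous sample kept between calls so chunk boundaries stay continuous.
    """

    def __init__(self) -> None:
        self._prev = 0  # last decoded sample of the previous chunk

    def process(self, chunk: bytes) -> bytes:
        out = []
        prev = self._prev
        for byte in chunk:
            cur = ULAW_TABLE[byte]
            out.append((prev + cur) // 2)  # interpolated midpoint sample
            out.append(cur)                # original sample
            prev = cur
        self._prev = prev
        return struct.pack(f"<{len(out)}h", *out)
```

One instance would be created per call, and each incoming Talkdesk audio chunk passed through `process()` before being forwarded to the agent. The reverse direction (PCM 16 kHz → μ-law 8 kHz) needs a low-pass filter before dropping samples and is not shown here.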

𝗔𝗜 𝗦𝘁𝗮𝗰𝗸:

• STT: Azure Speech (Italian)

• LLM: OpenAI GPT-4.1

• TTS: ElevenLabs (eleven_multilingual_v2)

• VAD: Silero

𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗦𝗲𝘁𝘂𝗽 (pipecat-flows):

Router Node → detects intent → routes to:

• Booking Agent (20+ step flow)

• Info Agent (RAG/knowledge base)

• [Future] Doctor Booking Agent, for when the caller specifies the doctor's name, e.g. "I want to book an appointment with Dr. Jhon for a heart checkup."

Agents can transfer between each other during conversation.

𝗠𝘆 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀:

𝟭. 𝗟𝗮𝘁𝗲𝗻𝗰𝘆 feels high. Is the two-hop WebSocket architecture (Talkdesk → Bridge → Pipecat) causing this? Should I merge the bridge into the Pipecat container?

𝟮. Is having a 𝘀𝗲𝗽𝗮𝗿𝗮𝘁𝗲 𝗯𝗿𝗶𝗱𝗴𝗲 for audio conversion a common pattern, or is there a better approach?

𝟯. 𝗥𝗼𝘂𝘁𝗶𝗻𝗴 𝗽𝗮𝘁𝘁𝗲𝗿𝗻 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻: I use a Router node to detect intent and route to agents. But I'm concerned this approach is too rigid.

Example: Currently I route to the "Booking Agent" when the user says "book X-ray". But what if the user says "book with Dr. Jhon" or "book with Dr. Jhon at 3pm tomorrow"?

Should I create separate agents for each variation? That feels wrong - they're all booking, just with different pre-filled data.

Or should the Router extract entities (doctor name, time, service) and pass them as parameters to a single flexible agent that skips steps dynamically?

What's the best pattern in pipecat-flows for handling these variations without creating rigid, bounded flows for each request type?
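To make the second option concrete, here is a rough sketch of the slot-filling idea in plain Python. It deliberately does not use any pipecat-flows API; `BOOKING_STEPS`, `BookingState`, and the slot names are placeholders invented for the example, and the real flow has 20+ steps rather than four. The router would extract whatever entities it can from the first utterance, and the single booking agent would then only ask for what is still missing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical, shortened list of booking steps in the order they are asked.
BOOKING_STEPS = ["service", "doctor", "date", "time"]


@dataclass
class BookingState:
    """Tracks which booking slots are already filled for one call."""

    slots: Dict[str, str] = field(default_factory=dict)

    def merge(self, entities: Dict[str, Optional[str]]) -> None:
        """Store entities extracted by the router (or a later turn).

        Unknown keys and empty values are ignored, so the router can pass
        along whatever it found without validating it first.
        """
        for name, value in entities.items():
            if name in BOOKING_STEPS and value:
                self.slots[name] = value

    def next_step(self) -> Optional[str]:
        """Return the first unfilled step, or None when booking is complete."""
        for step in BOOKING_STEPS:
            if step not in self.slots:
                return step
        return None


# "Book with Dr. Jhon at 3pm tomorrow": the router pre-fills three slots,
# so the booking agent only has to ask for the service.
state = BookingState()
state.merge({"doctor": "Dr. Jhon", "date": "tomorrow", "time": "15:00"})
first_question = state.next_step()
```

With this shape, "book X-ray", "book with Dr. Jhon", and "book with Dr. Jhon at 3pm tomorrow" would all land in the same agent and differ only in which slots arrive pre-filled.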

𝟰. What are you using for 𝗼𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 in production?
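Related to questions 1 and 4: the kind of per-stage timing I have in mind is sketched below. It is only an illustration under my own assumptions; `StageTimer` and the stage names are hypothetical and not part of Pipecat. It also only measures work done inside one process (for example, resampling in the bridge), so the network time of each WebSocket hop would still need timestamps taken on both sides.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class StageTimer:
    """Collects wall-clock durations (in milliseconds) per named stage."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock              # injectable so it can be tested
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        """Time the enclosed block and record it under `name`."""
        start = self._clock()
        try:
            yield
        finally:
            elapsed_ms = (self._clock() - start) * 1000.0
            self.samples[name].append(elapsed_ms)

    def summary(self):
        """Return count, median, and max per stage for logging or export."""
        return {
            name: {
                "count": len(values),
                "median_ms": median(values),
                "max_ms": max(values),
            }
            for name, values in self.samples.items()
        }


# Example: wrap the per-chunk work of one (hypothetical) bridge stage.
timer = StageTimer()
with timer.stage("bridge_resample"):
    sum(range(1000))  # stand-in for the real per-chunk conversion
report = timer.summary()
```

Comparing the summaries for the bridge and the agent would at least show whether the extra hop or one of the STT/LLM/TTS stages dominates the delay.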

Any feedback appreciated. Thanks!
