Expert Advice Needed on My Pipecat Architecture · Open Source Voice AI Community

Mohammad Mussab

Dec '25 (edited) • Pipecat


𝗛𝗲𝗮𝗹𝘁𝗵𝗰𝗮𝗿𝗲 𝗩𝗼𝗶𝗰𝗲 𝗔𝗴𝗲𝗻𝘁 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗥𝗲𝘃𝗶𝗲𝘄

Hi everyone,

Running a production voice agent (~500-600 calls/day) with 𝗽𝗶𝗽𝗲𝗰𝗮𝘁-𝗳𝗹𝗼𝘄𝘀. Would appreciate feedback on my architecture.

𝗪𝗵𝘆 𝗦𝗲𝗹𝗳-𝗛𝗼𝘀𝘁𝗲𝗱: Tried Pipecat Cloud but Talkdesk is not supported. WebSocket is mandatory - cannot use WebRTC.

𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲:

Talkdesk ──WS──► Bridge Server (Azure App Service) ──WS──► Pipecat Agent (Azure VM + Docker)

• Bridge converts μ-law 8kHz ↔ PCM 16kHz (resampling on every chunk)

• 3 Docker containers behind Nginx load balancer

• Each container handles ~15 concurrent calls, with limits of 3GB RAM and 0.75 CPU per container

• CI/CD: GitHub Actions → Docker Hub → Azure VM pull
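
For reference, the bridge's per-chunk conversion in the Talkdesk → Pipecat direction can be sketched roughly as below. This is a simplified illustration, not the actual bridge code: the function names are hypothetical, the μ-law decoding follows the standard G.711 expansion, and the 2x upsampling uses plain linear interpolation, whereas a production resampler would normally use a proper low-pass (polyphase) filter and carry state across chunk boundaries.

```python
import struct


def ulaw_to_linear(u: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def upsample_2x(samples: list[int]) -> list[int]:
    """Double the sample rate (8 kHz -> 16 kHz) by linear interpolation.

    Simplification: the last sample of the chunk is repeated instead of
    being interpolated against the first sample of the next chunk.
    """
    out: list[int] = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) // 2)  # midpoint between neighbours
    return out


def ulaw_chunk_to_pcm16_16k(chunk: bytes) -> bytes:
    """Convert one mu-law 8 kHz chunk to little-endian PCM16 at 16 kHz."""
    pcm_8k = [ulaw_to_linear(b) for b in chunk]
    pcm_16k = upsample_2x(pcm_8k)
    return struct.pack(f"<{len(pcm_16k)}h", *pcm_16k)
```

With this sketch, a 160-byte μ-law chunk (20 ms at 8 kHz) becomes 640 bytes of 16 kHz PCM16, so the work per chunk is small and linear in chunk size.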

𝗔𝗜 𝗦𝘁𝗮𝗰𝗸:

• STT: Azure Speech (Italian)

• LLM: OpenAI GPT-4.1

• TTS: ElevenLabs (eleven_multilingual_v2)

• VAD: Silero

𝗠𝘂𝗹𝘁𝗶-𝗔𝗴𝗲𝗻𝘁 𝗦𝗲𝘁𝘂𝗽 (pipecat-flows):

Router Node → detects intent → routes to:

• Booking Agent (20+ step flow)

• Info Agent (RAG/knowledge base)

• [Future] Doctor Booking Agent, for when the caller specifies the doctor's name, e.g. "I want to book an appointment with Dr. Jhon for a heart checkup."

Agents can transfer between each other during conversation.

𝗠𝘆 𝗤𝘂𝗲𝘀𝘁𝗶𝗼𝗻𝘀:

𝟭. 𝗟𝗮𝘁𝗲𝗻𝗰𝘆 feels high. Is the two-hop WebSocket architecture (Talkdesk → Bridge → Pipecat) causing this? Should I merge the bridge into the Pipecat container?
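
One way to check this empirically would be to time each stage inside a process (for example the resampling and forwarding steps in the bridge) and compare the percentiles with the end-to-end turn latency. Below is a minimal sketch using only the Python standard library. The `StageTimer` class and the stage names are hypothetical, and because it uses a local clock it only measures work inside one process, not the network time between the two hosts.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median


class StageTimer:
    """Collects per-stage durations (in ms) and reports simple percentiles."""

    def __init__(self) -> None:
        self._samples: dict[str, list[float]] = defaultdict(list)

    def record(self, name: str, duration_ms: float) -> None:
        """Store an already-measured duration for a stage."""
        self._samples[name].append(duration_ms)

    @contextmanager
    def stage(self, name: str):
        """Time the enclosed block with a monotonic clock."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, (time.perf_counter() - start) * 1000.0)

    def summary(self) -> dict[str, dict[str, float]]:
        """Return count, median and nearest-rank p95 for every stage."""
        report: dict[str, dict[str, float]] = {}
        for name, values in self._samples.items():
            ordered = sorted(values)
            # Nearest-rank p95 using integer maths to avoid float rounding.
            p95_index = (95 * len(ordered) + 99) // 100 - 1
            report[name] = {
                "count": len(ordered),
                "p50_ms": median(ordered),
                "p95_ms": ordered[p95_index],
            }
        return report


if __name__ == "__main__":
    timer = StageTimer()
    with timer.stage("resample"):      # hypothetical stage name
        sum(range(10_000))             # stand-in for the real work
    print(timer.summary())
```

If the in-process stages turn out to be small compared with the total turn latency, the extra hop is probably not the main contributor and the STT, LLM, and TTS round trips are the next things to measure.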

𝟮. Is having a 𝘀𝗲𝗽𝗮𝗿𝗮𝘁𝗲 𝗯𝗿𝗶𝗱𝗴𝗲 for audio conversion a common pattern, or is there a better approach?

𝟯. 𝗥𝗼𝘂𝘁𝗶𝗻𝗴 𝗽𝗮𝘁𝘁𝗲𝗿𝗻 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻: I use a Router node to detect intent and route to agents. But I'm concerned this approach is too rigid.

Example: Currently I route to "Booking Agent" when the user says "book X-ray". But what if the user says "book with Dr. Jhon" or "book with Dr. Jhon at 3pm tomorrow"?

Should I create separate agents for each variation? That feels wrong - they're all booking, just with different pre-filled data.

Or should the Router extract entities (doctor name, time, service) and pass them as parameters to a single flexible agent that skips steps dynamically?

What's the best pattern in pipecat-flows for handling these variations without creating rigid, bounded flows for each request type?
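
To make the second option concrete, here is a framework-agnostic sketch of the slot-filling idea: the Router passes whatever entities it extracted, and a single booking agent asks only for what is still missing. It deliberately does not use the pipecat-flows API; the slot names, the `BookingState` class, and the `ask_*` / `confirm` step names are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical slots, in the order the agent would ask for them.
BOOKING_SLOTS = ("service", "doctor", "date", "time")


@dataclass
class BookingState:
    """Tracks which booking details are already known for one call."""

    slots: dict[str, str] = field(default_factory=dict)

    def prefill(self, entities: dict[str, str]) -> None:
        """Accept entities extracted by the router; ignore unknown or empty ones."""
        for name, value in entities.items():
            if name in BOOKING_SLOTS and value:
                self.slots[name] = value

    def next_step(self) -> str:
        """Return the first step whose slot is still missing, else 'confirm'."""
        for name in BOOKING_SLOTS:
            if name not in self.slots:
                return f"ask_{name}"
        return "confirm"


if __name__ == "__main__":
    # "book with Dr. Jhon at 3pm tomorrow": doctor, date and time are known.
    state = BookingState()
    state.prefill({"doctor": "Dr. Jhon", "date": "tomorrow", "time": "3pm"})
    print(state.next_step())  # only the service is still missing
```

With this shape, "book X-ray", "book with Dr. Jhon", and "book with Dr. Jhon at 3pm tomorrow" all enter the same agent and differ only in which steps get skipped. Whether every slot is required for every service is a separate design decision that this sketch does not cover.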

𝟰. What are you using for 𝗼𝗯𝘀𝗲𝗿𝘃𝗮𝗯𝗶𝗹𝗶𝘁𝘆 in production?

Any feedback appreciated. Thanks!
