
Memberships

Open Source Voice AI Community

804 members • Free

Brendan's AI Community

23.1k members • Free

37 contributions to Open Source Voice AI Community
Small AI Voice Agents Questionnaire
Hello all, I'm trying to investigate a few hypotheses I have regarding the AI Voice Agent market. My questions mostly relate to security, observability, billing, and load management. To gather data, I've built the following Google Form: https://forms.gle/oFeM9J9WV9DRX9267 If you could please answer it, I would greatly appreciate it. Once I have all the data compiled, I will publish a post with my findings, so that others can learn from this study as well. Much appreciated.
Musings about Vibe Coding, Pipecat, LiveKit and more
So, over the past few weeks I've been neck deep in working with Pipecat, LiveKit, and vibe coding. Mainly, I wanted to see what kind of mileage I could get from vibe coding tools, and what better way to test them than to build a Pipecat/LiveKit implementation? So, I decided to examine 3 primary tools:
- Claude Code - using Sonnet 3.5 (via CLI)
- OpenCode - Grok Code Fast 1
- Google Antigravity - using Gemini 2.5
Below are my conclusions, split into several categories.
💵 Financials: Most expensive to use - Claude Code. Least expensive to use - OpenCode.
😡 Developer Experience: Best experience - Google Antigravity. Worst experience - Claude Code.
💪 Reliability: Most reliable - Claude Code. Least reliable - OpenCode.
🚅 Performance: Fastest planning and building - Google Antigravity. Slowest planning and building - OpenCode.
So, overall there is no "one tool to rule them all" here - what I found is that each tool is really good at performing specific tasks. Here is what I've learned about how to leverage these tools to build something successful:
- Planning can be performed with either OpenCode or Google Antigravity. Google provides free developer credits for Antigravity, and its deep-thinking and reasoning engine works very well when applied to software architecture and design.
- Backend development works with either Claude Code or Google Antigravity. When coupled with proper topic sub-agents, these are really powerful tools. For some odd reason, Claude Code is far more capable at handling complex architectures, while Google Antigravity leans towards "hacker style" coding.
- UI/UX development - without any question, OpenCode did a better job. It was far more capable of spitting out hundreds of lines of working UI/UX code, even faster than Claude. However, if at some point it gets stuck on a specific UI component package, it may require Claude to show it the light - so pay attention to what it's doing.
- Code Review, Security and Privacy - without any question, Claude is the winner here, with potentially the most extensive availability of sub-agent topic experts.
1 like • 15d
@Darryn Campbell I completely agree with your statement - well beyond what you might imagine. About 6 months ago, a friend of mine asked me to join him at an "AI App Building Seminar" he was attending. I went with him, only to be incredibly pissed off after sitting there for 90 minutes, listening to some "quack" trying to sell me Lovable and Replit as the "solution for all my problems" - as if a prompt like "Write me a CRM system that includes an accounting system" is all it takes to launch a SaaS. Personally speaking, I think that understanding how LLMs work - not so much the transformer part, more the analysis and understanding part - is fundamental to writing better prompts and better instructions.
0 likes • 12d
@Randy Esguerra If you need assistance - ping me.
Dograh Project and Dograh Cloud are now Cloudonix enabled
Hi All, for those of you who are not familiar with Dograh, it is an open source project aimed at providing VAPI-like functionality. Dograh is also available as a cloud offering, enabling its users to build their own AI Voice Agent experiences on its infrastructure. Dograh is based on Pipecat for its AI orchestration, coupled with your favorite AI models as well as Dograh's own. I recommend taking a look at it: https://www.dograh.com/
Now, by default Dograh supports Twilio, Vonage, and Vobiz. However, this week Cloudonix patches were introduced, enabling Cloudonix telephony services on top of Dograh. So, you can use it to connect your own phone system or any SIP-compatible phone provider.
In about 2 weeks we'll be holding a special Dograh + Cloudonix dedicated session, where we will show how this integration works and what features it brings. For more information, feel free to DM me, and register for our upcoming office hours session to learn more about this integration: https://us02web.zoom.us/meeting/register/6D63tRaYSDihkJtUlpNp-A
Asterisk and LiveKit Integration
Has anyone successfully integrated LiveKit with Asterisk?
2 likes • 28d
You can do it via Stasis; however, you will need to wrap Asterisk with some stream caching and some transcoding, as the stream isn't fully compatible out of the box.
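For illustration, here is a minimal sketch of pulling call audio out of Asterisk from a Stasis application using ARI's externalMedia channels (available since Asterisk 16.6). The ARI endpoint, credentials, and Stasis app name below are placeholder assumptions, and the third-party requests package is assumed to be installed:

```python
import requests  # third-party 'requests' package

ARI_URL = "http://localhost:8088/ari"  # placeholder: your ARI HTTP endpoint
AUTH = ("ariuser", "arisecret")        # placeholder: credentials from ari.conf

def start_external_media(app: str, media_host: str) -> dict:
    """Originate an externalMedia channel that streams the call's audio
    as RTP to an external media server."""
    resp = requests.post(
        f"{ARI_URL}/channels/externalMedia",
        auth=AUTH,
        params={
            "app": app,                   # Stasis app that receives the channel
            "external_host": media_host,  # e.g. "10.0.0.5:4000"
            "format": "slin16",           # let Asterisk transcode to 16 kHz slin
        },
    )
    resp.raise_for_status()
    return resp.json()

# The returned channel is then typically placed into a mixing bridge with the
# caller's channel so audio flows to the external media server and back.
```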
1 like • 26d
So, here are some "quirks" I've experienced trying to do what you described. Let's start with the basic fact: it's doable. The other fact, and I'm saying this with over 20 years of experience with the Asterisk project: it was fuck'n annoying. Asterisk, when improperly configured, is notorious for being "resource sparing" - namely, it will "close sockets" or "disconnect Stasis" when resources become saturated. I needed to implement an "audio socket proxy" that provides a form of "store-and-forward" mechanism, to ensure that audio was sent properly and in full sync. In other words: doable - but scaling it to multiple nodes and hundreds of concurrent calls, that's the challenge. As I always say: "Telephony isn't a skill, it's a vocation".
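As a rough illustration of that store-and-forward idea (not the actual proxy described above), here is a simplified one-directional sketch that buffers Asterisk AudioSocket frames before forwarding them downstream. The listen port and downstream agent address are placeholder assumptions, and the return audio path is omitted:

```python
import asyncio

# Asterisk AudioSocket frame types (per the AudioSocket protocol docs):
# 0x00 = terminate, 0x01 = call UUID, 0x10 = signed-linear audio, 0xff = error.
TERMINATE = 0x00

async def read_frame(reader: asyncio.StreamReader):
    """Read one frame: 1-byte type, 2-byte big-endian length, payload."""
    header = await reader.readexactly(3)
    kind, length = header[0], int.from_bytes(header[1:3], "big")
    payload = await reader.readexactly(length) if length else b""
    return kind, payload

async def handle_call(reader: asyncio.StreamReader, writer: asyncio.StreamWriter):
    # Placeholder downstream agent address.
    _agent_reader, agent_writer = await asyncio.open_connection("127.0.0.1", 9092)
    # Bounded buffer (~4s of 20ms frames) absorbs bursts; a production proxy
    # would add overflow handling instead of simply back-pressuring.
    queue: asyncio.Queue = asyncio.Queue(maxsize=200)

    async def store():
        # Accept frames from Asterisk as fast as it sends them.
        while True:
            kind, payload = await read_frame(reader)
            await queue.put((kind, payload))
            if kind == TERMINATE:
                break

    async def forward():
        # Drain the buffer toward the agent at its own pace.
        while True:
            kind, payload = await queue.get()
            agent_writer.write(bytes([kind]) + len(payload).to_bytes(2, "big") + payload)
            await agent_writer.drain()
            if kind == TERMINATE:
                break

    await asyncio.gather(store(), forward())
    agent_writer.close()
    writer.close()

async def main():
    server = await asyncio.start_server(handle_call, "0.0.0.0", 9090)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```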
Expert Advice Needed on my Pipecat Architecture
๐—›๐—ฒ๐—ฎ๐—น๐˜๐—ต๐—ฐ๐—ฎ๐—ฟ๐—ฒ ๐—ฉ๐—ผ๐—ถ๐—ฐ๐—ฒ ๐—”๐—ด๐—ฒ๐—ป๐˜ ๐—”๐—ฟ๐—ฐ๐—ต๐—ถ๐˜๐—ฒ๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ ๐—ฅ๐—ฒ๐˜ƒ๐—ถ๐—ฒ๐˜„ Hi everyone, Running a production voice agent (~500-600 calls/day) with ๐—ฝ๐—ถ๐—ฝ๐—ฒ๐—ฐ๐—ฎ๐˜-๐—ณ๐—น๐—ผ๐˜„๐˜€. Would appreciate feedback on my architecture. ๐—ช๐—ต๐˜† ๐—ฆ๐—ฒ๐—น๐—ณ-๐—›๐—ผ๐˜€๐˜๐—ฒ๐—ฑ: Tried Pipecat Cloud but Talkdesk is not supported. WebSocket is mandatory - cannot use WebRTC. ๐—”๐—ฟ๐—ฐ๐—ต๐—ถ๐˜๐—ฒ๐—ฐ๐˜๐˜‚๐—ฟ๐—ฒ: Talkdesk โ”€โ”€WSโ”€โ”€โ–บ Bridge Server (Azure App Service) โ”€โ”€WSโ”€โ”€โ–บ Pipecat Agent (Azure VM + Docker) โ€ข Bridge converts ฮผ-law 8kHz โ†” PCM 16kHz (resampling on every chunk) โ€ข 3 Docker containers behind Nginx load balancer โ€ข Each handles ~15 concurrent calls โ”€โ”€โ–บ Each container: 3GB RAM, 0.75 CPU limit โ€ข CI/CD: GitHub Actions โ†’ Docker Hub โ†’ Azure VM pull ๐—”๐—œ ๐—ฆ๐˜๐—ฎ๐—ฐ๐—ธ: โ€ข STT: Azure Speech (Italian) โ€ข LLM: OpenAI GPT-4.1 โ€ข TTS: ElevenLabs (eleven_multilingual_v2) โ€ข VAD: Silero ๐— ๐˜‚๐—น๐˜๐—ถ-๐—”๐—ด๐—ฒ๐—ป๐˜ ๐—ฆ๐—ฒ๐˜๐˜‚๐—ฝ (pipecat-flows): Router Node โ†’ detects intent โ†’ routes to: โ€ข Booking Agent (20+ step flow) โ€ข Info Agent (RAG/knowledge base) โ€ข [Future] Person specify the doctors name e.g "I want to book appointment with Dr. Jhon for heart checkup." Doctor Booking Agent Agents can transfer between each other during conversation. ๐— ๐˜† ๐—ค๐˜‚๐—ฒ๐˜€๐˜๐—ถ๐—ผ๐—ป๐˜€: ๐Ÿญ. ๐—Ÿ๐—ฎ๐˜๐—ฒ๐—ป๐—ฐ๐˜† feels high. Is the two-hop WebSocket architecture (Talkdesk โ†’ Bridge โ†’ Pipecat) causing this? Should I merge the bridge into the Pipecat container? ๐Ÿฎ. Is having a ๐˜€๐—ฒ๐—ฝ๐—ฎ๐—ฟ๐—ฎ๐˜๐—ฒ ๐—ฏ๐—ฟ๐—ถ๐—ฑ๐—ด๐—ฒ for audio conversion a common pattern, or is there a better approach? ๐Ÿฏ. ๐—ฅ๐—ผ๐˜‚๐˜๐—ถ๐—ป๐—ด ๐—ฝ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐—ป ๐—พ๐˜‚๐—ฒ๐˜€๐˜๐—ถ๐—ผ๐—ป: I use a Router node to detect intent and route to agents. But I'm concerned this approach is too rigid. Example: Currently I route to "Booking Agent" when user says "book X-ray". But what if user says "book with Dr. Jhon" or "book with Dr. Jhon at 3pm tomorrow"? Should I create separate agents for each variation? That feels wrong - they're all booking, just with different pre-filled data. Or should the Router extract entities (doctor name, time, service) and pass them as parameters to a single flexible agent that skips steps dynamically? What's the best pattern in pipecat-flows for handling these variations without creating rigid, bounded flows for each request type?
1 like • 29d
Which of the components is doing the audio resampling? I'm not familiar with the Azure Bridge Server, so some context would be appreciated. Also, can you please share a link to the TalkDesk WebSocket interface documentation? I would like to read up on it a bit.
1 like • 29d
I'm reviewing the TalkDesk documentation, and I believe you may have a better option with a service like Cloudonix replacing the Bridge Server part. I'll explain: if I'm reading the TalkDesk site correctly, Cloudonix can connect to TalkDesk via SIP TCP/TLS and perform the resampling inside Cloudonix (its core is designed for that). Then, you can use the Cloudonix <Connect><Stream> voice application verb (compatible with Twilio's) to connect with Pipecat. If you want, catch me on Discord (https://discord.com/invite/etCGgNh9VV) for a small testing session. I'm confident we can resolve any latency that is "network" oriented.
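For reference, a <Connect><Stream> response (shown here in Twilio's TwiML form, which the comment above says Cloudonix is compatible with) can be returned from a small webhook. A minimal sketch, with Flask and the WebSocket URL as placeholder assumptions:

```python
from flask import Flask, Response  # assumes the Flask package is installed

app = Flask(__name__)

@app.route("/voice", methods=["POST"])
def voice():
    # Placeholder WebSocket URL: point it at your Pipecat ingress.
    cxml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<Response>"
        '<Connect><Stream url="wss://pipecat.example.com/ws"/></Connect>'
        "</Response>"
    )
    return Response(cxml, mimetype="text/xml")

if __name__ == "__main__":
    app.run(port=8080)
```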
Nir Simionovich
Level 4 • 59 points to level up
@nir-simionovich-6572
I'm passionate about disrupting the communications market.

Active 1d ago
Joined Nov 7, 2025
Israel