
Memberships

10K Club | Sell With Usama

1k members • Free

Open Source Voice AI Community

804 members • Free

36 contributions to Open Source Voice AI Community
GeminiLive S2S + pipecat-flows Integration Issue
Hey everyone! I'm trying to integrate GeminiLive S2S (speech-to-speech) with pipecat-flows for a healthcare booking agent.

The Problem: When pipecat-flows transitions between nodes, it sends LLMSetToolsFrame to update available tools. GeminiLive requires WebSocket reconnection when tools change (API limitation). After reconnection, the conversation state breaks and Gemini doesn't follow the new node's task messages to call functions.

What works:
- OpenAI LLM + Azure STT + ElevenLabs TTS with pipecat-flows ✅
- Tool updates happen seamlessly, no reconnection needed

What doesn't work:
- GeminiLive S2S + pipecat-flows ❌
- Every node transition → reconnection → broken flow

Current workaround attempts:
- Monkey-patched process_frame to handle LLMSetToolsFrame
- Wait for session ready after reconnection
- Trigger inference with new context messages
- Still inconsistent behavior

Questions:
1. Has anyone successfully used GeminiLive with pipecat-flows?
2. Is there a recommended pattern for handling tool updates without reconnection?
3. Should I create a custom adapter that pre-registers all tools at connection time?

Any guidance appreciated! 🙏
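For question 3, here is a rough sketch of what "pre-register all tools at connection time" could look like as plain Python. It is only an illustration of the pattern and does not call any pipecat, pipecat-flows, or Gemini API. The class `GatedToolRegistry`, the node names, and the tool names are all hypothetical. The idea is to declare the union of tools once, so the tool list never changes mid-session, and to refuse calls to tools that do not belong to the current node. Whether GeminiLive then follows each node's task messages reliably is not something this sketch can confirm.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Set

# Hypothetical handler type: takes the tool arguments, returns any result.
ToolHandler = Callable[[Dict[str, Any]], Any]


@dataclass
class GatedToolRegistry:
    """Holds every tool for the whole flow; only the active node's tools may run."""

    handlers: Dict[str, ToolHandler] = field(default_factory=dict)
    node_tools: Dict[str, Set[str]] = field(default_factory=dict)
    current_node: str = ""

    def register(self, node: str, name: str, handler: ToolHandler) -> None:
        """Attach a tool to a node; the same tool may be attached to several nodes."""
        self.handlers[name] = handler
        self.node_tools.setdefault(node, set()).add(name)

    def all_tool_names(self) -> List[str]:
        """The full list to declare once, when the session is opened."""
        return sorted(self.handlers)

    def set_node(self, node: str) -> None:
        """Switch the active node without touching the declared tool list."""
        if node not in self.node_tools:
            raise KeyError(f"unknown node: {node}")
        self.current_node = node

    def call(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        """Run a tool only if the active node allows it; otherwise return an error payload."""
        allowed = self.node_tools.get(self.current_node, set())
        if name not in allowed:
            return {"error": f"tool '{name}' is not available in node '{self.current_node}'"}
        return {"result": self.handlers[name](args)}


# Hypothetical two-node booking flow.
registry = GatedToolRegistry()
registry.register("collect_details", "lookup_patient", lambda a: {"patient": a["name"]})
registry.register("book", "book_appointment", lambda a: {"booked": a["slot"]})

registry.set_node("collect_details")
print(registry.all_tool_names())
print(registry.call("book_appointment", {"slot": "10:00"}))  # rejected in this node
registry.set_node("book")
print(registry.call("book_appointment", {"slot": "10:00"}))  # allowed in this node
```

The error payload returned for an out-of-node call can be passed back to the model as the function result, which gives it a chance to pick a tool that belongs to the current node.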
0 likes • 4d
@John George @Nour aka Sanava @Kwindla Kramer @everyone
Best Observability Tools for Voice AI Frameworks?
What observability tools are others using with Pipecat or similar voice AI frameworks? I've built a production voice agent using Pipecat and currently track basic metrics (call duration, sentiment, summary, transcripts) in a custom dashboard. It goes into production tomorrow, and the problem I expect to face is that debugging will be painful when errors occur. My current logging approach creates massive log files that are nearly impossible to analyze efficiently when tracking down issues.
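One low-effort way to make large log files easier to search is to emit one JSON object per line and tag every record with a call ID. Below is a minimal sketch using only the Python standard library. The logger name `voice_agent`, the `call_id` and `stage` fields, and the helper names are assumptions made for illustration, not part of Pipecat. It is not a replacement for a real tracing tool.

```python
import io
import json
import logging
from typing import Any, Dict, Iterable, List


class JsonLineFormatter(logging.Formatter):
    """Render each record as a single JSON line so it can be filtered by field."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # call_id and stage arrive through the `extra` argument (see log_event).
            "call_id": getattr(record, "call_id", None),
            "stage": getattr(record, "stage", None),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("voice_agent")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger.addHandler(handler)
    return logger


def log_event(logger: logging.Logger, call_id: str, stage: str,
              message: str, level: int = logging.INFO) -> None:
    """Log one event tagged with the call it belongs to and the pipeline stage."""
    logger.log(level, message, extra={"call_id": call_id, "stage": stage})


def events_for_call(lines: Iterable[str], call_id: str) -> List[Dict[str, Any]]:
    """Return only the parsed log events that belong to one call."""
    return [e for e in map(json.loads, lines) if e["call_id"] == call_id]


# Example: two calls interleaved in one log, then filtered down to one of them.
buffer = io.StringIO()
log = build_logger(buffer)
log_event(log, "call-1", "stt", "transcript received")
log_event(log, "call-2", "tts", "synthesis failed", logging.ERROR)
for event in events_for_call(buffer.getvalue().splitlines(), "call-2"):
    print(event)
```

Because every line is valid JSON, the same filtering works on files with standard command-line JSON tools, and the records can later be shipped to a log or tracing backend without changing the call sites.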
1 like • Nov '25
@Johann Tagle sure… thinking of going with langfuse but will let you know where I end up
0 likes • 4d
@Muhammad Arhan yes. They support Pipecat with OpenTelemetry
New NVIDIA open model for voice agents: Nemotron Speech ASR
NVIDIA released a new open source speech-to-text model designed from the ground up for low-latency use cases like voice agents. This is part of NVIDIA's new focus on open models, which I'm excited about. These new models in the Nemotron family include STT and TTS models, specialized models like guardrail models and LLMs. And they are completely open: open weights, training code, training data sets, and inference tooling. This new STT model is very fast. Here's a voice agent running locally on my RTX 5090 with sub-500ms voice-to-voice inference. Technical write-up and link to GitHub repo: https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/ Also, Twitter and LinkedIn if either of those platforms are your thing. (I post a lot about voice agents on both platforms.) https://x.com/kwindla/status/2008601714392514722 https://www.linkedin.com/posts/kwkramer_nvidia-just-released-a-new-open-source-transcription-activity-7414368349905821696-ufuy/
1 like • 4d
Now it's too fast... responding faster than a human 🥀🥲
Musings about Vibe Coding, Pipecat, LiveKit and more
So, over the past few weeks, I've been neck deep in working with Pipecat, LiveKit and Vibe Coding. Mainly, I wanted to see what kind of mileage I can get from Vibe Coding tools, and to test that, what better way than to build a Pipecat/LiveKit implementation?

So, I decided to examine 3 primary tools:
- Claude Code - Using Sonnet 3.5 (using CLI)
- OpenCode - Grok Code Fast 1
- Google Antigravity - Using Gemini 2.5

Below are my conclusions, split into several categories.

💵 Financials:
Most expensive to use - Claude Code
Least expensive to use - OpenCode

😡 Developer Experience:
Best experience - Google Antigravity
Worst experience - Claude Code

💪 Reliability:
Most reliable - Claude Code
Least reliable - OpenCode

🚅 Performance:
Fastest planning and building - Google Antigravity
Slowest planning and building - OpenCode

So, overall, there is no "one tool to rule them all" here. What I found is that each tool is really good at performing specific tasks. Here is what I've learned about how to "leverage" these tools in order to build something successful:
- Planning can be performed with either OpenCode or Google Antigravity. Google provides free developer credits for Antigravity, and their deep-thinking and reasoning engine works very well when applied to software architecture and design.
- Backend development with either Claude Code or Google Antigravity. When coupled with proper topic sub-agents, these are really powerful tools. For some odd reason, Claude Code is far more capable at handling complex architectures, while Google Antigravity leans towards "hacker style" coding.
- UI/UX development - without any question, OpenCode did a better job. It was far more capable of spitting out hundreds of lines of working UI/UX code, even faster than Claude. However, if at some point it gets stuck on a specific UI component package, it may require Claude to show it the light, so pay attention to what it's doing.
- Code Review, Security and Privacy - without any question, Claude is the winner here, with potentially the most extensive availability of sub-agent topic experts.
0 likes • 4d
@Kwindla Kramer doing the same
0 likes • 4d
Why not use Claude Opus 4.5? And I think the $100 Claude Max plan would be enough for me?
Small AI Voice Agents Questionnaire
Hello all, I'm trying to investigate a few hypotheses I have regarding the AI Voice Agent market. My questions are mostly related to security, observability, billing and load management. To do so, I've built the following Google Form: https://forms.gle/oFeM9J9WV9DRX9267 If you could please answer it, I would highly appreciate it. Also, once I have all the data compiled, I will publish a post with all my findings, so that people can learn from this study as well. Much appreciated.
0 likes • 4d
Done 👍
Mohammad Mussab
3
32 points to level up
@mohammad-mussab-2383
I build Voice AI systems that handle customer calls, scheduling, and follow-ups — helping SMEs capture more revenue automatically.

Active 2d ago
Joined Nov 10, 2025
Pakistan