Arek Wu

Open Source Voice AI Community


Memberships

Voice AI HQ

403 members • Free

Open Source Voice AI Community

873 members • Free

5 contributions to Open Source Voice AI Community

Arek Wu

Jan 15 •

Pipecat

SOLVED: Deepgram Nova-3 (Polish) Fragmenting Phone Numbers despite `utterance_end_ms`

Hi everyone, I'm building a specialized voice assistant using **Pipecat Flows v0.0.22** and running into a frustrating issue with phone number collection that I can't seem to solve.

### The Stack
- **Framework:** Pipecat Flows v0.0.22 (Python)
- **STT:** Deepgram Nova-3 (Polish, `pl`)
- **TTS:** Cartesia (Polish voice)
- **Transport:** Local WebRTC (browser-based, no telephony yet)

### The Problem
When I dictate a 9-digit Polish phone number (e.g., "690807057"), the assistant receives partial fragments and processes them individually instead of waiting for the full number. For example, if I say "690... 807... 055" (with natural pauses), the bot splits it into:
1. "6" -> sent to LLM -> LLM complains "Received only 1 digit"
2. "980" -> sent to LLM -> LLM complains
3. "5" ... and so on.

### What I Have Tried
I've gone through the documentation and tried several fixes, but the fragmentation issue persists.

1. **Deepgram Configuration (Current Setup):** I've configured `LiveOptions` to handle phone numbers and utterance endings explicitly:

```python
from deepgram import LiveOptions  # Deepgram Python SDK (v3), as used by Pipecat's Deepgram STT service

options = LiveOptions(
    model="nova-3",
    language="pl",
    smart_format=True,      # Enabled
    numerals=True,          # Enabled
    utterance_end_ms=1000,  # Set to 1000 ms to force waiting
    interim_results=True,   # Required for utterance_end_ms
)
```

*Result:* Even with `utterance_end_ms=1000`, Deepgram seems to finalize the results too early during the pauses between digit groups.

2. **VAD Tuning:**
   - I tried increasing Pipecat's VAD `stop_secs` to `2.0s`.
   - *Result:* This caused massive latency (a 2 s delay on every response) and didn't fix the underlying STT fragmentation (Deepgram still finalized early). I've reverted to `0.5s` (and `0.2s` for barge-in), as `stop_secs=2.0s` is considered an anti-pattern for conversational flows.
3. **Prompt Engineering (Aggressive):**
   - I instructed the LLM to "call the function IMMEDIATELY with whatever fragments you have".
   - *Result:* This led to early failures where the LLM would call `capture_phone("6")`, which would fail validation (requires 9 digits), causing the bot to reject the input before the user finished speaking.
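One way to stop `capture_phone("6")` from firing on every fragment is to accumulate digits in plain code and only hand a complete 9-digit candidate to the validator. This is a minimal sketch, not necessarily how the issue was eventually solved: it assumes each finalized STT transcript arrives as a plain string and that digits come through as numerals rather than spelled-out Polish words. `PhoneNumberBuffer`, `add_fragment` and `pending` are hypothetical names, not part of Pipecat or Pipecat Flows.

```python
import re
from typing import List, Optional


class PhoneNumberBuffer:
    """Hypothetical helper: collects digit fragments from successive STT finals.

    Assumes the STT returns digits as numerals (e.g. "690"), not spelled-out
    words; any non-digit characters in a fragment are ignored.
    """

    def __init__(self, expected_digits: int = 9) -> None:
        self.expected_digits = expected_digits  # capture_phone requires 9 digits
        self._digits: List[str] = []

    def add_fragment(self, text: str) -> Optional[str]:
        """Add one fragment; return the full number once enough digits arrive."""
        self._digits.extend(re.findall(r"\d", text))
        if len(self._digits) < self.expected_digits:
            return None  # keep waiting instead of calling the validator early
        number = "".join(self._digits)
        self._digits.clear()  # start fresh for the next attempt
        if len(number) != self.expected_digits:
            return None  # too many digits: discard so the bot can ask again
        return number

    def pending(self) -> str:
        """Digits collected so far (e.g. for a 'so far I have...' prompt)."""
        return "".join(self._digits)


# Example: three fragments produced by natural pauses while dictating.
buffer = PhoneNumberBuffer()
for fragment in ["690", "807", "057"]:
    result = buffer.add_fragment(fragment)
print(result)  # 690807057
```

Keeping the accumulation outside the LLM means the model only ever sees a full-length candidate, so a validation failure points to a genuinely wrong number rather than a pause mid-dictation.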


Arek Wu

0 likes • Jan 15

@Jin Park, that's a good point. We will be setting up a local SIP trunk soon with one of the local providers. Could we jump on a quick call? Unless you have some guidance on how to achieve that :) Thanks again, Jin!

Arek Wu

1 like • Jan 15

@Jin Park Fantastic! I've also sent you a LinkedIn invite :) Cheerio

Arek Wu

Nov '25 •

Pipecat

Pipecat + Telnyx

Hey folks in the voice AI community,

I've been grinding away on integrating Telnyx telephony with Pipecat for a custom customer-request bot: an inbound voice assistant that handles real conversations (customer issues). My Pipecat playground (STT, LLM, TTS) is rock-solid locally... but the Telnyx transport? It's been a battle. The WebSocket connects and VAD detects speech, but the AI just blanks on understanding me, as if the 8 kHz telephony signal isn't reaching STT correctly, causing silent or hallucinated transcripts. I tried GPT-4o-mini STT in another project and it butchered both Polish and English; clearly I need telephony-tuned STT.

I can list the key challenges I've wrestled with in Pipecat (v0.0.93) + Telnyx, plus quick notes on what I've solved (spoiler: core understanding is still pending). My understanding is that Pipecat is still maturing in this area.

Has anyone nailed a working Telnyx + Pipecat integration for real-time agents? How do you tune STT (Deepgram, OpenAI, AssemblyAI, or others) to handle Telnyx's 8 kHz audio without losing the plot? Would you be so kind as to share your setup or fixes? This could turn into long-term collab gold.

Cheers,
Arek


Arek Wu

0 likes • Nov '25

@Nir Simionovich Really appreciate you taking the time to break this down. Your explanation about the aggregator chain and codec sampling makes total sense. I've been researching local Polish SIP providers based on your recommendations, and I sent you a LinkedIn invite as well. :)

Quick clarification question: if I find a local Polish SIP provider that natively supports OPUS or G.722 (16 kHz) and offers VoIP-to-VoIP routing, would I still need Cloudonix in the chain? Or does Cloudonix add value beyond just the codec upsampling?

From what I understand, the critical requirements for a local provider would be:
1. ✅ VoIP-to-VoIP routing (bypassing PSTN)
2. ✅ Native 16 kHz codec support (G.722, OPUS, or AMR-WB)
3. ✅ WebSocket API or SIP trunk with media streaming
4. ✅ Polish DID numbers
5. ✅ Solid API documentation for integration

Without points 1-3, building a phone-based AI agent is basically impossible, right?

I'm definitely interested in opening a Cloudonix account and learning more about what you can help configure. I just want to make sure I understand the architecture before we dive in. Thanks again for sharing your expertise! I'd be really keen to join one of the scheduled calls.
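The native 16 kHz codec requirement above follows from the sampling theorem: audio sampled at rate $f_s$ can only represent frequencies up to the Nyquist frequency $f_s/2$. A short worked comparison (textbook signal-processing facts, not measurements from this setup):

$$
f_N = \frac{f_s}{2}
\qquad\Rightarrow\qquad
\begin{cases}
f_s = 8\ \text{kHz} \;\Rightarrow\; f_N = 4\ \text{kHz} & \text{(narrowband: G.711 PCMU/PCMA)}\\
f_s = 16\ \text{kHz} \;\Rightarrow\; f_N = 8\ \text{kHz} & \text{(wideband: G.722, AMR-WB, OPUS in wideband mode)}
\end{cases}
$$

In practice G.711 telephony passes roughly 300 to 3400 Hz and G.722 roughly 50 to 7000 Hz, so resampling an 8 kHz stream to 16 kHz partway through the chain changes the sample rate but cannot restore content above 4 kHz that the narrowband leg never carried.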

Arek Wu

0 likes • Nov '25

Hi @Nir Simionovich, thank you for your continued support and expertise throughout this integration process; your guidance has been invaluable. I wanted to share some positive findings from my testing with Telnyx and Deepgram Nova-2 that might help others in the community facing similar challenges.

After extensive testing with my engineer, we've successfully implemented a stable setup using Deepgram Nova-2 for STT with the PCMA codec at 8 kHz. The results have been excellent:
- Accuracy: Deepgram reports a median WER of 8.4% for Nova-2, and a 30% reduction in word error rate compared to competitors
- Telephony optimization: Nova-2 is specifically designed to handle telephony-grade audio and performs exceptionally well with 8 kHz sampling
- Codec switch: We switched from PCMU to PCMA with no noticeable latency increase; interaction with the bot remains seamless
- Real-world validation: Testing on Polish Telnyx numbers confirms the bot understands speech accurately without requiring 16 kHz

For anyone implementing Telnyx integrations and running into difficulties, I'd recommend trying:
1. Deepgram Nova-2 for STT
2. The PCMA codec in both your TeXML BIN setup and your code configuration

It appears that modern STT models like Nova-2 can transcribe 8 kHz audio accurately enough to pass clear text to the LLM, which is remarkable. For our current implementation, switching to 16 kHz isn't necessary, though we'll certainly keep it as an option for future optimization.

Thanks again for being such a trusted resource in this space! 😎
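PCMA and PCMU are the A-law and μ-law variants of G.711: one 8-bit companded byte per sample at 8 kHz (64 kbit/s). To illustrate what the PCMU-to-PCMA switch changes on the wire, here is a minimal pure-Python sketch of the standard A-law-to-linear expansion (the same algorithm as the classic public-domain `g711.c`), producing the 16-bit PCM an STT engine typically consumes. The function names are illustrative, not Pipecat or Telnyx APIs. In a Pipecat pipeline this conversion is normally done by the transport's frame serializer (an assumption worth checking for your version), and Python's own `audioop.alaw2lin` is not available on Python 3.13+, where `audioop` was removed.

```python
import struct


def alaw_byte_to_linear(a_val: int) -> int:
    """Decode one G.711 A-law byte to a signed 16-bit sample (classic g711.c logic)."""
    a_val ^= 0x55                    # undo the even-bit inversion applied by A-law
    t = (a_val & 0x0F) << 4          # 4-bit mantissa (quantization step)
    seg = (a_val & 0x70) >> 4        # 3-bit segment (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t += 0x108
        t <<= seg - 1                # scale by the segment
    return t if a_val & 0x80 else -t  # in A-law, bit 7 set means positive


def decode_pcma(payload: bytes) -> bytes:
    """Convert an 8 kHz PCMA payload to 16-bit little-endian linear PCM (still 8 kHz)."""
    samples = [alaw_byte_to_linear(b) for b in payload]
    return struct.pack(f"<{len(samples)}h", *samples)
```

Note that decoding only changes the sample representation from 8-bit companded to 16-bit linear; the sample rate stays at 8 kHz, which is why Nova-2 is being asked to transcribe narrowband audio here.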

Pol Riba

Nov '25 •

Pipecat

Pipecat VS Livekit

I'm just curious which platforms you're building on, and what the pros and cons of each are.


Arek Wu

2 likes • Nov '25

@John George Hi! I created a mind map based on your YouTube video a while ago :) Check it out

Arek Wu

0 likes • Nov '25

@John George That's exactly what I do as well! By the way, since you've been in the business for some time now: have you managed to connect a telephony system (Telnyx/Twilio) to your AI bot (Pipecat)? I opened a new thread about it in the Pipecat tab. Would you be so kind as to have a look?

Nour aka Sanava

Nov '25 •

Pipecat

Special Welcome!

A special welcome to @Kwindla Kramer, CEO of Daily (the team behind Pipecat)! I’m a big fan of his work and so glad to see him join this community. Make sure to follow him on LinkedIn!


Arek Wu

3 likes • Nov '25

Hi @Kwindla Kramer, I’m impressed with the work your team delivers.

Nour aka Sanava

Nov '25 •

General discussion

Welcome to the Open Source Voice AI Community!

Hey everyone,

Thank you so much for your patience while we got this community ready to launch. It’s finally happening! 🎉

I’ve put together a short video explaining why I started this group and what it’s all about. I’m really excited to meet all of you: passionate, like-minded people working in the voice AI space. Our first meetup is next Friday, and it’ll be all about getting to know each other, hearing about your voice AI projects, and understanding what you’d like to learn here.

In the meantime, let’s start with introductions right here under this post 👇

Please share:
- Who you are
- What you’re building or working on
- What you’d love to learn or explore within this community

Can’t wait to see what everyone’s up to!


Arek Wu

1 like • Nov '25

Hello Nour, it seems we share similar experiences. It's nice to be part of this community!


Arek Wu

@arek-wu-8696

10+ years in the IT industry. Optimist.

Active 25d ago

Joined Nov 8, 2025
