Activity
Mon
Wed
Fri
Sun
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
Jan
What is this?
Less
More

Memberships

Voice AI HQ

355 members • Free

Open Source Voice AI Community

804 members • Free

1 contribution to Voice AI HQ
Help Needed: Deepgram Nova-3 (Polish) Fragmenting Phone Numbers despite `utterance_end_ms`
Hi everyone, I'm building a specialized voice assistant using **Pipecat Flows v0.0.22** and running into a frustrating issue with phone number collection that I can't seem to solve. ### The Stack - **Framework:** Pipecat Flows v0.0.22 (Python) - **STT:** Deepgram Nova-3 (Polish `pl`) - **TTS:** Cartesia (Polish voice) - **Transport:** Local WebRTC (browser-based, no telephony yet) ### The Problem When I dictate a 9-digit Polish phone number (e.g., "690807057"), the assistant receives partial fragments and processes them individually instead of waiting for the full number. For example, if I say "690... 807... 057" (with natural pauses), the bot splits it into: 1. "6" -> sent to LLM -> LLM complains "Received only 1 digit" 2. "980" -> sent to LLM -> LLM complains 3. "5" ... and so on. ### What I Have Tried I've gone through the documentation and tried several fixes, but the "defragmentation" issue persists. 1. **Deepgram Configuration (Current Setup):** I've configured the `LiveOptions` to handle phone numbers and utterance endings explicitly: ```python options = LiveOptions( model="nova-3", language="pl", smart_format=True, # Enabled numerals=True, # Enabled utterance_end_ms=1000, # Set to 1000ms to force waiting interim_results=True # Required for utterance_end_ms ) ``` *Result:* Even with `utterance_end_ms=1000`, Deepgram seems to finalize the results too early during the digit pauses. 2. **VAD Tuning:** - I tried increasing Pipecat's VAD `stop_secs` to `2.0s`. - *Result:* This caused massive latency (2s delay on every response) and didn't solve the valid STT fragmentation (Deepgram still finalized early). I've reverted to `0.5s` (and `0.2s` for barge-in) as `stop_secs=2.0s` is considered an anti-pattern for conversational flows. 3. **Prompt Engineering (Aggressive):** - I instructed the LLM to "call the function IMMEDIATELY with whatever fragments you have". - *Result:* This led to early failures where the LLM would call `capture_phone("6")`, which would fail validation (requires 9 digits), causing the bot to reject the input before the user finished speaking.
1 like • 4d
@Karol Meguid Really helpful. Thanks again. Quick question - I use Cartesia for TTS - is there a better, more optimal one for the Polish language? Based on your testing: ) I get some nuances and anomalies with Cartesia. Thanks
1 like • 4d
@Karol Meguid Do you have any tweaks if it messes up the pronunciation of some words? Or you just insert it in the prompt that instructs the assistant to pay attention to how it is pronounced?
1-1 of 1
Arek Wu
2
14points to level up
@arek-wu-8696
10+ years in IT industry. Optimist

Active 4d ago
Joined Nov 21, 2025