Help Needed: Deepgram Nova-3 (Polish) Fragmenting Phone Numbers despite `utterance_end_ms`
Hi everyone, I'm building a specialized voice assistant using **Pipecat Flows v0.0.22** and running into a frustrating issue with phone number collection that I can't seem to solve. ### The Stack - **Framework:** Pipecat Flows v0.0.22 (Python) - **STT:** Deepgram Nova-3 (Polish `pl`) - **TTS:** Cartesia (Polish voice) - **Transport:** Local WebRTC (browser-based, no telephony yet) ### The Problem When I dictate a 9-digit Polish phone number (e.g., "690807057"), the assistant receives partial fragments and processes them individually instead of waiting for the full number. For example, if I say "690... 807... 057" (with natural pauses), the bot splits it into: 1. "6" -> sent to LLM -> LLM complains "Received only 1 digit" 2. "980" -> sent to LLM -> LLM complains 3. "5" ... and so on. ### What I Have Tried I've gone through the documentation and tried several fixes, but the "defragmentation" issue persists. 1. **Deepgram Configuration (Current Setup):** I've configured the `LiveOptions` to handle phone numbers and utterance endings explicitly: ```python options = LiveOptions( model="nova-3", language="pl", smart_format=True, # Enabled numerals=True, # Enabled utterance_end_ms=1000, # Set to 1000ms to force waiting interim_results=True # Required for utterance_end_ms ) ``` *Result:* Even with `utterance_end_ms=1000`, Deepgram seems to finalize the results too early during the digit pauses. 2. **VAD Tuning:** - I tried increasing Pipecat's VAD `stop_secs` to `2.0s`. - *Result:* This caused massive latency (2s delay on every response) and didn't solve the valid STT fragmentation (Deepgram still finalized early). I've reverted to `0.5s` (and `0.2s` for barge-in) as `stop_secs=2.0s` is considered an anti-pattern for conversational flows. 3. **Prompt Engineering (Aggressive):** - I instructed the LLM to "call the function IMMEDIATELY with whatever fragments you have". - *Result:* This led to early failures where the LLM would call `capture_phone("6")`, which would fail validation (requires 9 digits), causing the bot to reject the input before the user finished speaking.