Owned by Jin
The AI Voice Agent Hub • 2 members • Free

Memberships
Voice AI Accelerator • 7.1k members • Free
Open Source Voice AI Community • 804 members • Free
The Confident Edge • 37 members • Free
AI Automation Agency Hub • 285.7k members • Free

7 contributions to Open Source Voice AI Community
SOLVED: Deepgram Nova-3 (Polish) Fragmenting Phone Numbers despite `utterance_end_ms`
Hi everyone, I'm building a specialized voice assistant using **Pipecat Flows v0.0.22** and running into a frustrating issue with phone number collection that I can't seem to solve.

### The Stack
- **Framework:** Pipecat Flows v0.0.22 (Python)
- **STT:** Deepgram Nova-3 (Polish `pl`)
- **TTS:** Cartesia (Polish voice)
- **Transport:** Local WebRTC (browser-based, no telephony yet)

### The Problem
When I dictate a 9-digit Polish phone number (e.g., "690807057"), the assistant receives partial fragments and processes them individually instead of waiting for the full number. For example, if I say "690... 807... 057" (with natural pauses), the bot splits it into:

1. "6" -> sent to LLM -> LLM complains "Received only 1 digit"
2. "980" -> sent to LLM -> LLM complains
3. "5"

...and so on.

### What I Have Tried
I've gone through the documentation and tried several fixes, but the fragmentation issue persists.

1. **Deepgram Configuration (Current Setup):** I've configured the `LiveOptions` to handle phone numbers and utterance endings explicitly:

```python
options = LiveOptions(
    model="nova-3",
    language="pl",
    smart_format=True,      # Enabled
    numerals=True,          # Enabled
    utterance_end_ms=1000,  # Set to 1000 ms to force waiting
    interim_results=True    # Required for utterance_end_ms
)
```

*Result:* Even with `utterance_end_ms=1000`, Deepgram seems to finalize the results too early during the digit pauses.

2. **VAD Tuning:**
- I tried increasing Pipecat's VAD `stop_secs` to `2.0`.
- *Result:* This caused massive latency (a 2-second delay on every response) and didn't solve the underlying STT fragmentation (Deepgram still finalized early). I've reverted to `0.5` (and `0.2` for barge-in), since `stop_secs=2.0` is considered an anti-pattern for conversational flows.

3. **Prompt Engineering (Aggressive):**
- I instructed the LLM to "call the function IMMEDIATELY with whatever fragments you have".
- *Result:* This led to early failures where the LLM would call `capture_phone("6")`, which would fail validation (9 digits required), causing the bot to reject the input before the user finished speaking.
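One application-level workaround is to stop sending raw fragments to the LLM at all and instead buffer finalized transcripts until a full number has accumulated. The sketch below is not a Pipecat or Deepgram API: `PhoneNumberAggregator` is an illustrative helper, `capture_phone` and the 9-digit rule come from the post, and the timeout value is an assumption.

```python
import re
import time


class PhoneNumberAggregator:
    """Buffers STT fragments and only emits a phone number once enough
    digits have arrived (or the partial buffer goes stale). Pure-Python
    sketch meant to sit between STT finals and the function-calling step."""

    def __init__(self, required_digits: int = 9, stale_after_s: float = 5.0):
        self.required_digits = required_digits
        self.stale_after_s = stale_after_s
        self._digits = ""
        self._last_update = 0.0

    def add_fragment(self, transcript: str) -> str | None:
        now = time.monotonic()
        # Drop a half-collected number if the caller paused for too long.
        if self._digits and now - self._last_update > self.stale_after_s:
            self._digits = ""
        self._last_update = now

        # Keep only digits; smart_format/numerals already yield digits here.
        self._digits += re.sub(r"\D", "", transcript)

        if len(self._digits) >= self.required_digits:
            number = self._digits[: self.required_digits]
            self._digits = ""
            return number  # now safe to validate / call capture_phone()
        return None  # keep waiting instead of sending "6" to the LLM


aggregator = PhoneNumberAggregator()
for fragment in ["690", "807", "057"]:          # simulated Deepgram finals
    complete = aggregator.add_fragment(fragment)
    if complete:
        print(f"capture_phone({complete!r})")   # hypothetical flow function
```

The point of the design is that fragmentation becomes harmless: the LLM (or the flow's validation) only ever sees either a complete 9-digit candidate or nothing.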
1 like • 4d
@Arek Wu that’s good to hear! But you should fix this at the system level. There should be an option to get the caller’s number from the SIP trunk and store it in the session metadata. That way you never have to worry about collecting the user’s phone number manually or about poor transcription accuracy.
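As a rough illustration of that suggestion: most telephony providers already deliver the caller ID on the inbound call event (e.g. the SIP "From" header or a webhook field), so it can be stashed in session metadata at call start. This is a minimal sketch; `from_number`, the event shape, and the in-memory store are assumptions that depend on your provider, not a specific Pipecat or SIP API.

```python
# Sketch: keep the caller ID the SIP trunk already delivers in session
# metadata, so the bot never has to transcribe the phone number at all.
session_metadata: dict[str, dict] = {}


def on_call_started(session_id: str, call_event: dict) -> None:
    # "from_number" is a placeholder for whatever field your provider
    # uses to expose the caller's number on the inbound call event.
    caller_number = call_event.get("from_number")
    session_metadata[session_id] = {"caller_phone": caller_number}


def phone_from_metadata(session_id: str) -> str | None:
    # The flow's capture step can read this instead of asking the user.
    return session_metadata.get(session_id, {}).get("caller_phone")
```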
1 like • 4d
@Arek Wu I sent you the link to my calendar via DM. Feel free to schedule a call.
Who has built an extremely scalable Voice AI system with LiveKit & Pipecat?
I mean a system that can handle 10k calls per day. Has anyone built a system like this using LiveKit and Pipecat? Did you do it without using your own GPUs?
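For sizing, concurrency matters more than the daily total. Here is a back-of-envelope sketch; the 3-minute average call, 10-hour calling window, and 3x peak factor are assumptions of mine, not figures from the post.

```python
# Back-of-envelope concurrency estimate for 10k calls/day.
calls_per_day = 10_000
avg_call_minutes = 3        # assumption
window_hours = 10           # assumption: calls spread over a 10-hour window
peak_factor = 3             # assumption: peak traffic is 3x the average

call_minutes_per_day = calls_per_day * avg_call_minutes        # 30,000 min
avg_concurrent = call_minutes_per_day / (window_hours * 60)    # = 50 calls
peak_concurrent = avg_concurrent * peak_factor                 # = 150 calls

print(f"average concurrency: {avg_concurrent:.0f} calls")
print(f"planning target (peak): {peak_concurrent:.0f} calls")
```

With hosted STT/TTS/LLM APIs the agent workers are mostly I/O-bound, so a concurrency target in this range is typically about horizontally scaling CPU-only containers rather than owning GPUs.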
1 like • Nov '25
I got one
I just joined the group and I have a question
Hello, I'm Ahmet. I just joined the community. I have some experience with open-source voice AI, but I haven't gotten the results I wanted. I started with open-source TTS and STT models, but they aren't good enough for Turkish at the moment (I serve the Turkish market). I guess I will have to train them myself. Some of the big companies I've talked to have attempted this, but they haven't gotten very far yet. Do you have any advice? Sincerely, thank you.
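Before committing to training a custom model, it can be worth benchmarking the strongest open multilingual checkpoints on your own Turkish call audio. The sketch below uses `faster-whisper`, which supports Turkish via `language="tr"`; the library choice, model size, and file path are my assumptions, not something recommended in the post.

```python
# Sketch: measure how far an open multilingual STT model gets on Turkish
# domain audio before deciding whether fine-tuning is really needed.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cpu", compute_type="int8")

segments, info = model.transcribe("sample_call_tr.wav", language="tr")
print(f"detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")

# If the error rate on your own call recordings is close to acceptable,
# fine-tuning or even just post-processing may be much cheaper than
# training a Turkish model from scratch.
```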
1 like • Nov '25
Hi Ahmet, let’s connect. I may be able to help.
Welcome to the Open Source Voice AI Community!
Hey everyone,

Thank you so much for your patience while we got this community ready to launch. It’s finally happening! 🎉

I’ve put together a short video explaining why I started this group and what it’s all about. I’m really excited to meet all of you: passionate, like-minded people working in the voice AI space.

Our first meetup is next Friday, and it’ll be all about getting to know each other, hearing about your voice AI projects, and understanding what you’d like to learn on here.

In the meantime, let’s start with introductions right here under this post 👇 Please share:
- Who you are
- What you’re building or working on
- What you’d love to learn or explore within this community

Can’t wait to see what everyone’s up to!
1 like • Nov '25
Thank you for starting this community, @Nour aka Sanava! I'm super excited to be a part of this community 🔥🔥
0 likes • Nov '25
Thanks for creating this community, @Nour aka Sanava
🦷 AI Voice Agent in the Dental Industry
About a month and a half ago, I pivoted from being a generalist to focusing exclusively on dental practices. Results? Couldn’t be better.

💰 Instant 4-figure MRR
🔗 Exclusive software integrations
🧾 Clients waiting in line
🎯 Recognized as an industry expert
💸 Investor interest and funding opportunities already on the table

A lot of founders have already figured out the business use cases but still struggle with the tech side. That’s where my code and I come in.

Core features of the platform I’ve built:
🗣️ AI Voice Agent - powered by Vapi & LiveKit
⚙️ AI Voice Agent Configurations - prompts, models, knowledge base, etc.
📞 Call Logs
🔐 Sign in / Sign up - Google authentication included
💳 Payment Collection via Stripe
🏢 Multi-Tenant Architecture - one user can be associated with multiple organizations

If you’re interested in building your own AI voice agent platform, comment or DM me.

🌐 Check out the website: https://dentai.ai/
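On the multi-tenant point specifically, the usual shape is a many-to-many membership table between users and organizations, with every query scoped by the organization. This is an illustrative sqlite sketch under that assumption; the table and column names are mine, not DentAI's schema.

```python
# Sketch of a multi-tenant data model: one user can belong to several
# organizations via a membership (join) table. Names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users         (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE memberships (
    user_id INTEGER REFERENCES users(id),
    org_id  INTEGER REFERENCES organizations(id),
    role    TEXT DEFAULT 'member',   -- e.g. owner / admin / member
    PRIMARY KEY (user_id, org_id)
);
""")

conn.execute("INSERT INTO users VALUES (1, 'dentist@example.com')")
conn.executemany("INSERT INTO organizations VALUES (?, ?)",
                 [(1, "Smile Dental"), (2, "Downtown Ortho")])
conn.executemany("INSERT INTO memberships (user_id, org_id, role) VALUES (?, ?, ?)",
                 [(1, 1, "owner"), (1, 2, "admin")])

# Every application query is then scoped by org_id so tenants never see
# each other's call logs or agent configurations.
rows = conn.execute("""
    SELECT o.name, m.role FROM memberships m
    JOIN organizations o ON o.id = m.org_id
    WHERE m.user_id = ?""", (1,)).fetchall()
print(rows)   # [('Smile Dental', 'owner'), ('Downtown Ortho', 'admin')]
```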
Jin Park
Level 3 • 28 points to level up
@jin-park-3553
AI Voice Agent Expert - LiveKit & Pipecat | Co-founder/CTO @ DentAI - https://dentai.ai/

Active 17h ago
Joined Nov 7, 2025
Canada