Andrew Cherry

I’ve spent the last few months building an Agentic Voice AI platform. Like everyone else, I started with the "standard" stack: Vapi (Orchestration) + Deepgram (Ear) + GPT-4 (Brain) + 11Labs (Mouth). It works, but the margins were killing me. I just re-architected the entire backend using the new Google "Antigravity" stack (Vertex AI + Gemini 2.0 Flash Live), and the math is actually insane.

1. The Economics (82% Savings)

The "Middleware Tax" is real. By stripping out the orchestrator and going native, the unit economics completely flipped. To put it in real dollars:

- Old Stack: For 4,000 minutes of conversation (~1,000 calls), I was paying $1,020/mo (~$0.26/min).
- New Stack: For the exact same volume, my bill dropped to $180/mo ($0.045/min).

That is $840/mo, or roughly $10k a year, in pure profit recovered just by switching infrastructure on a relatively small volume.

2. The "Real" Agentic Shift (Voice + Browser)

The biggest unlock isn't even the cost; it's the "Action Layer." Instead of just a chatbot that takes notes, I’m running Headless Browsers (Playwright) inside Google Cloud Run containers.

- The Agent (Gemini 3 Pro) doesn't just "speak" via API.
- It autonomously logs into legacy SaaS portals (which have no APIs), clicks buttons, and executes complex workflows while talking to the user in real time (<500ms latency).

3. The Architecture

- Voice: Python/FastAPI on Cloud Run (handling WebSockets).
- Model: Gemini 2.0 Flash Live (handling the interruptions/turn-taking natively, so no Vapi needed).
- Memory: Context Caching (upload 50-page SOPs once; pay near-zero to query them on every call).
- Scale: Terraform scripts spin up isolated, secure environments for every new client in 5 minutes.

If you are still chaining together 4 different APIs to make a voice bot, you might be overpaying for a "wrapper" when you could own the infrastructure. Happy to share more on the Terraform setup if anyone is interested.
This feels like the moment the industry shifts from "Chatbots" to true "Action Engines."
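
For anyone who wants to sanity-check the economics, here is the arithmetic as a quick Python sketch. The monthly bills and the 4,000 minutes are the figures quoted in the post; the variable and function names are just illustrative.

```python
# Figures quoted in the post; everything below is derived from them.
MINUTES_PER_MONTH = 4_000
OLD_BILL = 1_020.00  # USD/month on the Vapi + Deepgram + GPT-4 + 11Labs stack
NEW_BILL = 180.00    # USD/month on the Vertex AI + Gemini 2.0 Flash Live stack


def per_minute(bill: float, minutes: int) -> float:
    """Blended cost per conversation minute."""
    return bill / minutes


old_rate = per_minute(OLD_BILL, MINUTES_PER_MONTH)
new_rate = per_minute(NEW_BILL, MINUTES_PER_MONTH)
monthly_savings = OLD_BILL - NEW_BILL
savings_pct = monthly_savings / OLD_BILL * 100
annual_savings = monthly_savings * 12

print(f"old: ${old_rate:.3f}/min, new: ${new_rate:.3f}/min")
print(f"saved: ${monthly_savings:,.0f}/mo ({savings_pct:.0f}%), ${annual_savings:,.0f}/yr")
```

The old rate works out to $0.255/min (rounded to $0.26 in the post), and the savings come to $840/mo, or $10,080 a year at this volume.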

New comment 12d ago

Andrew Cherry

16d •

General discussion 💬

Google AntiGravity!!!!! Holy Shit the game has changed!!!!!

Just been talking to Gemini 3 about Antigravity and whether it could replace my current stack and be more cost effective!!!!

Q: So, can you orchestrate? Will you be able to? Can it be an orchestration and logic layer like VAPI and Retell?

This is the million-dollar question. The short answer is: Google Antigravity is not the layer itself (like Vapi); it is the factory where you build the layer. However, what you build inside it (using the new ADK and Gemini 3) creates a "Native Orchestration" that actually removes the need for Vapi or Retell entirely. Here is the technical breakdown of how Antigravity + Gemini 3 disrupts the Vapi/Retell model:

1. The Architecture Shift: "Stitched" vs. "Native"

This is the most important distinction.

- How Vapi / Retell work (The "Stitched" Stack):
- How Antigravity works (The "Native" Stack):

2. Can it handle Logic & Interruption? (The ADK)

Vapi and Retell are famous for their "endpointing" and interruption handling (knowing when the user has finished speaking, and when to stop talking if the user interrupts). Can Google do this? Yes. This is where the Agent Development Kit (ADK) comes in.

- The Logic Layer: Antigravity includes the ADK, a library that handles the state machine of the conversation.
- Interruption: Because Gemini 3 supports "Bidirectional Streaming," it handles interruptions natively. If the user speaks while the AI is talking, the model's audio input stream detects it and halts the audio output stream immediately, which is logic you historically paid Vapi to handle for you.

3. The Trade-Off: "Buy" vs. "Build"

| Feature | Vapi / Retell | Google Antigravity + Vertex AI |
| --- | --- | --- |
| Orchestration | Pre-packaged. You get a dashboard where you toggle "Interruption Sensitivity." | Code-First. You write the logic in Python/Node using the ADK (e.g., agent.on_interrupt(stop_speaking)). |
| Telephony | Built-in. They give you a phone number and handle the SIP trunking (via Twilio/Vonage). | Bring Your Own. You likely need to connect your own telephony provider (Twilio) to the Google agent, though Google Cloud CCAI handles this for enterprise. |
| Cost | Per-minute markup. You pay Vapi + LLM + Transcriber + Synthesizer. | Token-based. You pay for Gemini 3 tokens (Audio In/Out). Generally cheaper at scale. |
| Latency | Low, but limited by network hops. | Ultra-Low. Single model inference. |
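
To make the interruption logic concrete, here is a minimal sketch of barge-in handling in plain asyncio. It is a simulation only: it does not call the ADK or any Gemini API, and `speak`, `run_turn`, and `fake_vad` are hypothetical names invented for this example. The idea is simply that outbound audio runs as a task that gets cancelled the moment inbound speech is detected.

```python
import asyncio


async def speak(chunks, played):
    """Simulate streaming the agent's audio out, one chunk at a time."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.02)  # stand-in for writing audio to a socket


async def run_turn(chunks, user_speaks_after=None):
    """Play a reply, but halt output as soon as the user starts talking.

    user_speaks_after: seconds until the simulated user interrupts (None = never).
    Returns (chunks actually played, whether the turn was interrupted).
    """
    played = []
    user_spoke = asyncio.Event()

    async def fake_vad():
        # Stand-in for voice-activity detection on the inbound audio stream.
        if user_speaks_after is not None:
            await asyncio.sleep(user_speaks_after)
            user_spoke.set()

    speaking = asyncio.create_task(speak(chunks, played))
    vad = asyncio.create_task(fake_vad())
    barge_in = asyncio.create_task(user_spoke.wait())

    # Whichever happens first wins: the reply finishes, or the user barges in.
    done, _ = await asyncio.wait({speaking, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    interrupted = barge_in in done and not speaking.done()

    # Stop anything still running (this is the "halt the output stream" step).
    for task in (speaking, vad, barge_in):
        if not task.done():
            task.cancel()
    await asyncio.gather(speaking, vad, barge_in, return_exceptions=True)
    return played, interrupted
```

Calling `asyncio.run(run_turn(["a", "b", "c"]))` plays every chunk, while passing `user_speaks_after=0.05` with a longer reply cuts playback short. In a real deployment the detection signal would come from the model's bidirectional stream rather than a timer.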

Andrew Cherry

23d •

General discussion 💬

Google Just Made RAG Invisible

Hey Guys! If you are building local voice AI agents today, you are probably juggling RAG stacks, vector databases, embeddings and a messy collection of tools just to let your agents understand a knowledge base. Google’s new Gemini File Search essentially wipes out that entire layer of complexity. You upload your documents once and Google handles chunking, embeddings, indexing, retrieval, ranking and citations behind the scenes. From your side, it becomes a shift from building an entire RAG pipeline to making a single API call and pointing your voice agent at a file store.

What this really means is that the competitive advantage is no longer found in how you implement RAG. Google just turned the technical plumbing into a commodity. The real leverage now moves to what you connect that intelligence to: the workflows you design, the insights you extract and the proprietary data you shape into usable knowledge.

For voice AI builders, this is a major shift. You can stop pouring time into infrastructure and instead focus on creating agents that solve real operational problems, that understand policy and process without hallucination, and that integrate tightly with the actual day-to-day work your clients need done. This is the moment where domain mastery beats technical tinkering.
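
To give a feel for the plumbing being abstracted away, here is a toy sketch of the chunk, embed, index, retrieve and rank steps using only the Python standard library. This is not the Gemini File Search API and not how Google implements it: the "embedding" is a plain word count, and every function name and sample document here is made up for illustration.

```python
import math
import re
from collections import Counter


def chunk(text, size=12):
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text):
    """Toy 'embedding': a bag-of-words count (real systems use dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs):
    """docs maps a file name to its text; returns (name, chunk_no, text, vector) entries."""
    return [
        (name, i, piece, embed(piece))
        for name, text in docs.items()
        for i, piece in enumerate(chunk(text))
    ]


def search(index, query, k=1):
    """Rank chunks by similarity and return the top k with their source (the 'citation')."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[3]), reverse=True)
    return [(name, i, text) for name, i, text, _ in ranked[:k]]


# Hypothetical sample knowledge base.
docs = {
    "refunds.txt": "Refunds are issued within 14 days of a return request.",
    "hours.txt": "The office is open Monday to Friday from 9am to 5pm.",
}
index = build_index(docs)
```

With this index, `search(index, "When are refunds issued?")` returns the chunk from `refunds.txt` along with its file name. A managed file store replaces all of these functions with an upload step and a query step.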

New comment 22d ago

Andrew Cherry

0 likes • 22d

@Rob Olson Obsolete for people who only knew how to wire up Pinecone. Jet fuel for the people actually shipping. The folks yelling ‘obsolete’ were doing RAG like a group project; the rest just got a faster toolkit. Clearing out the junior-level tasks really helps highlight the people doing real system design. Even though the scaffolding disappears, the builders will keep building. Love it!

Brendan Jowett

Oct 30 •

Announcements

Which Voice AI Platform?

Hey everyone, Just wanted to run a quick poll for our Voice AI Testing software Relyable. For those who are not using Vapi or Retell, which platform do you use? We will use this to gauge which platform to add support for next. Thank you! Brendan

Poll

119 members have voted

New comment Nov 4

Andrew Cherry

0 likes • Nov 4

Sim.ai

Andrew Cherry

@andrew-cherry-8170

🏠 Real Estate Investor 💡 Digital Marketing Strategist 🤖 AI Marketing Consultant Transforming investments into opportunities.

Active 4d ago

Joined Oct 23, 2025
