Built a Full AI Pipeline on One Laptop — Voice Is Next
Hey everyone — I've been building local-first AI infrastructure, and this community is exactly my vibe.

I run a full AI pipeline from a single laptop (RTX 5080 16GB, 32GB DDR5): Ollama in Docker with GPU passthrough, PostgreSQL, and Redis. The philosophy: 80% of the AI workload runs on free local models; only the 20% that needs frontier reasoning hits a cloud API. Cost per pipeline run dropped from $8-15 to $0.15-0.40.

I've shipped a few tools with this setup — market scanners, a knowledge-retention engine with RAG, and a live SaaS API product — all from the same machine.

What brought me here: I want to add a voice layer. Seeing folks run Pipecat with local STT/TTS on consumer GPUs is exactly the direction I'm heading. My Ollama stack already handles LLM inference; pairing it with local Whisper or the new NVIDIA Nemotron STT model on the same GPU seems like the natural next step.

A few things from the recent threads caught my eye:

- @Kwindla's sub-500ms voice-to-voice on an RTX 5090 with Nemotron — curious how that scales down to a 5080 with 16GB of VRAM when the LLM is also loaded
- @Jin Park's custom orchestration engine replacing Vapi/Retell — that modular approach maps directly to how I route pipeline stages between local and cloud models
- The latency discussion around local vs. cloud STT — has anyone benchmarked Whisper locally against Deepgram for voice-agent round-trip times?

Looking forward to learning from this group and sharing what I build.
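For anyone curious what the local/cloud split looks like in practice, here's a minimal sketch of the routing idea. To be clear, this is illustrative rather than my actual code: the `needs_frontier_reasoning` flag, the stage dict shape, and the model name are hypothetical placeholders — only the Ollama `/api/generate` endpoint is real.

```python
import json
import urllib.request

# Default Ollama endpoint when running in Docker with the port published.
LOCAL_URL = "http://localhost:11434/api/generate"


def choose_backend(stage: dict) -> str:
    """Route a pipeline stage: stay local unless it needs frontier reasoning.

    The `needs_frontier_reasoning` flag is a stand-in for whatever
    heuristic you use (prompt length, task type, retry count, etc.).
    """
    return "cloud" if stage.get("needs_frontier_reasoning") else "local"


def run_local(prompt: str, model: str = "llama3.1:8b") -> str:
    """Call the local Ollama generate endpoint (non-streaming)."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        LOCAL_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The cheap stages (extraction, classification, summarization) carry no flag and hit `run_local`; only the stages marked for escalation go out to the cloud API, which is where the ~95% cost drop comes from.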