I’m gearing up to build a modular voice AI agent that handles both inbound and outbound calls, fully automated, context-aware, and scalable.
Here's the stack I'm gonna use:
- Twilio for call routing and telephony
- ElevenLabs for expressive voice synthesis
- Airtable for logging, CRM-style tracking, and prompt injection
- n8n for my backend
The architecture is mapped. The use cases are solid. But here’s the blocker: tooling costs. These platforms aren’t free, and I’m not in a position to invest just yet. The one exception is n8n, since I’m self-hosting it.