🚨 A Hidden LLM Issue That Can Break Your AI SaaS
While auditing an AI-powered eCommerce system (voice-driven, with actions executed by an LLM), I found a serious problem:
A single request hit 37,402 tokens — far beyond the allowed limit.
⚠️ The Problem
This wasn’t user input. It was poor system design:
Full MongoDB documents returned by tools (get_products, get_orders, etc.)
The entire conversation history sent on every request
System prompts heavier than they needed to be
Result: Token usage kept compounding → costs skyrocketed → system became unstable
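Why does the cost compound instead of growing linearly? Here is a minimal Python sketch under simplifying assumptions. Every turn adds one message of fixed size T to the history, including tool results, because they stay in the conversation and are re-sent on every later request. Each request carries the system prompt (S tokens) plus that history. Without a window, the total over n turns is n*S + T*n*(n+1)/2, which is quadratic in conversation length. With a window of w messages, each request is capped at S + w*T. The token counts used below are hypothetical, not measurements from the audited system.

```python
def prompt_tokens(turn, tokens_per_message, system_tokens, window=None):
    """Prompt tokens sent on request number `turn` (1-based).

    Simplifying assumption: each turn adds one message of
    `tokens_per_message` tokens, and every request resends the system
    prompt plus the whole history, or only the last `window` messages
    when a window is set.
    """
    history = turn if window is None else min(turn, window)
    return system_tokens + history * tokens_per_message


def cumulative_prompt_tokens(turns, tokens_per_message, system_tokens, window=None):
    """Total prompt tokens billed across the first `turns` requests."""
    return sum(
        prompt_tokens(t, tokens_per_message, system_tokens, window)
        for t in range(1, turns + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: 300 tokens per message and a 1,500-token
    # system prompt over a 40-turn conversation.
    full = cumulative_prompt_tokens(40, 300, 1500)
    windowed = cumulative_prompt_tokens(40, 300, 1500, window=8)
    print(full, windowed)  # 306000 147600
```

With these hypothetical numbers, the 40th request alone carries 13,500 prompt tokens without a window and 3,900 with an 8-message window. Returning full documents from tools raises T, and both figures scale with it.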
🛠️ The Fix
Limited tool responses to essential fields and a maximum of 35 records
Enabled smarter actions (e.g., create a product and set its quantity in one call)
Sent only the last 8 messages to the LLM for the user chatbot, and only the last 5 for agentic tasks
Reduced prompt size
Added clean error handling for TPM (tokens-per-minute) limits and HTTP 413 (request too large) errors
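Here is a minimal Python sketch of three of these fixes. It assumes an OpenAI-style list of `{"role": ..., "content": ...}` messages. It trims tool results to a few fields and 35 records, keeps only the last 8 (chatbot) or 5 (agentic) messages, and wraps the LLM call in a guard. The field names, `LLMHTTPError`, `FALLBACK_REPLY`, and the `send` callable are hypothetical stand-ins for your own client code. The status-code mapping (413 for an oversized request, 429 for rate limits such as TPM) is also an assumption. Providers differ, so check what yours actually returns.

```python
import time

# Hypothetical field names; the caps (35 records, 8 and 5 messages) come from the post.
PRODUCT_FIELDS = ("_id", "name", "price", "quantity")
MAX_TOOL_RECORDS = 35
WINDOWS = {"chatbot": 8, "agent": 5}
FALLBACK_REPLY = "Sorry, that request was too large to process. Please try again."


def trim_tool_result(docs, fields=PRODUCT_FIELDS, limit=MAX_TOOL_RECORDS):
    """Return at most `limit` records, each reduced to the essential fields."""
    return [{k: doc[k] for k in fields if k in doc} for doc in docs[:limit]]


def window_history(messages, mode, window=None):
    """Keep every system message plus the last N non-system messages.

    N defaults to 8 for the chatbot and 5 for agentic tasks. Leading
    tool-result messages are dropped so a result is never sent without
    the assistant tool call that requested it.
    """
    n = WINDOWS[mode] if window is None else window
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-n:] if n > 0 else []  # rest[-0:] would return everything
    while recent and recent[0]["role"] == "tool":
        recent.pop(0)
    return system + recent


class LLMHTTPError(Exception):
    """Hypothetical error that `send` raises with the provider's HTTP status."""

    def __init__(self, status):
        super().__init__(f"LLM request failed with HTTP {status}")
        self.status = status


def call_llm_guarded(send, messages, mode, retries=2, backoff_s=1.0, sleep=time.sleep):
    """Call the hypothetical `send(messages)` and handle size and rate limits.

    413 (assumed: request too large): halve the history window and retry.
    429 (assumed: rate limit, e.g. TPM): back off exponentially and retry.
    Other errors are re-raised; exhausted retries return FALLBACK_REPLY.
    """
    window = WINDOWS[mode]
    for attempt in range(retries + 1):
        try:
            return send(window_history(messages, mode, window))
        except LLMHTTPError as err:
            if err.status not in (413, 429):
                raise
            if attempt == retries:
                break  # no retries left; fall through to the fallback reply
            if err.status == 413:
                window = max(1, window // 2)  # send less context next time
            else:
                sleep(backoff_s * 2 ** attempt)
    return FALLBACK_REPLY
```

With pymongo, the same trimming can happen inside the query, using a projection and a limit such as `db.products.find({}, {"name": 1, "price": 1, "quantity": 1}).limit(35)`. That way, full documents never leave the database. Dropping leading `tool` messages matters because chat APIs generally expect each tool result to follow the assistant message that requested it.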
📉 Outcome
Controlled token usage
Stable performance
Predictable billing
Production-ready system
💡 Takeaway
LLMs don’t become expensive on their own —
architecture makes them expensive.
If you're building AI systems, control:
context, data, and token flow.
Open to connecting with teams working on AI products facing similar challenges.