Recently, I built an **AI agent** that collects necessary information from users (via **audio, video, or text**) and then **triggers an email to the manager** requesting a quotation. The idea was to streamline client interactions and make the communication seamless.
Client Request: WhatsApp Integration
The client wanted the AI agent to also work over **WhatsApp**, so that users could chat directly. I integrated this using **Twilio Webhooks**: incoming WhatsApp messages were routed to the backend, processed, and then answered by the agent.
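To make the webhook step concrete, here is a minimal sketch of turning one incoming Twilio request into a message record. Twilio posts form-encoded fields such as `From`, `Body`, `NumMedia`, and `MediaUrl0`; the function name `parse_twilio_message` and the shape of the returned dict are my own assumptions for illustration, not the code from the project.

```python
from urllib.parse import parse_qs


def parse_twilio_message(raw_body: str) -> dict:
    """Extract sender, text, and media URLs from a form-encoded webhook body.

    Hypothetical helper: the returned dict layout is an assumption.
    """
    # parse_qs returns a list per key; keep the first value of each field.
    form = {k: v[0] for k, v in parse_qs(raw_body, keep_blank_values=True).items()}

    # NumMedia says how many MediaUrl<i> fields accompany the message.
    num_media = int(form.get("NumMedia", "0") or 0)
    media = [form[f"MediaUrl{i}"] for i in range(num_media) if f"MediaUrl{i}" in form]

    return {
        "sender": form.get("From", ""),
        "text": form.get("Body", ""),
        "media": media,
    }


# Example body, shaped like a WhatsApp message with one image attached.
example_body = (
    "From=whatsapp%3A%2B15550001111&Body=Need+a+quote"
    "&NumMedia=1&MediaUrl0=https%3A%2F%2Fexample.com%2Fimg.jpg"
)
parsed = parse_twilio_message(example_body)
```

Each WhatsApp message arrives as its own request like this, which is why a burst of texts and images shows up as several separate webhook calls.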
The Problem
During testing, I noticed a major issue:
- If a user sent multiple messages quickly (e.g., multiple texts or images), the webhook would forward them individually.
- Since my backend was processing messages one at a time (effectively single-threaded per user), **any new messages arriving while processing was still happening were ignored**.
- This broke the experience, especially when users uploaded several images or fired off multiple texts in a row.
The Fix: Redis Message Buffering
To solve this, I added **Redis as a temporary message queue**:
- Every incoming message was stored in Redis.
- Instead of processing immediately, the system would **wait 5 seconds**.
- During that window, if multiple messages came in (text + images, etc.), they were grouped together.
- After 5 seconds, **all messages were processed at once** and sent as a single batch to the backend.
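The buffering steps above can be sketched roughly as follows. This is an illustration under assumptions, not the production code: the key names (`buffer:<user>`, `deadline:<user>`), the function names, and the choice of a fixed window that starts at the first message of a burst are all mine. The `store` argument is expected to behave like a redis-py client (`rpush`, `lrange`, `set(..., nx=True)`, `get`, `delete`); `FakeRedis` is a tiny in-memory stand-in so the sketch runs without a server.

```python
import json
import time

BUFFER_WINDOW_SECONDS = 5  # window length mentioned in the post


class FakeRedis:
    """In-memory stand-in for the few redis-py methods used below."""

    def __init__(self):
        self.data = {}

    def rpush(self, key, *values):
        self.data.setdefault(key, []).extend(values)
        return len(self.data[key])

    def lrange(self, key, start, end):
        items = self.data.get(key, [])
        return items[start:] if end == -1 else items[start:end + 1]

    def set(self, key, value, nx=False):
        if nx and key in self.data:
            return None  # mirrors redis-py: None when NX prevents the write
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def delete(self, *keys):
        return sum(1 for k in keys if self.data.pop(k, None) is not None)


def buffer_message(store, user_id, message, now=None):
    """Append a message to the user's buffer; True if it opened a new window."""
    now = time.time() if now is None else now
    store.rpush(f"buffer:{user_id}", json.dumps(message))
    # NX means only the first message of a burst sets the deadline.
    opened = store.set(f"deadline:{user_id}", str(now + BUFFER_WINDOW_SECONDS), nx=True)
    return bool(opened)


def flush_if_due(store, user_id, now=None):
    """Return the buffered batch once the window has elapsed, else an empty list."""
    now = time.time() if now is None else now
    deadline = store.get(f"deadline:{user_id}")
    if deadline is None or now < float(deadline):
        return []
    raw = store.lrange(f"buffer:{user_id}", 0, -1)
    store.delete(f"buffer:{user_id}", f"deadline:{user_id}")
    return [json.loads(item) for item in raw]


store = FakeRedis()
user = "whatsapp:+15550001111"
first = buffer_message(store, user, {"text": "Hi"}, now=100.0)
second = buffer_message(store, user, {"media": "photo-1"}, now=101.5)
early = flush_if_due(store, user, now=103.0)  # window still open
batch = flush_if_due(store, user, now=105.0)  # window elapsed
```

One caveat with this sketch: against a real Redis server, a message arriving between the `lrange` and `delete` calls could be lost, so the read-and-clear step would need to be made atomic (for example with a transaction or a Lua script).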
The Result
- Users could now send multiple messages naturally without anything being dropped.
- The agent received a **complete context** (all related messages/images together).
- **Cost efficiency improved**: instead of a separate API call for every message, one batched call was made for each 5-second window of messages from a user.
- The whole interaction flow became **smoother and more reliable**.
This small change made a big difference in both **user experience** and **system efficiency**.
It's a reminder that sometimes the best optimizations come from watching how **real users interact** and designing around those patterns.
Curious to hear your thoughts: Was this the right approach, or could there have been a better solution?