After building voice agents for 15+ clients, I turned my entire QA process into a quick LLM-powered workflow that cuts out 80–90% of the manual testing.
Testing voice agents used to drain my week.
Every small update meant another round of manual calls.
Fix a node → retest.
Update a fallback → retest.
Change a price → retest everything again.
It was brutal.
So I built a tiny internal system to automate almost all of it.
Here’s the exact flow:
1. Export the full agent as a JSON file.
2. Drop it into an LLM along with the client’s FAQ, KB pages, and policies.
3. Paste a system prompt that makes the model map out every node, branch, and condition before it writes a single test.
4. Auto-generate 15–25 realistic test cases covering:
   – emergencies
   – confused callers
   – angry callers
   – pricing checks
   – spam filtering
   – function calls
   – out-of-order info
   – multi-language attempts
5. Convert the whole output to Retell’s JSON format.
6. Import it into simulations.
7. Run everything at once and skim the transcripts for anything weird.
Total time: ~12 minutes.
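The flow above could be sketched in Python roughly like this. It's a minimal illustration, not the SOP itself. It makes two assumptions. First, the exported agent JSON has a `nodes` list with `name` and `instruction` keys. Second, the output field names (`user_prompt`, `success_criteria`) are hypothetical placeholders, not Retell’s actual import schema, so check your platform's docs before importing. The LLM call is left out: you paste the prompt into whatever model you use, then feed the returned JSON array to `to_simulation_json`.

```python
import json

# Scenario mix from the checklist; these labels are illustrative, not a standard.
CATEGORIES = [
    "emergency",
    "confused caller",
    "angry caller",
    "pricing check",
    "spam",
    "function call",
    "out-of-order info",
    "multi-language",
]


def build_prompt(agent: dict, kb_text: str, n_cases: int = 20) -> str:
    """Assemble the test-generation prompt from an exported agent and the client's docs."""
    # Assumed export shape: {"nodes": [{"name": ..., "instruction": ...}, ...]}.
    # Real exports differ by platform, so adjust these keys to match yours.
    nodes = agent.get("nodes", [])
    node_lines = "\n".join(
        f"- {n.get('name', '?')}: {n.get('instruction', '')}" for n in nodes
    )
    return (
        "You are a QA engineer testing a phone voice agent.\n"
        "First trace every node, branch, and condition below.\n\n"
        f"AGENT NODES:\n{node_lines}\n\n"
        f"CLIENT FAQ / KB / POLICIES:\n{kb_text}\n\n"
        f"Write {n_cases} realistic test cases covering: {', '.join(CATEGORIES)}.\n"
        'Return only a JSON array of objects with keys "name", "category", '
        '"caller_prompt" and "expected".'
    )


def to_simulation_json(cases: list[dict]) -> str:
    """Validate generated cases and map them onto a simulation-import shape."""
    out = []
    for case in cases:
        # Reject anything outside the agreed scenario mix instead of importing it silently.
        if case.get("category") not in CATEGORIES:
            raise ValueError(f"unknown category: {case.get('category')!r}")
        out.append(
            {
                "name": case["name"],
                # Hypothetical field names: rename to whatever your simulator expects.
                "user_prompt": case["caller_prompt"],
                "success_criteria": case["expected"],
            }
        )
    return json.dumps(out, indent=2)
```

Validating categories before conversion means a malformed or off-list case fails loudly rather than quietly slipping into the suite.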
The real win isn’t just speed — it’s consistency.
Every agent gets tested the same way.
Every update gets retested with the same suite.
No more “Oh, I forgot to test billing flows” moments.
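One way to make the "same suite every time" part enforceable is a small coverage check that runs before each simulation batch. This is a hypothetical helper, not part of any platform. It assumes every generated test case carries a `category` label. The `REQUIRED` tuple is only an example and should be tailored per client.

```python
from collections import Counter

# Example categories every suite must cover; tailor this per client.
REQUIRED = ("emergency", "pricing check", "billing", "angry caller", "spam")


def coverage_gaps(
    cases: list[dict], required: tuple[str, ...] = REQUIRED, min_per: int = 1
) -> dict[str, int]:
    """Return each required category with fewer than `min_per` cases, mapped to its current count."""
    counts = Counter(case.get("category") for case in cases)
    # Counter returns 0 for categories that never appear.
    return {cat: counts[cat] for cat in required if counts[cat] < min_per}


def assert_full_coverage(
    cases: list[dict], required: tuple[str, ...] = REQUIRED, min_per: int = 1
) -> None:
    """Raise before the batch runs if any required category is under-covered."""
    gaps = coverage_gaps(cases, required, min_per)
    if gaps:
        raise AssertionError(f"suite is missing coverage: {gaps}")
```

For example, a suite with one emergency case and two pricing checks would report `billing`, `angry caller` and `spam` at zero, which is exactly the forgotten billing flow caught before the run.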
If you want this level of sanity back, here’s my exact SOP.