I'm seeing significant latency with models like 4o-mini when an agent has to answer a question and then generate structured JSON output. It's similar to Nate's project-manager example: it answers the user's question, like "what's the current budget", but then also provides project details for a UI (team members, stakeholders, recent updates, timeline). Using a single tool agent to do both, it hallucinated a lot. I added a structured output parser with a second agent to create the JSON, which made the output consistent, but latency went from 4 seconds up to 25.
I did try some other smaller models like Grok and Llama, but those didn't even follow the prompt and were very inconsistent.
I'm now running two flows: one to get the quick answer the voice agent can read out, and a second call the UI uses to fetch the project data. The goal is a generative interface that displays the "details" of the user's query on screen while the voice agent summarizes.
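One thing worth checking with a two-flow setup like this: if the voice call and the UI call are dispatched concurrently rather than back to back, the total wait is roughly the slower of the two calls instead of their sum. A minimal sketch with Python's asyncio, where `get_voice_answer` and `get_project_json` are placeholder stubs standing in for whatever model client you're actually using (the sleeps just simulate model latency):

```python
import asyncio
import time

# Placeholder for the fast voice-answer call (e.g. a small model).
async def get_voice_answer(query: str) -> str:
    await asyncio.sleep(0.1)  # simulate ~100 ms model latency
    return f"Summary for: {query}"

# Placeholder for the slower structured-output call that builds the UI JSON.
async def get_project_json(query: str) -> dict:
    await asyncio.sleep(0.3)  # simulate ~300 ms model latency
    return {"query": query, "team": [], "timeline": []}

async def handle_query(query: str):
    # Dispatch both calls at once; total wait is about max(0.1, 0.3),
    # not 0.1 + 0.3, because the calls overlap.
    voice, ui = await asyncio.gather(
        get_voice_answer(query),
        get_project_json(query),
    )
    return voice, ui

start = time.perf_counter()
voice, ui = asyncio.run(handle_query("what's the current budget"))
elapsed = time.perf_counter() - start
print(voice)
print(f"elapsed: {elapsed:.2f}s")
```

This doesn't make either model call faster, but it means the voice agent can start reading its answer while the structured JSON is still being generated, which hides most of the second call's latency from the user.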
If anyone has dealt with the latency of chaining multiple LLM calls and found ways to speed it up, I'd be interested.