Hi all - I'm building out my first voice agent, and with nearly 30 years of traditional (deterministic) dev behind me, my spidey senses are struggling with the probabilistic nature of it all. My system prompt will look solid, and then I'll test a conversation and the agent will get something wrong that it got right the previous 14 times. Sometimes it seems the LLM just decides to ignore a specific instruction that it followed before. It feels like I'm going to go through hundreds of system prompt iterations before I'm confident it's not going to mess something up. Is there a standard way you guys deal with this? Or is going through this pain just something you all do every time?