I stopped calling them "agents," and my system got more honest
A few months ago, I had an idea for a build. Thanks to 's folder architecture I succeeded. I built the thing! But then someone had a question about what I had built that turned out to matter more than I expected: are the things I built actually agents?
I'd been calling them that out of habit. Then the question forced me to be precise, and being precise changed the design. Sharing the three things that came out of it, because I was being loose with the word, so I post as a cautionary tale based on all the hype around "agents" right now.
1. "Agent" in 2026 means an autonomous reasoner — establishes its own path, it adapts, it learns. But there's an older meaning: a thing that acts — that has a scope, runs different types of operations, produces different effects.
My things are the second kind not the first. They can't learn, can't self-modify, can't pick their own goals. Once I admitted that out loud, I stopped saying "agent" unqualified and started saying bounded executor.
That not a bad thing — it was me being honest about what I had designed. Here's the interesting thing, an autonomous agent can't be audited with certainty, because what I will do next isn't predetermined. A "bounded executor" can.
I removed the autonomy feature, on purpose! If you're building "agents," ask yourself which of the two words you mean, the answer changes what you can promise about them.
2. Test for drift with identity, not observation. "Drift" changes quietly over time and is something most people try to monitor for: watch outputs, catch anomalies etc. I went the other way. Every component definition in my system is content addressed: it has a hash derived from its exact content.
So, the drift test isn't statistical, it's binary. Same hash at time A and time B and BOOM! byte identical! Different hash > there's a recorded, authorized change I can point to. There is no third case.
The idea here is to make drift unable to happen silently by fixing that thing's behavior to an inspectable definition.
3. I'm not a developer, far from it so ******DISCLAIMER INCOMING******* As far as I can tell, there is only ONE place a hallucination-class error can actually enter the system, it's the step where outside source material gets turned into structured internal content; a model assistant sitting there can invent something the source never said.
So instead of claiming immunity, I built a deterministic gate at that one door: every extracted item must cite a location that genuinely resolves in the real source, or it's rejected. The gate checks for structure (does the citation resolve) not meaning (is it the right reading) — the meaning part is designed for human-in-the-loop on purpose. "It can't hallucinate" is probably the wrong thing to say. **Hallucination has exactly one entry point and here's the gate on it** is definitely the right claim here.
Precision about what your system is makes it more trustworthy, not less impressive. When I replaced confident vague claims about agents not drifting or not hallucinating with more precise ones > I created "bounded executors", whose > drift is foreclosed by hashing, and whose > hallucination has one verified door, the system got more honest and easier to defend.
My vague claims invited the one question that breaks them. Precise claims invite inspection > and surviving inspection is the whole game when you're building something people have to trust.
Question for the room: for those building "agents" right now > autonomous reasoners, or bounded executors? And have you found the one door in your own system where the failure mode you swear can't happen actually can?
10
12 comments
Brad M
3
I stopped calling them "agents," and my system got more honest
Clief Notes
skool.com/cliefnotes
Jake Van Clief, giving you the Cliff notes on the new AI age.
Leaderboard (30-day)
Powered by