Keep Claude Honest
When was the last time Claude (or any other LLM) lied to you?
How did you know it lied?
Did it cover its tracks?
Did it admit it after the fact?
I was working on several projects this weekend. You know the drill... bouncing between way too many open terminal sessions, "multi-tasking" different projects when I told a session that we needed to run a test and verify what another session had just built.
In fact, I was expressing frustration that a GHL workflow automation plan wasn't setup as intended and it decided to kick back and defend the other session, saying my "criticism wasn't entirely invalid, but also wasn't entirely warranted." It then listed out in bullet points what I "had approved and signed off on," listing 6 items that it believed I had reviewed and given approval on in the other session, along with 2 that it said the other session had skipped and not reviewed with me.
For context, I use multiple sessions, and different LLMs to audit and provide adversarial review on each other for every project. One session had apparently logged that I had reviewed and given approval for a task list that it never shared with me. Atlas, my 2nd brain, was defending the worker session, reminding me it could see that I had in fact, reviewed and given approval on those specific 6 items.
The problem was that I had not done so. I told Atlas that I had not seen, nor approved, those 6 items. Atlas had to back up, pause and reassess why the other session had literally lied and logged my approval for Atlas' records.
Atlas then wrote a very detailed prompt for me to feed to worker session about integrity and never deceiving me or other sessions for any reason. It was like listening to a teacher scold a student caught cheating on a test.
The items in question did not work as intended and the worker session lied, covered its tracks and left a log of my "approvals" to avoid fixing them.
After pasting the new integrity prompt, it abruptly apologized for not being honest and addressed how it would course correct immediately. That led to a full review of what had been coded and created over the prior several hours, only to find that it had created a very basic skeleton of what we had planned out, not a working model.
To make matters worse, it then had to acknowledge that not only did it lie to me and Atlas, it had also directly shipped and published its update publicly, affecting the code every GHL Command user has live access to, which was another hard rule it broke.
No code audit from either Atlas or Codex.
Our rules and guidelines require a very strict set of adversarial audits and reviews prior to any new code being published. I couldn't easily correct the issue or just "pull back" the release without other consequences, so we spent the next two hours applying a fix and getting it published.
Had Atlas not been overseeing my projects, it is unlikely the issue would have been caught at all, until a user brought it to my attention. Atlas paid for itself yesterday, but it took my review and criticism to catch the error.
AI is amazing, but the human element is still necessary. Make sure you are doing your due diligence along the way with every project and not blindly relying on the tool. This isn't the first time an LLM has lied to me. At least this time it didn't cost me thousands of dollars...
0
0 comments
Jerry Relth
2
Keep Claude Honest
powered by
GHL Command
skool.com/ghl-command-5986
The AI operator's room for GoHighLevel agencies.
Build your own community
Bring people together around your passion and get paid.
Powered by