The scariest bug isn't a crash. It's silence.
A crash is a request for attention.
Silence is the failure mode that quietly removes the signal anything is wrong, so you stop looking.
One of these is harder to fix because you don't know it's there.
We are trained to fear crashes.
Red text, stack traces, the alarm noise. They are loud and they are honest.
The system is telling you, in the most direct language it has, that something is wrong.
Silent failures don't do that. The thing keeps running. The dashboard stays green.
The metric still updates. The only thing missing is the bit where it was actually doing the job.
By the time you notice, it has usually been broken for longer than you'd like to admit.
———————————————————
Why silence is worse
A crash narrows your search. You know roughly when, you know roughly where, and the error message gives you a thread to pull. Recovery starts with a clear signal.
Silence does the opposite. There is no signal, so there is no thread. You have to first prove the bug exists before you can find it. And you only think to look when something downstream has already gone wrong, which is usually too late to catch the cause cleanly.
I shipped one of these this week. A monitor that compared a system timestamp as a string. My locale formatted the date one way, the comparison expected another. Every live process looked dead. The dashboard said "all clear" for five days while the thing it was watching wasn't being watched at all.
The bug was small. The five days was the problem.
———————————————————
Three rules I work by now
1. Surface the absence, not just the presence.
Most dashboards show what is there. Twelve active workers, four builds passing, two hundred tests green. That is useful when something is happening. It is useless when nothing is happening and that is the failure.
Put "last successful run" at the top. If that number is older than it should be, something silent has broken, even if every row underneath says "all clear."
2. Distrust anything that compares system output as a string.
Locales, time formats, sort orders, file listings. The defaults are a courtesy for humans reading output. They are a trap for code parsing it. Pin the format yourself or pin the locale to `C`.
3. Audit your own dashboards on a schedule.
Open the raw data. Run the underlying query. Confirm one count by hand. The dashboard is a representation of the truth, not the truth itself. If you only ever look at the representation, you will never catch the day it stops representing anything.
———————————————————
The pattern past code
The same shape shows up everywhere humans build trust in a system.
A morning alarm that no longer wakes you up, but the screen still says "set for 7am."
A team check-in that everyone attends, but nobody flags anything.
A retainer client who hasn't replied in a week, but the project tracker still shows "in progress."
A creative practice that still happens daily, but no longer produces work you are proud of.
In every case, the surface signal is healthy. The underlying thing is not. And the longer the gap, the more expensive the discovery.
———————————————————
The takeaway
If your only sign of health is the absence of an error, you are not measuring health. You are measuring whether the alarm itself is alive.
Build the alarm for the alarm.
//A<3
13
25 comments
Ari Evergreen
6
The scariest bug isn't a crash. It's silence.
Clief Notes
skool.com/cliefnotes
What we give away free beats most paid courses. Build durable AI systems with a Marine vet and Edinburgh researcher. 40+ lessons, growing.
Leaderboard (30-day)
Powered by