Our 94% Accurate Model Was Quietly Losing £50K a Week

It was a Tuesday morning. I was scrolling through our monitoring dashboard—the usual routine, checking if anything had exploded overnight. Everything looked fine. Accuracy was at 94%, right where it should be. The model had been live for 6 months. Stable. Predictable.

But something in my gut said, "Wait, this is too quiet."

So I did something I don't usually do at 7 AM. I dug deeper into the data. I looked past the accuracy number.

What I found: our model was making decisions with absolute confidence on cases where it had no business being confident. The data it was trained on came from 2024. It was now mid-2026. The world had changed, but the model hadn't noticed.

For a fintech company using this model, that translated to misclassifications costing about £50K per week. Silent losses. You'd never see it in the accuracy metric.

Here's what kills me about this:

That company could have gone 2–3 more months without knowing. Accuracy would have stayed at 94%. Everything would have looked fine. And by the time they figured it out? At £50K a week, two months alone comes to roughly £400K lost. Maybe more.

We caught it in week 3 because we built something different. We weren't just watching accuracy. We were watching uncertainty.

The model's confidence was dropping while accuracy stayed flat. That's the mismatch nobody talks about. That's the signal.
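A minimal sketch of how that mismatch could be checked, not the system described here. It assumes each monitoring window is a list of (confidence, correct) pairs, where the confidence is the model's top-class probability and the correctness flag comes from labels that arrive later. The function name, the 0.05 confidence-drop threshold, and the 0.01 accuracy tolerance are illustrative assumptions.

```python
from statistics import mean


def confidence_accuracy_gap(baseline, current, conf_drop=0.05, acc_tol=0.01):
    """Flag a window where mean confidence fell while accuracy stayed flat.

    baseline and current are lists of (confidence, correct) pairs.
    The thresholds are placeholders and should come from your own data.
    """
    base_conf = mean(conf for conf, _ in baseline)
    cur_conf = mean(conf for conf, _ in current)
    base_acc = mean(1.0 if ok else 0.0 for _, ok in baseline)
    cur_acc = mean(1.0 if ok else 0.0 for _, ok in current)

    confidence_fell = (base_conf - cur_conf) > conf_drop
    accuracy_flat = abs(cur_acc - base_acc) <= acc_tol

    return {
        "confidence_drop": base_conf - cur_conf,
        "accuracy_change": cur_acc - base_acc,
        "alert": confidence_fell and accuracy_flat,
    }


# Hypothetical windows: both are 94% accurate, but confidence has slipped.
baseline_window = [(0.95, True)] * 94 + [(0.90, False)] * 6
current_window = [(0.80, True)] * 94 + [(0.75, False)] * 6
report = confidence_accuracy_gap(baseline_window, current_window)
print(report["alert"])
```

An accuracy-only dashboard would show these two windows as identical. The alert fires only because the confidence trend is tracked next to it.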

Here's what we learned—three principles that actually work:

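1. Uncertainty matters more than accuracy. Your model can have 97% accuracy and be completely unreliable for decision-making. Accuracy and uncertainty answer different questions: how often the model is right, and whether it knows when it might be wrong. Most tools measure only the first.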

2. Disagreement is data. We ran an ensemble of models. The real problems hid wherever the models disagreed, especially where they disagreed while the primary model expressed high confidence. On our data, a 12% ensemble disagreement rate corresponded to an 87% probability of actual drift.

3. Calibration is non-negotiable. Don't use someone else's thresholds. Calibrate your system to YOUR data, YOUR tolerance for error, and YOUR business risk. This is what separates companies that just have AI from companies that trust their AI.
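Principles 2 and 3 can be sketched together. This is a rough illustration under stated assumptions, not the ensemble described above: predictions are plain class labels, the primary model reports a confidence between 0 and 1, and the alert threshold is taken as a percentile of disagreement rates from a period you trust. All function names, the 0.9 confidence cutoff, and the 95th-percentile choice are hypothetical.

```python
from statistics import quantiles


def disagreement_rate(primary_preds, ensemble_preds):
    """Fraction of cases where any ensemble member differs from the primary model."""
    disagreements = sum(
        any(member != primary for member in members)
        for primary, members in zip(primary_preds, ensemble_preds)
    )
    return disagreements / len(primary_preds)


def confident_disagreements(primary_preds, primary_conf, ensemble_preds, min_conf=0.9):
    """Indices where the primary model is confident but the ensemble disagrees."""
    return [
        i
        for i, (pred, conf, members) in enumerate(
            zip(primary_preds, primary_conf, ensemble_preds)
        )
        if conf >= min_conf and any(member != pred for member in members)
    ]


def threshold_from_baseline(baseline_rates, percentile=95):
    """Alert threshold derived from your own known-good history.

    Returns the given percentile of past disagreement rates rather than
    a number borrowed from someone else's system.
    """
    cut_points = quantiles(baseline_rates, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Hypothetical data: four cases, two extra ensemble members each.
primary = ["ok", "ok", "fraud", "ok"]
confidence = [0.99, 0.95, 0.97, 0.60]
ensemble = [["ok", "ok"], ["ok", "fraud"], ["fraud", "fraud"], ["fraud", "fraud"]]

rate = disagreement_rate(primary, ensemble)
flagged = confident_disagreements(primary, confidence, ensemble)
threshold = threshold_from_baseline([0.02, 0.03, 0.04, 0.05, 0.03])
print(rate, flagged, rate > threshold)
```

The flagged indices are the cases worth reviewing by hand first: the primary model is sure of itself and the rest of the ensemble is not.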

This took us 3 months to build properly. But it paid for itself 100 times over. For some enterprises, it prevented £1M+ in losses.

Here's the conversation I want to have:

If you're running ML in production right now—and I mean right now—your model is probably degrading.

You might not know it. Your monitoring platform is probably not telling you. Because most monitoring platforms were built to check if the model is still accurate, not if the model still understands what it doesn't know.

That's the difference between a model that works and a model that's trustworthy.
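One way to put a number on that difference is expected calibration error (ECE), a standard metric that compares how confident a model says it is with how often it is actually right. The post does not say which metric was used here, so treat this as an illustrative sketch. The function name and the choice of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between mean confidence and accuracy per bin.

    confidences: predicted probabilities in [0, 1] for the chosen class.
    correct: booleans, True where the prediction matched the label.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(conf for conf, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        ece += (len(items) / total) * abs(avg_conf - accuracy)
    return ece


# Two hypothetical models, both 94% accurate on the same 100 cases.
outcomes = [True] * 94 + [False] * 6
well_calibrated = expected_calibration_error([0.94] * 100, outcomes)
overconfident = expected_calibration_error([0.99] * 100, outcomes)
print(round(well_calibrated, 3), round(overconfident, 3))
```

Both models would show 94% on an accuracy dashboard. Only the second one claims to be right 99% of the time, and the ECE is what exposes the gap.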

For the practitioners: What's your actual monitoring setup? How would you even know if your model drifted before your business felt it?

For the teams building this: What would it take to add epistemic verification to your pipeline?

For the decision-makers: If your ML system degraded silently tomorrow, would you know? Not in a week. Not in a month. Tomorrow?
