You may be burning AI tokens without realizing it.
In Claude Code, **prompt caching** can reduce the cost of reading repeated context by up to 90%.
But here is the critical detail:
Cache does not last forever.
It can have a TTL (time to live) of 5 minutes or 1 hour.
If you keep the session active, the cache stays warm, because each cache hit restarts the TTL.
If you stop for too long, the cache gets cold.
The next call may need to read everything again.
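To make the warm-versus-cold difference concrete, here is a minimal sketch that simulates a 5-minute TTL and estimates what the repeated prefix costs over a session. Everything in it is an assumption for illustration: the `simulate` helper, the 100,000-token prefix, the $3 per million tokens base price, and the multipliers (1.25x to write the cache, 0.10x to read it). Check current pricing before relying on the numbers.

```python
from dataclasses import dataclass

# Assumed multipliers relative to the base input price (illustrative only).
CACHE_WRITE_MULTIPLIER = 1.25  # cost of writing the prefix into the cache
CACHE_READ_MULTIPLIER = 0.10   # cost of reading a cached prefix (the "up to 90%" saving)


@dataclass
class SessionCost:
    hits: int = 0
    misses: int = 0
    cost: float = 0.0


def simulate(call_times_s, prefix_tokens, ttl_s=300, price_per_mtok=3.0):
    """Estimate the prefix cost of calls made at the given timestamps (in seconds)."""
    result = SessionCost()
    expires_at = None  # nothing is cached before the first call
    base = prefix_tokens / 1_000_000 * price_per_mtok
    for t in call_times_s:
        if expires_at is not None and t < expires_at:
            result.hits += 1
            result.cost += base * CACHE_READ_MULTIPLIER
        else:
            result.misses += 1
            result.cost += base * CACHE_WRITE_MULTIPLIER
        expires_at = t + ttl_s  # every use, hit or write, restarts the TTL
    return result


# Five calls, two minutes apart: the cache never goes cold.
warm = simulate([0, 120, 240, 360, 480], prefix_tokens=100_000)
# Same five calls, but with an 11-minute pause before the fourth one.
cold = simulate([0, 120, 240, 900, 1020], prefix_tokens=100_000)

print(warm.hits, warm.misses, round(warm.cost, 3))  # 4 1 0.495
print(cold.hits, cold.misses, round(cold.cost, 3))  # 3 2 0.84
```

Under these assumptions, one long pause adds a second full cache write and raises the prefix cost of the session from about $0.50 to $0.84.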
Cache stays alive when you keep:
* same model
* same effort level
* same tools
* same `CLAUDE.md`
* same project
* stable context
Cache breaks when you:
* switch models
* change effort
* add or remove tools/MCPs
* edit `CLAUDE.md`
* change the beginning of the prompt
* let the session go cold past the TTL
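The two lists above follow from one idea: the cache matches on the start of the request, so only the unchanged leading part can be reused. The sketch below is a simplified mental model, not the actual cache-key algorithm. The `build_blocks` and `reusable_blocks` helpers, the block ordering, and the model and tool names are all hypothetical.

```python
def build_blocks(model, effort, tools, claude_md, turns):
    """Flatten a request into an ordered list of blocks, earliest first."""
    return [
        f"model:{model}",
        f"effort:{effort}",
        *(f"tool:{name}" for name in tools),
        f"claude_md:{claude_md}",
        *(f"turn:{text}" for text in turns),
    ]


def reusable_blocks(previous, current):
    """Count the leading blocks that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break  # everything after the first difference must be reread
        count += 1
    return count


tools = ["read", "edit"]
first = build_blocks("model-a", "medium", tools, "rules v1", ["fix the bug"])

# Appending a new turn keeps the whole earlier request as a reusable prefix.
appended = build_blocks("model-a", "medium", tools, "rules v1",
                        ["fix the bug", "add a test"])

# Editing CLAUDE.md changes an early block, so the turns after it are lost too.
edited_md = build_blocks("model-a", "medium", tools, "rules v2",
                         ["fix the bug", "add a test"])

# Switching models changes the very first block: nothing can be reused.
switched = build_blocks("model-b", "medium", tools, "rules v1",
                        ["fix the bug", "add a test"])

print(reusable_blocks(first, appended))   # 6
print(reusable_blocks(first, edited_md))  # 4
print(reusable_blocks(first, switched))   # 0
```

The earlier a change lands in the request, the less of the cached prefix survives it.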
The rule is simple:
Cheap AI is not just about using a cheaper model.
It is about stable context and warm cache.
======
1. **You are paying AI to reread what it already read.**
2. **Claude’s cache can expire in 5 minutes.**
3. **One wrong pause can increase your token cost.**
4. **Switching models can break your cache.**
5. **Prompt caching can cut repeated context cost by up to 90%.**
6. **The secret is not just better prompts. It is warm cache.**
7. **Professional Claude Code starts with stable context.**
8. **Change the beginning of the prompt, lose the cache.**
9. **Cold cache = expensive AI. Warm cache = efficient AI.**
10. **The invisible mistake making your AI bill higher.**