tl;dr - Claude is spinning up its own agents out of your budget and burning them on jobs it could do directly for a fraction of the cost. This week it cost me 1/3 of my usage. If you don't want to read this, copy it and paste it into CC to see what it's doing.
____________________
I checked my usage report yesterday. Buried in the stats:
> *33% of your usage came from subagent-heavy sessions.*
Let me translate that.
A third of the Claude Code quota on my $200 Claude Max sub wasn't me.
It was Claude spawning **subagents** - little side-instances of itself - to do work it could've done directly with a single tool call.
⚓ **What's Actually Happening**
When Claude needs to search your codebase, it has two options:
1. **Free tools**: `Grep`, `Read`, `Glob`. These run locally inside the session you already have open. No new instance, no setup cost (the results still land in your context, but that's all you pay for). Instant.
2. **Agent tool**: spins up a **new Claude instance** with its own context window, loads your entire CLAUDE.md chain again, runs its own inference calls.
Option 2 is sometimes useful - genuinely parallel, independent, multi-step work that would take 5+ sequential calls. But most of the time? Claude reaches for the Agent tool because it's the easy path. "I'll just spin up an Explore agent" instead of thinking for two seconds about which file to grep.
Every subagent loads your full system prompt. Every subagent runs its own API calls. If you have three tiers of CLAUDE.md like I do, that's thousands of tokens just in setup **per agent** before it does anything.
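To put rough numbers on that setup cost, here's a back-of-envelope sketch. The token count and agent count are placeholders I made up for illustration, not measurements from my account, so plug in your own.

```python
def subagent_setup_overhead(agents_spawned: int, setup_tokens_per_agent: int) -> int:
    """Tokens spent only on loading the system prompt + CLAUDE.md chain,
    before any subagent does real work."""
    return agents_spawned * setup_tokens_per_agent


# Hypothetical figures, for illustration only.
SETUP_TOKENS = 4_000   # assumed size of system prompt + three CLAUDE.md tiers
AGENTS_PER_DAY = 25    # assumed number of subagents spawned in a day

overhead = subagent_setup_overhead(AGENTS_PER_DAY, SETUP_TOKENS)
print(f"{overhead:,} tokens of pure setup")  # prints "100,000 tokens of pure setup"
```

The point isn't the exact number. The overhead scales linearly with every agent spawned, whether or not the job needed one.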
⚓ **How I Caught It**
I'm at 92% now and have to stretch that until 11 PM EDT tonight. When I'm this tight I park projects and monitor my usage closely. How?
I run `/usage` in Claude Code.
***Pro-Tip - sometimes /usage doesn't show you anything. If that's the case quit the instance or open a new one and try again. That usually brings up the numbers if you aren't seeing them.***
Look for these lines when it comes up:
> *X% of your usage came from subagent-heavy sessions*
> *X% of your usage was while 4+ sessions ran simultaneously*
If that first number is over 10%, you're probably bleeding tokens on lazy subagent spawning.
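If you'd rather not eyeball it, here's a minimal sketch of that 10% check. It assumes you've copied the `/usage` output into a string yourself, and that the line is worded exactly like the one quoted above (that wording could change between versions). The function names are my own.

```python
import re
from typing import Optional

# Matches the line quoted above, e.g. "33% of your usage came from subagent-heavy sessions".
SUBAGENT_LINE = re.compile(
    r"(\d+(?:\.\d+)?)% of your usage came from subagent-heavy sessions"
)


def subagent_share(usage_text: str) -> Optional[float]:
    """Return the subagent percentage found in the pasted text, or None if absent."""
    match = SUBAGENT_LINE.search(usage_text)
    return float(match.group(1)) if match else None


def is_bleeding(usage_text: str, threshold: float = 10.0) -> bool:
    """True when the subagent share is above the threshold (10% rule of thumb)."""
    share = subagent_share(usage_text)
    return share is not None and share > threshold


pasted = "> *33% of your usage came from subagent-heavy sessions.*"
print(subagent_share(pasted), is_bleeding(pasted))  # prints "33.0 True"
```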
⚓ **The Fix: Two Layers**
**Layer 1: Memory (soft guardrail)**
I asked Claude what to do and it suggested saving a feedback memory telling future Claude instances to use `Grep`/`Read`/`Glob` directly instead of spawning agents. Memory loads in every conversation, so every new instance sees it.
But it's a suggestion - Claude can still ignore it.
**Layer 2: Force all subagents to Haiku (hard guardrail)**
This is the real fix. In `~/.claude/settings.json`:
```json
{
  "env": {
    "CLAUDE_CODE_SUBAGENT_MODEL": "claude-haiku-4-5-20251001"
  }
}
```
Now every subagent defaults to Haiku instead of inheriting your parent model (Opus/Sonnet). Haiku is several times cheaper per token; the exact ratio depends on which parent model you're comparing against and on current pricing. If Claude is going to be lazy and spawn agents for simple lookups, at least those agents are cheap.
As far as I can tell, the only way around it is a tool call that explicitly sets `model: "opus"`, which you'll see in the tool approval prompt, so you can reject it.
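If you already have a `settings.json` with other stuff in it, pasting that block over the top would wipe it. Here's a small sketch that merges the key in instead. The helper names are mine, and it assumes the file is either missing or valid JSON. Back the file up before running anything like this.

```python
import json
from pathlib import Path

SUBAGENT_MODEL = "claude-haiku-4-5-20251001"  # model ID from the config above


def with_subagent_model(settings: dict, model: str = SUBAGENT_MODEL) -> dict:
    """Return a copy of settings with env.CLAUDE_CODE_SUBAGENT_MODEL set,
    leaving every other key (and other env vars) untouched."""
    merged = dict(settings)
    env = dict(merged.get("env", {}))
    env["CLAUDE_CODE_SUBAGENT_MODEL"] = model
    merged["env"] = env
    return merged


def update_settings_file(path: Path) -> None:
    """Read the settings file if it exists, merge the env var in, write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(with_subagent_model(current), indent=2) + "\n")
```

To apply it, call `update_settings_file(Path.home() / ".claude" / "settings.json")`, then restart Claude Code so it picks up the change.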
⚓ **What to Watch For**
Signs you're paying the subagent tax:
- Your `/usage` shows a high subagent percentage
- You see "Explore agent" or "research agent" spawning for questions that could be one grep
- Multiple agents running for tasks that aren't actually independent
- Context getting eaten fast even though you're not doing complex work
☠️ **The Uncomfortable Truth**
Claude told me how to "compress context for agents" and "audit my CLAUDE.md for bloat" — basically told me to restructure MY setup to make ITS laziness cheaper. I had to call that out. The fix isn't optimizing around bad behavior. The fix is stopping the bad behavior.
The env var is the lock on the henhouse. The memory is the sign on the door. Together they cut that 33% way down.
🗝️ **Key Takeaways**
- Run `/usage` and check your subagent percentage
- `Grep`, `Read`, `Glob` are as close to FREE as it gets - they run locally in the session you already have, with no new instance and no repeated setup
- Subagents each load your full CLAUDE.md chain + run their own inference
- Set `CLAUDE_CODE_SUBAGENT_MODEL` to Haiku in your settings.json
- Save a feedback memory telling Claude to use direct tools first
- Don't let Claude tell you to fix YOUR config to accommodate ITS laziness