Yes, the additional tokens are the cost of higher-quality output. What I mean is that the structured ceremony costs more tokens in exchange for better reliability and output. It's not perfect, but in my experience it's better than not having it at all in the long run. A few things you can do to mitigate, which are practices I follow as well:

1 - In your project/state/roadmap, add a rule to never run subagents for planning. Instead, default to targeted tool calling.

2 - When Claude prompts you for next steps with a menu like "[1] Apply Phase XYZ [2] Questions First [3] Pause Here", avoid running the slash command. Instead just reply 1, 2, or 3, and you can add extra context to further instruct Claude.

3 - If you're running Opus with the 1M context window, try not to go past 200k-230k tokens in any single session. Use handoffs and clear the session. Each back-and-forth resends the entire conversation. Caching helps mitigate this, but the cache expires rather quickly within Claude Code, so it's not always reliable. You can reduce token consumption by up to 70% on identical tasks/prompts just by running a handoff and clearing, as opposed to continuing in the larger context window.

4 - Last thing to consider, and this is something I've learned from experience: Claude conflates lines with tokens. What it's calling out (the number of lines, etc.) often isn't much consumption at all. If you ask it how many tokens that actually equates to, it might correct itself and say it's not as much as it thought. It does this to me all the time when I'm running my testing.

Last thing I want to add is that I've weighed the pros and cons of less consumption to save context versus a little more ceremony and structure (at the cost of more consumption) for better-quality output. In my own use and experience, with proper session management, the ceremony is most definitely worth it.
Big picture, it actually saves consumption in the long run, because it gets far closer to reliably producing high-quality output the first time around, which reduces debugging, remediation, and revision on what you're creating.
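To put rough numbers on point 3, here is a toy Python sketch of why clearing helps when every turn resends the whole conversation. Everything in it is an assumption for illustration: the function names, the 4,000 tokens per exchange, the 100 turns, the clear-every-25 cadence, and the 5,000-token handoff summary are all made up, and it ignores caching and output tokens entirely.

```python
def cumulative_input_tokens(turns, tokens_per_turn, start_context=0):
    """Sum of input tokens sent when each turn resends the whole history."""
    total = 0
    context = start_context
    for _ in range(turns):
        context += tokens_per_turn  # history grows by one exchange
        total += context            # the full history goes out on this turn
    return total


def with_handoffs(turns, tokens_per_turn, clear_every, handoff_tokens):
    """Same work, but the session is cleared every `clear_every` turns and
    restarted from a handoff summary of `handoff_tokens` tokens."""
    total = 0
    remaining = turns
    start_context = 0  # the very first session starts empty
    while remaining > 0:
        chunk = min(clear_every, remaining)
        total += cumulative_input_tokens(chunk, tokens_per_turn, start_context)
        start_context = handoff_tokens  # later sessions start from the summary
        remaining -= chunk
    return total


# Hypothetical inputs, chosen only for illustration (not measurements).
TURNS = 100
TOKENS_PER_TURN = 4_000

one_session = cumulative_input_tokens(TURNS, TOKENS_PER_TURN)
split = with_handoffs(TURNS, TOKENS_PER_TURN, clear_every=25, handoff_tokens=5_000)

print(one_session)  # 20200000
print(split)        # 5575000
print(f"{1 - split / one_session:.0%} fewer input tokens")
```

With these made-up inputs the split run sends far fewer input tokens, because the single long session grows roughly with the square of the turn count. The exact percentage is a property of the toy numbers, not a benchmark, and working caching would narrow the gap for the long session.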