I stopped running everything on Opus. Here's the system I built instead.
Running Claude Code on Opus is powerful. But most turns in a session are mechanical, reading files, writing boilerplate, simple edits. You're paying top-tier rates for work that Haiku could handle in its sleep. Anthropic just released the Advisor Tool: A server-side pattern where a cheap executor model consults Opus mid-generation for strategic guidance. But it's API-only. Can't use it inside Claude Code. So I rebuilt the pattern inside Claude Code using what already exists. The setup: Opus stays in the orchestrator seat. It plans. It makes architecture decisions. It reviews output. It never touches files directly unless the task genuinely needs Opus-level judgment. Everything else gets dispatched to cheaper models: - Agent({ model: "haiku" }) — Claude subagents with full file access for simple edits - Agent({ model: "sonnet" }) — for multi-file changes that need moderate reasoning - A CLI tool (ask.py) that routes to Gemini, Kimi, MiniMax, or local Gemma via Ollama — for code generation, research, video analysis, anything where you just need text back The routing logic: Does the task need file access? → Agent tool (haiku/sonnet) Is the code complex? → sonnet or gemini Is the code simple? → haiku, kimi, or minimax (cheaper) Need video analysis? → gemini --video (native, no frame extraction) Need parallel research? → kimi-swarm (spawns up to 100 sub-agents) Want zero API cost? → gemma running locally via Ollama When a subagent hits something it can't handle, it reports back NEEDS_GUIDANCE. Opus thinks it through and re-dispatches with better context. That's the advisor pattern, strategic guidance exactly when needed, cheap execution everywhere else. Cost impact: A 10-task session where 8 tasks go to Haiku and 2 to Sonnet — your Opus tokens are only spent on planning and review. Maybe 20% of the total token volume. The other 80% runs at Haiku rates. With local models mixed in, it drops further. Open-sourced the whole thing: