We've been experimenting with treating ICM not as the whole system, but as one layer inside a larger orchestration architecture.

For us, ICM solved something much bigger than prompting. It solved context.

How do you keep models focused?
How do you stop them from reading entire repositories?
How do you bound work?
How do you reduce drift?
How do you move toward convergence?

ICM gives us work packets, context contracts, routing, validation, and controlled handoffs.

Once we started implementing it, we found ourselves asking: What happens if we build around that?

Internally we've been experimenting with a governance layer we call AQ-CMF (just our internal name for it), but I think the more interesting thing to share is the orchestration itself. Right now it's basically a small "Swarm Orchestration Starter Pack."

The idea is simple: Use the smallest model capable of doing the work. Reserve larger models for judgment and reasoning.

Current setup:

RTX 3060 12GB
• 2,200 binary filtering workers
• Qwen 0.6B
• yes/no decisions
• triage
• filtering
• classification

RTX 5060 Ti 16GB
• 150 structured extraction workers
• Qwen 4B
• schema completion
• information extraction
• template generation

Cloud reasoning layer (introduced to me by @Ari Evergreen's post https://www.skool.com/cliefnotes/i-run-100-agent-workflows-on-a-budget-model-heres-the-catch)
• up to 200 Kimi 70B workers
• interpretation
• reasoning
• code generation
• higher-complexity analysis

Claude Code
• orchestration
• synthesis
• validation
• architecture decisions
• final judgment

The smaller models don't really "think." They observe. They classify. They extract. They filter.

Claude assembles. Claude validates. Claude decides.

One thing we've noticed is that this also changes the economics considerably. Instead of paying frontier-model prices for every operation, we let local models perform the cheap labor.
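
To make the "smallest model capable of doing the work" idea concrete, here is a minimal Python sketch of how the tiers above could be expressed as a routing table. This is not our actual orchestrator. The names `Tier`, `TierSpec`, `TIERS`, and `route`, the task-kind strings, the worker count of 1 for Claude Code, and the rule that unknown work falls through to the orchestrator are all assumptions made for illustration. The models and the other worker counts are taken from the setup listed above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Tiers ordered from cheapest to most capable (hypothetical names)."""
    FILTER = 0       # Qwen 0.6B on the RTX 3060 12GB
    EXTRACT = 1      # Qwen 4B on the RTX 5060 Ti 16GB
    REASON = 2       # cloud reasoning layer
    ORCHESTRATE = 3  # Claude Code


@dataclass(frozen=True)
class TierSpec:
    model: str
    max_workers: int
    task_kinds: frozenset


# Task-kind strings are illustrative labels for the bullets in the setup list.
TIERS = {
    Tier.FILTER: TierSpec(
        "Qwen 0.6B", 2200,
        frozenset({"yes_no", "triage", "filtering", "classification"}),
    ),
    Tier.EXTRACT: TierSpec(
        "Qwen 4B", 150,
        frozenset({"schema_completion", "information_extraction",
                   "template_generation"}),
    ),
    Tier.REASON: TierSpec(
        "Kimi 70B", 200,
        frozenset({"interpretation", "reasoning", "code_generation",
                   "complex_analysis"}),
    ),
    Tier.ORCHESTRATE: TierSpec(
        "Claude Code", 1,  # assumption: a single orchestrator session
        frozenset({"orchestration", "synthesis", "validation",
                   "architecture_decision", "final_judgment"}),
    ),
}


def route(task_kind: str) -> Tier:
    """Return the cheapest tier that declares it can handle task_kind."""
    for tier in sorted(TIERS):  # IntEnum sorts cheapest first
        if task_kind in TIERS[tier].task_kinds:
            return tier
    # Assumption: anything unrecognized goes up to the orchestrator
    # for a judgment call instead of being guessed at by a small model.
    return Tier.ORCHESTRATE


if __name__ == "__main__":
    for kind in ("triage", "schema_completion", "code_generation", "mystery"):
        tier = route(kind)
        print(f"{kind} -> {tier.name} ({TIERS[tier].model})")
```

Because the tiers are ordered by cost, the lookup always stops at the cheapest worker pool that claims the task, and only work that nothing cheaper claims reaches the larger models.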