Saving money is easy when you know how

Feb 24 (edited) • 🎉 Wins

The Realization: You Are Overpaying for "Thinking"

Stop paying massive premiums for expensive APIs! High-end models charge exorbitant token fees for their reasoning capabilities. You can bypass these massive markups by utilizing Gemini 3.1 Pro, which offers native, adjustable reasoning at a fraction of the cost.

Anthropic runs on Google silicon. Google invested billions of dollars into Anthropic, and a massive part of that deal was that Anthropic must use Google Cloud and Google's custom TPU chips to train and run their models. They are deeply financially and technologically entangled.

According to a bombshell report from techcrunch.com, Google contractors were caught comparing Gemini's answers directly against Claude's. It got so blatant that Gemini was actually caught spitting out the exact phrase, "I am Claude, created by Anthropic" because Google was allegedly feeding Claude's data into Gemini's evaluation and training pipelines opentools.ai.

So while they are technically separate companies on paper, the reality is exactly what you are sensing: the lines are incredibly blurred. They share the same Google hardware. Google is a massive financial backer of Anthropic. They are literally cannibalizing each other's training data and outputs.
I also think claudes models are designed to send you through loops to drain your account but google can simply skim it for free.. lol

The Strategy: OpenRouter presets + Gemini 3.1 Pro

Here is the exact setup to stop bleeding money and run multi-agent swarms on a budget:

Step 1: Create "Thinking" Presets. Go into OpenRouter and set up three distinct presets for Gemini 3.1 Pro based on reasoning effort:

🟢 Low Thinking (for simple, fast tasks)
🟡 Medium Thinking (for standard logic)
🔴 High Thinking (for complex problem-solving)

(see attached config screenshots)

Step 2: Automate the Routing Don't use "High Thinking" for everything. Use a custom skill or script to evaluate your prompt's complexity before sending it, and automatically route it to the appropriate preset. I attached mine but you could make your own if you like. You only pay for heavy compute when the task actually requires it.

Step 3: The Secret Weapon — This is where massive savings happens. In OpenRouter, enable the "Exclude Reasoning from Response" setting in the preset (see image attached)

Why? Providers gouge you by charging for the massive blocks of invisible "thinking" tokens in the output.

By stripping these redundant tokens before they hit your bill, you drastically cut output costs. Your local agent (like Agent Zero) already retains the full memory and context of the conversation anyway, so you lose nothing but the bloated invoice

The Result 📉

By dynamically switching thinking levels using googles dirt cheap api that knows everything claude knows and stripping out reasoning tokens, you can run intensive, sessions for a full workday on about $20—completely eliminating the need for a $1,000/day AI budget.

You're Welcome

9 comments

Saving money is easy when you know how