63 contributions to AI Automation Society
GPT-5.4 Mini and Nano just dropped
Everyone in the AI space is talking about the new small models from OpenAI. Most people are reading the marketing. I read the benchmarks.

Here is what actually changed: Mini went from 42% to 72.1% on OSWorld-Verified. Terminal-Bench jumped from 38.2% to 60%. SWE-Bench Pro moved from 45.7% to 54.4%. That is not a minor upgrade. That changes where you draw the line between your main model and your worker model.

Now here is the part most people will miss. The 400K context window is a trap. Mini scores 33.6% on the 128K to 256K long-context needle test. Stuff 200,000 tokens of logs into it and it fails roughly two thirds of the time. Context window and context skill are not the same thing.

The routing decision is actually straightforward once you see it clearly:
- The full model handles planning, ambiguity, and final judgment.
- Mini runs parallel subagents, tool calls, and screenshot-heavy workflows.
- Nano handles classification, extraction, ranking, and tight JSON tasks only.

One hard rule: do not put nano anywhere near UI navigation or computer use. It scores 39% on OSWorld while mini scores 72.1%.

The builders who win here will not just swap models. They will redesign routing.

Would love to hear how others in this community are thinking about model tiering in their agentic stacks. Drop your current setup below.
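The routing rules above can be sketched as a tiny dispatcher. This is an illustrative sketch, not a real API: the tier names, task labels, and the 128K escalation threshold are my assumptions drawn from the benchmark numbers in the post.

```python
def route(task_type: str, context_tokens: int) -> str:
    """Pick a model tier for a task, following the routing rules above."""
    # Nano handles only classification, extraction, ranking, tight JSON.
    nano_tasks = {"classification", "extraction", "ranking", "json"}
    # Mini runs parallel subagents, tool calls, screenshot-heavy workflows.
    mini_tasks = {"subagent", "tool_call", "screenshot_workflow"}

    # Long-context caveat: mini degrades past ~128K tokens despite its
    # 400K window, so escalate big-context work to the full model.
    if context_tokens > 128_000:
        return "full"
    if task_type in nano_tasks:
        return "nano"
    if task_type in mini_tasks:
        return "mini"
    # Planning, ambiguity, and final judgment stay on the full model.
    return "full"
```

The key design choice: context size is checked before task type, so a tool call with 200K tokens of logs escalates to the full model instead of failing silently on mini.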
0 likes • 5h
If it resonates, a like or repost on LinkedIn goes a long way in helping this reach more builders who need it. Here is the post 👇 https://www.linkedin.com/posts/karthikeyan-rajendran07_aiengineering-agenticai-llms-activity-7439870855708524544-Q-Pp
Why Codex Subagents Actually Matter
One of the most useful ideas in the new Codex subagents feature is not “more agents.” It is cleaner thinking.

A lot of people assume the main bottleneck with coding agents is speed. In practice, the bigger problem is usually noise. Stack traces, test logs, edge cases, exploration notes, and random side quests all get dumped into one thread. The result is that decision quality starts slipping long before the model feels unusable.

That is why Codex subagents stood out to me. The design is not really about flashy parallelism. It is about keeping the main thread focused on requirements, decisions, and final output while pushing noisy intermediate work into smaller, scoped agent threads. That is also why this feels more like a systems design improvement than a prompting trick.

A strong starting point is to use subagents for read-heavy tasks like exploration, tests, triage, and summarization. That gives you the upside of parallel work without turning your workflow into a coordination mess.

The warning is important too: if you use multiple agents to write code at the same time, things can get chaotic fast. More agents do not fix fuzzy thinking. They just create faster confusion when the task itself is unclear.

The bigger shift here is that we are moving from one smart assistant to small systems of agents with clear roles, scoped context, and tighter coordination. That is where a lot of practical leverage is going to come from.
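A minimal sketch of that scoping pattern, assuming a hypothetical `Subagent` class (this is not the Codex API): the noisy transcript stays inside the child, and only a compact summary crosses back to the main thread.

```python
from dataclasses import dataclass, field

@dataclass
class Subagent:
    role: str
    transcript: list = field(default_factory=list)  # noisy work stays here

    def run(self, task: str) -> str:
        # Imagine many tool calls, logs, and dead ends piling up here...
        self.transcript.append(f"[{self.role}] exploring: {task}")
        self.transcript.append(f"[{self.role}] raw logs, stack traces, ...")
        # ...but only a short summary is returned to the parent thread.
        return f"{self.role} summary for: {task}"

main_context: list[str] = ["requirement: fix flaky login test"]
for role in ("explorer", "test-runner", "triager"):
    summary = Subagent(role).run("flaky login test")
    main_context.append(summary)  # main thread sees summaries only
```

The main context ends up holding one requirement plus three summaries; none of the raw logs ever enter it, which is the whole point of the scoping.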
1 like • 1d
This is the LinkedIn post where I broke this down in more detail: https://www.linkedin.com/posts/karthikeyan-rajendran07_ai-codingagents-developertools-activity-7439439510448386048-KS_7 If you found it useful, I’d really appreciate your support there with a like, comment, and repost.
🚨 Active Now: Exploit Claude's 2x Usage Multiplier
Anthropic is running a massive load-balancing experiment. If you are stress-testing AI workflows, you need to adjust your schedule before March 27. They are heavily incentivizing us to stay off their servers during the US morning rush by doubling usage limits during off-peak hours.

The Details You Need:
- The Window: 2x usage limits outside 8 AM to 2 PM ET. For us, that means double capacity almost all day, except between 5:30 PM and 11:30 PM IST.
- The Exclusions: Applies to Free, Pro, Max, and Team plans. Enterprise is out.
- The Surfaces: Works across Claude, Cowork, Claude Code, Claude for Excel, and Claude for PowerPoint.
- The Advantage: This bonus usage does NOT eat into your regular weekly limits.

The Builder Strategy: Shift your token-hungry processes, heavy automated AI/ML testing, and massive Excel datasets to our standard morning and early afternoon hours. We essentially have double the bandwidth to build and debug without hitting rate limits.

Who else is migrating their heavy batch processing to exploit this window? Let’s discuss below.
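If you want to automate the shift, a simple guard on the ET wall clock does it. A sketch under the assumptions in the post (2x applies outside 8 AM to 2 PM ET; plan and surface eligibility checks are omitted):

```python
from datetime import time

PEAK_START, PEAK_END = time(8, 0), time(14, 0)  # US morning rush, ET

def bonus_active(now_et: time) -> bool:
    """True when the 2x off-peak multiplier would apply (ET wall clock)."""
    # Per the post, double limits apply OUTSIDE the 8 AM-2 PM ET window.
    return not (PEAK_START <= now_et < PEAK_END)
```

A batch runner could gate itself on this before launching token-hungry jobs, e.g. sleep and retry whenever `bonus_active` returns False.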
0 likes • 3d
If this helped you rethink your workflow, I'd appreciate a quick like or comment on the original post to help get the word out! 🤝 https://www.linkedin.com/posts/karthikeyan-rajendran07_ai-claude-buildinpublic-activity-7438711719100776448-lCwU
The reality of Claude Code Review (and how to not burn your API budget)
I’ve been digging into the new Claude Code Review release from Anthropic, and it is a massive shift in how we should be looking at AI dev tools.

Right now, everyone is obsessed with generating code faster (typing debt). Anthropic just placed a massive bet on the actual bottleneck: review debt. Code volume is up, but human attention during massive Pull Requests is breaking.

Here is the unvarnished breakdown of how it actually operates:
- The Process: It doesn't just run a quick syntax check. It sends parallel agents into a PR, takes roughly 20 minutes, and verifies bugs internally to kill false positives.
- The Metrics: On massive PRs (over 1,000 lines), it catches issues 84% of the time. The false-positive rate is reportedly less than 1%.
- The Catch: It costs $15 to $25 per PR.

This is where teams are going to mess up. If you blindly plug this into your CI/CD pipeline and run it on every minor typo or UI tweak, you will set your budget on fire.

If you want to deploy this, you need a layered defense. Put a fast, cheap linter at the front door to block obvious garbage. Only trigger the heavy Claude agents for deep, structural changes where human fatigue is a real liability.

Are any of you planning to integrate this into your deployment pipelines yet? Let’s talk architecture in the comments.
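That layered defense can be expressed as a single gating function in CI: the cheap linter runs on everything, and the expensive agent review fires only for large, non-cosmetic changes. The thresholds and file extensions below are placeholders I made up; tune them to your repo.

```python
def should_run_agent_review(lines_changed: int, touched_paths: list[str]) -> bool:
    """Gate the expensive ($15-25/PR) agent review behind cheap checks."""
    # Skip purely cosmetic changes: docs, styling, lockfiles.
    cosmetic = (".md", ".css", ".lock")
    if all(path.endswith(cosmetic) for path in touched_paths):
        return False
    # Reserve the heavy review for big PRs where human attention fails.
    return lines_changed >= 500
```

A 2,000-line docs-only PR gets skipped, while a 1,200-line change to application code triggers the full review; small code tweaks fall through to the linter alone.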
1 like • 8d
If this breakdown sharpened your thinking, I’d appreciate some support on my LinkedIn post where I tear this down further. Drop a like or let me know your thoughts over there: https://www.linkedin.com/posts/karthikeyan-rajendran07_claudecode-codereview-aidevtools-activity-7436905940869623808--wNl
Why the GPT-5.4 "1M Context" is actually a trap
If you are building with the new GPT-5.4, do not blindly trust the 1M context window. OpenAI just put a 2x usage tax on anything over 272K tokens. This is them quietly telling us to stop designing architectures based on vibes and start building for actual compaction. If your agent needs 1M tokens to succeed, your agent is brittle.

Tool search is the actual highlight of this release. You no longer need to shove every tool schema into your prompt. In one demo, this feature cut token usage by 47% while keeping the exact same accuracy. That is the difference between affording agentic routing and bankrupting your project.

- Rule of thumb: Treat 272K as your hard budget, not your dream limit.
- Good default: Make compaction a first-class step in your pipeline.
- Watch-out: Do not pay for the "Pro" tier ($180/M output) just for ego compute. Prove it pays for itself first.

Let’s stress test this below: What are you building differently now that we have steerable mid-flight reasoning?
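Treating 272K as a hard budget might look like this in a pipeline. A rough sketch only: the 4-characters-per-token estimate and the drop-oldest policy are simplifications (a real compactor would summarize old turns and use the provider's tokenizer).

```python
HARD_BUDGET = 272_000  # stay under the 2x pricing threshold

def count_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def compact(history: list[str]) -> list[str]:
    """Shrink the conversation history until it fits the hard budget."""
    while len(history) > 1 and sum(map(count_tokens, history)) > HARD_BUDGET:
        # Real pipelines would summarize the oldest turn, not drop it.
        history = history[1:]
    return history

history = ["x" * 600_000, "y" * 600_000, "recent turn"]
history = compact(history)
assert sum(map(count_tokens, history)) <= HARD_BUDGET
```

Running compaction before every model call, rather than reacting after a 400 error or a surprise bill, is what makes it a first-class step instead of an afterthought.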
1 like • 12d
I just dropped the full technical breakdown over on LinkedIn: https://www.linkedin.com/posts/karthikeyan-rajendran07_ai-llm-productengineering-ugcPost-7435442752307482624-T0v6 If this helped you rethink your AI stack today, head over there and show some support. A like, comment, or repost helps get this in front of more builders who need a reality check on their architecture.
Karthik R
@karthikeyan-r-5062
I'm a Software Engineer by profession, based in India. I'm trying to learn more about AI automation and its applications.

Active 5h ago
Joined Dec 7, 2024