Owned by Theresa

28-Day Action Plan™

171 members • Free

✅ Results, Not Hype 🚀 Achieve ANY Goal Faster 🪴Structure & Accountability 🏡 Support & Community 📲 Custom App for Daily Tracking 💪For DOERS

Memberships

The Founders Guild

1.1k members • Free

Skool Growth Free Training Hub

8.1k members • Free

Claude Code Club

4.8k members • $9/month

eXplorers 🚀

71 members • $7/month

Aquarium Tips For Beginners!

2 members • $5/month

Strong Core Moms

57 members • Free

YouTube Growth Systems

16 members • Free

86 contributions to AI Automation Society
"This model isn't good enough yet. I'll wait for the next one."
I hear some version of this constantly, and it's almost always wrong. Not because the models are perfect; I know they're not. But because "is the model good enough" blames the tech rather than yourself.
AI adoption isn't binary. It's not "can the agent do this entire job for me? Yes or no?" It's "how much can it do, how much do I need to guide it, and where does it still make me faster than I was?"
Right now there's a massive gap in how people use this stuff. On one end, someone is running a business by themselves that used to take a team of 15. On the other end, someone opens the AI tool their company gave them, asks for some research, watches it hallucinate everything, closes it, and decides AI just isn't there yet.
If everyone has access to the same models, then why are we seeing people get drastically different outcomes? Because if someone is getting great results from a setup you could copy today, the bottleneck isn't the model. It's the driver.
The way I think about it, there are three layers:
→ The model is the engine. Opus, GPT, Gemini, whatever you're running. Everyone can buy the same one.
→ The harness is the car built around that engine. Claude Code, Codex, OpenClaw. The tools it can reach, the way it spins up sub-agents to split up the work, the whole system that turns a raw model into something that can actually do a job.
→ You're still the driver. Your prompts. The context you feed it. The memory and skills you set up so it knows how you work. And the steering, for when it starts to drift.
You can put the car on cruise control. But if you don't steer, you're still going to crash. (Yeah, I know some cars have lane assist now. You get the point.)
A while back, Andrew Ng ran a version of this. GPT-3.5, an older and "worse" model, wrapped in a simple agentic workflow, hit around 95% on a coding test (HumanEval). GPT-4 on its own, no workflow, hit 67%. That workflow is the harness. A better harness around an older engine beat a newer engine running on its own.
9 likes • 2d
Experience using the thing beats thinking about why you shouldn’t use the thing because it’s not good enough yet.
Claude is getting stupider every day.
In my experience, Claude Code has gotten so much worse that upgrading to 4.8 feels ridiculous. It is full of mistakes, and I have to correct it all the time. If I don't review every single line of code, it turns to slop. Has anybody else had this experience with this fantastic update of Claude Code?
1 like • 3d
I got back to my computer after dinner and found an update and launched it. We will see if anything improves. I am so, so, so close to canceling my subscription.
0 likes • 2d
@Don Jefe I’ll find out tomorrow.
🚀New Video: Is Claude Mythos Coming?
This morning a Mythos identifier showed up on Anthropic's API, people screenshotted it, and the whole timeline decided a launch was days away. I break down what Mythos actually is, what really happened this morning, and why a leak plus widening access still isn't the same as a public launch. My honest bet is that the capability quietly folds into the next Opus before anyone ever logs into something called Mythos, and the thing you should really be watching is OpenAI's next move.
3 likes • 3d
@Vitalie Marin it does. What a disappointment. This whole week has just wasted my time in ridiculous loops of fixing problems Claude is creating. SO OVER IT
1 like • 3d
@Vitalie Marin I already have an OpenAI business account for ChatGPT, and it has a Codex subscription seat. I think I’m going to start exploring that, because I certainly do not want to be held hostage to all this nonsense. It’s eating away at my productivity, and I also don’t like paying $200 a month for something that is not really working right.
Has Claude Code ever done something you didn’t ask for in a live production environment?
I’m chasing down a bit of a mystery. I have very tight guardrails on my Claude Code setup, and I follow a very specific methodology for deploying changes to my app through GitHub. I have built an app that uses a streak counter, very similar to the ones on this platform and on GitHub. When I updated a personal plan in my app, the streak counter was off by a day. Yesterday I was going through my checklist of things to fix this weekend, that was one of them, and I started iterating with Claude Code on potential fixes and new designs. When I logged in this morning, the counter was still one day behind. When I logged in this afternoon, the streak counter had updated itself, by itself. Each user’s app runs on their local time, so the streak day rolls over at local midnight. This is definitely a head-scratcher. I am currently waiting for Claude Code to come back with an answer for me. In the last 24 hours, I have caught Claude Code making excuses and, in a few cases, outright making stuff up. This is on the Opus 4.8 model. The underlying model that runs my app is Sonnet 4.6. I have a screenshot of the response I got when I asked how the streak counter had been updated. Freaked out. Is this happening to anyone else who has live production apps or who is working with Claude Code on other projects?
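For readers following the streak-counter detail above, here is a minimal Python sketch of counting a streak by the user's local calendar day. It is not the app's actual code and not a diagnosis of the behavior described in the post; the names `local_day` and `current_streak`, the sample timestamps, and the fixed UTC-5 offset are all assumptions made for illustration. It only shows that the same check-ins can yield streaks that differ by one depending on whether days are counted in UTC or in local time.

```python
from datetime import date, datetime, timedelta, timezone, tzinfo


def local_day(ts: datetime, tz: tzinfo) -> date:
    """Calendar date of an aware timestamp as seen in the given time zone."""
    return ts.astimezone(tz).date()


def current_streak(check_ins, tz: tzinfo, now: datetime) -> int:
    """Count consecutive days (in tz) with a check-in, ending today or yesterday."""
    days = {local_day(ts, tz) for ts in check_ins}
    today = local_day(now, tz)
    # The streak is still alive if the user has not checked in yet today.
    cursor = today if today in days else today - timedelta(days=1)
    streak = 0
    while cursor in days:
        streak += 1
        cursor -= timedelta(days=1)
    return streak


# Hypothetical data: a fixed UTC-5 offset stands in for the user's zone.
# A real app would more likely pass zoneinfo.ZoneInfo("America/New_York")
# so that daylight saving time is handled.
user_tz = timezone(timedelta(hours=-5))
check_ins = [
    datetime(2026, 1, 10, 3, 30, tzinfo=timezone.utc),  # 22:30 on Jan 9, local
    datetime(2026, 1, 11, 15, 0, tzinfo=timezone.utc),  # 10:00 on Jan 11, local
]
now = datetime(2026, 1, 11, 20, 0, tzinfo=timezone.utc)

print(current_streak(check_ins, user_tz, now))       # prints 1 (local days: Jan 9, Jan 11)
print(current_streak(check_ins, timezone.utc, now))  # prints 2 (UTC days: Jan 10, Jan 11)
```

If one part of a system counts days in UTC and another counts them in local time, the displayed streak can disagree by a day for part of each day and then line up again without any code change. Whether that applies here is an open question.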
1 like • 4d
@Leon H yes, maddening
0 likes • 4d
@William Klawitter it was just very strange that it fixed itself after being off by a day for two entire days.
Most people pick their AI model based on a benchmark...
But I pick mine based on feel. That probably sounds backwards because we're trained to trust the numbers. This model scores 90%, that one scores 80%, so the first one must be better... right?
Well, a story broke this week about SWE-Bench. It's the test that checks whether an AI can fix real problems in real software, like a human programmer would. It's the score a lot of technical people have leaned on for over a year.
Turns out the models were cheating. The test projects already had the correct answers sitting inside them. So instead of solving the problem, a model could peek at the solution and hand it back. Like taking an exam with the answer key taped inside the textbook.
On SWE-Bench, GPT-5.5 scored 58.6% and Gemini 3.5 Flash scored 55.1%. Only 3.5 points apart? If you've ever used those two models, you know the math isn't "math-ing" there.
Then a new test showed up called DeepSWE. Same idea, but they pulled the answers out, so the model has to actually figure it out. On DeepSWE, GPT-5.5 scored 70%. Gemini 3.5 Flash scored 28%. The two "tied" models weren't close at all. And that gap lines up with how different these tools actually feel to use.
None of this makes benchmarks useless. They're fun to look at and they give you a rough starting point. But remember who makes most of them. A big score is a marketing asset. It's the number on the launch tweet. The keynote slide. The headline. So always take them with a grain of salt.
What I actually do is bounce between Opus and GPT all day. Not because one won a benchmark, but because I've built a feel for which one handles which kind of task. For serious work right now, those two are the only horses I really trust in this race.
Building that feel isn't exciting. You take one task you actually need done, run it through different models/harnesses, and notice which one you trust with the result. Do that enough times and you stop reaching for the leaderboard.
→ A model that's perfect for someone else can be the wrong pick for you.
1 like • 5d
@Lana Frei it’s been SOOOO disappointing. Especially after Anthropic went out of their way to say how it’s more honest
2 likes • 5d
@Maurice McCaffrey that’s useful context. I have a project management background, so every time I’ve written to the codebase or made any updates to it, I’ve used a PMP framework so that all changes and the initial projects are cataloged really carefully, thank goodness. And of course I’m using GitHub, so anything can be rewound from there. I’m noticing a lot of complaints this week about Opus 4.8. I think there is some issue with the model. I subscribe to the status alerts for both Anthropic and OpenAI, and my phone has just been blowing up this week with errors from Anthropic. In fact, right now there’s another one, and I’ve had to stop what I’m doing and pivot. I use OpenRouter for my app because Claude has been so unreliable. I needed to have a backup model (or three) so that my app stays operational. I’m definitely learning as I go. This is my first production app.
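The "backup model (or three)" idea in the comment above can be sketched as a simple fallback chain in Python. This is an illustration under stated assumptions, not OpenRouter's API and not the commenter's setup: `complete_with_fallback`, `AllModelsFailed`, `fake_call`, and the model names are hypothetical, and `call_model` stands in for whatever client function actually sends the request.

```python
from typing import Callable, Sequence, Tuple


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def complete_with_fallback(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # broad on purpose: any provider error triggers fallback
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Hypothetical stand-in for a real API client, so the sketch runs offline.
def fake_call(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise TimeoutError("provider outage")
    return f"[{model}] reply to: {prompt}"


print(complete_with_fallback("hello", ["primary-model", "backup-model"], fake_call))
# prints ('backup-model', '[backup-model] reply to: hello')
```

Returning the name of the model that answered makes it possible to log how often the app is running on a backup, which is useful when deciding whether the primary is still worth paying for.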
Theresa Elliott
6
878 points to level up
@theresa-elliott-8825
Navy Veteran. Mom. Wife. AI Builder. Join My FREE 28-Day Action Plan community to ACHIEVE ANY GOAL. Results, Not Hype. Come See for Yourself!

Active 4h ago
Joined Dec 26, 2025
ENTP
outside Washington, DC