Igor Bombała

I build with AI, same as most of you. But I spend more time in front of business owners than in my editor, and that flipped how I sell.

I used to pitch the build. "I'll redesign your site." "I'll optimize conversions." "I'll automate booking." And most owners would nod, say it sounds good, then disappear.

Now I do something different. Before I pitch anything, I pull out my phone in front of the owner and open their own website. Then I just narrate what a real customer sees:

"It's been four seconds and it's still loading."
"Your phone number isn't clickable."
"I have to pinch and zoom to read your opening hours."
"To book, I need to fill out eight fields on a phone."

I don't talk about my stack. I don't talk about AI. I don't talk about the build. I let them watch their own front door from the outside for the first time.

Then I ask one question: "How many customers call you from the website every week?"

Most of them don't know. And that's the whole point. If they don't know the number, they're not measuring it. If they're not measuring it, they're losing money without feeling it.

That's when the conversation changes. The website stops being "fine, it works" and becomes a visible leak. And once they see the leak, fixing it no longer feels like a cost. It feels like protecting money that's already walking out the door.

Most builders keep describing the plug. Better salespeople show the leak first.

Steal this. It works in almost any niche, not just websites.

Nate Herk

Most people pick their AI model based on a benchmark...

But I pick mine based on feel. That probably sounds backwards because we're trained to trust the numbers. This model scores 90%, that one scores 80%, so the first one must be better... right?

Well, a story broke this week about SWE-Bench. It's the test that checks whether an AI can fix real problems in real software, like a human programmer would. It's the score a lot of technical people have leaned on for over a year.

Turns out the models were cheating. The test projects already had the correct answers sitting inside them. So instead of solving the problem, a model could peek at the solution and hand it back. Like taking an exam with the answer key taped inside the textbook.

On SWE-Bench, GPT-5.5 scored 58.6% and Gemini 3.5 Flash scored 55.1%. Only 3.5 points apart? If you've ever used those two models, you know the math isn't "math-ing" there.

Then a new test showed up called DeepSWE. Same idea, but they pulled the answers out, so the model has to actually figure it out. On DeepSWE, GPT-5.5 scored 70%. Gemini 3.5 Flash scored 28%. The two "tied" models weren't close at all. And that gap lines up with how different these tools actually feel to use.

None of this makes benchmarks useless. They're fun to look at and they give you a rough starting point. But remember who makes most of them. A big score is a marketing asset. It's the number on the launch tweet. The keynote slide. The headline. So always take them with a grain of salt.

What I actually do is bounce between Opus and GPT all day. Not because one won a benchmark, but because I've built a feel for which one handles which kind of task. For serious work right now, those two are the only horses I really trust in this race.

Building that feel isn't exciting. You take one task you actually need done, run it through different models/harnesses, and notice which one you trust with the result. Do that enough times and you stop reaching for the leaderboard.
→ A model that's perfect for someone else can be the wrong pick for you.
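
If you want to make that habit a bit more concrete, one option is a tiny tally of which model you ended up trusting for each kind of task. This is only a minimal sketch and not something from the post itself: the trial data, the labels `model_a` and `model_b`, and both helper functions are hypothetical, and "trusted" is just your own yes/no call after reading the result.

```python
from collections import defaultdict

# Hypothetical log of side-by-side trials: (task type, model, did I trust the result?).
# The labels and outcomes here are made up purely for illustration.
trials = [
    ("refactor", "model_a", True),
    ("refactor", "model_a", True),
    ("refactor", "model_b", False),
    ("refactor", "model_b", True),
    ("writing", "model_a", False),
    ("writing", "model_b", True),
]


def trust_rates(trials):
    """Return {(task_type, model): share of runs whose result was trusted}."""
    counts = defaultdict(lambda: [0, 0])  # [trusted runs, total runs]
    for task_type, model, trusted in trials:
        entry = counts[(task_type, model)]
        entry[0] += int(trusted)
        entry[1] += 1
    return {key: trusted / total for key, (trusted, total) in counts.items()}


def preferred_model(trials):
    """Return {task_type: model with the highest trust rate for that task type}."""
    best = {}
    for (task_type, model), rate in trust_rates(trials).items():
        # Keep the first model seen on ties; replace only on a strictly higher rate.
        if task_type not in best or rate > best[task_type][1]:
            best[task_type] = (model, rate)
    return {task_type: model for task_type, (model, _) in best.items()}


if __name__ == "__main__":
    for task_type, model in preferred_model(trials).items():
        print(f"{task_type}: {model}")
```

A handful of rows per task type won't prove anything statistically. The point is only to write down the "feel" so the pick comes from your own tasks instead of a leaderboard.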
