AI Coding Agents for QA: Part 4 — Why the Same Model Gives Different Test Results
In Part 3 I introduced Cursor and why IDE tools beat CLI for QA automation. But before we go deeper into Cursor features, there is a bigger question worth answering.

────────────────────────────────────────

𝐓𝐰𝐨 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐬. 𝐒𝐚𝐦𝐞 𝐌𝐨𝐝𝐞𝐥. 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐑𝐞𝐬𝐮𝐥𝐭𝐬.

Engineer A asks GPT-5.4 to write a login test. Gets back a clean, structured test. It uses their fixtures, follows their naming convention, and works on the first run.

Engineer B does the same thing. Same model. Same task. Gets back a generic, broken test: hardcoded credentials, no page objects. It fails immediately.

────────────────────────────────────────

🚫 𝐌𝐨𝐬𝐭 𝐏𝐞𝐨𝐩𝐥𝐞 𝐁𝐥𝐚𝐦𝐞 𝐭𝐡𝐞 𝐌𝐨𝐝𝐞𝐥

"GPT is bad at tests." "GPT doesn't understand Playwright." "I need a better model."

That is the wrong diagnosis. The model is not the problem; all modern models can code well. Three other things determine quality.

────────────────────────────────────────

⚙️ 𝐋𝐚𝐲𝐞𝐫 𝟏: 𝐓𝐡𝐞 𝐓𝐨𝐨𝐥

As covered in Part 1, you never talk to the model directly.

► You ► Tool ► Model

The tool decides what to send to the model: what context, what files, what history. Cursor sends your repo structure, open files, and recent edits. A chat app sends nothing. Same model, different tool, completely different output.

────────────────────────────────────────

📁 𝐋𝐚𝐲𝐞𝐫 𝟐: 𝐑𝐞𝐩𝐨 𝐐𝐮𝐚𝐥𝐢𝐭𝐲

AI agents amplify whatever already exists in your project. Good framework? The agent writes tests that slot right in. No page objects, no fixtures, no structure? The agent writes whatever it can, which is usually a mess.

This is the hard truth: AI cannot rescue a bad codebase. It makes it worse, faster. The model is only as good as what it can see.

If your repo has:
∙ Clear fixture files
∙ Consistent naming
∙ Reusable page objects
∙ Good test examples

...the agent pattern-matches against all of that and writes code that fits. If it sees nothing, it invents everything. Pure lottery.

────────────────────────────────────────

📝 𝐋𝐚𝐲𝐞𝐫 𝟑: 𝐓𝐡𝐞 𝐓𝐚𝐬𝐤 𝐒𝐩𝐞𝐜

"Write a login test" is not a task spec. It is a hint.
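As an illustration of the difference, here is the kind of fuller spec that tends to produce Engineer A's result. This is my own sketch, not from the original post, and every file, fixture, and helper name in it is hypothetical:

```
Write a login test in tests/auth/login.spec.ts.
- Use the authenticated-user fixture from fixtures/auth.ts (hypothetical name)
- Use the existing LoginPage page object; do not write raw locators
- Cover three cases: valid login, wrong password, empty fields
- Follow the naming convention used in tests/checkout/checkout.spec.ts
- No hardcoded credentials; read test users from the existing test config
```

Every line above removes one decision the model would otherwise have to guess.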
AI Coding Agents for QA: Part 3 — IDE Tools
In Part 2 I covered CLI tools. They work. But for QA automation, especially if you're just starting out, they're simply the wrong tool.

────────────────────────────────────────

𝐘𝐨𝐮 𝐒𝐞𝐞 𝐄𝐯𝐞𝐫𝐲𝐭𝐡𝐢𝐧𝐠

➤ CLI gives you output on a screen. A wall of text.
➤ IDE tools show changes line by line, inside your actual files. Right in front of you.

In Cursor specifically, you accept or reject each change individually, one line at a time. That matters for beginners. When something goes wrong, you see exactly what changed and where. You can ask the AI to explain the change while looking at it. Not a printout: the actual code. That's what helps you actually learn.

────────────────────────────────────────

🔹 𝐖𝐡𝐚𝐭 𝐂𝐮𝐫𝐬𝐨𝐫 𝐀𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐈𝐬

Cursor is a fork of VS Code. A fork is a copy of an existing codebase, taken in a new direction. VS Code is Microsoft's editor. Cursor took that foundation and rebuilt it around AI from the ground up.

Compare that to Copilot. Copilot is a plugin bolted onto VS Code, added after the fact, not designed to be there. That difference shows up in practice: Cursor was built with AI as the core; Copilot was added on top.

────────────────────────────────────────

⚡ 𝐌𝐮𝐥𝐭𝐢𝐩𝐥𝐞 𝐌𝐨𝐝𝐞𝐥𝐬, 𝐎𝐧𝐞 𝐓𝐨𝐨𝐥

Cursor gives you access to models from both Anthropic and OpenAI in one place: Claude Sonnet, Claude Opus, GPT-4o. You pick per task.

⤷ Hard problem or complex refactor? Use Opus or GPT Codex.
⤷ Quick fix or small helper? Use something cheaper.

That lets you control spending and get the best output without switching tools or paying for two subscriptions. Pricing is also transparent. You know what you're paying. No surprises.

────────────────────────────────────────

🌐 𝐓𝐡𝐞 𝐁𝐮𝐢𝐥𝐭-𝐈𝐧 𝐁𝐫𝐨𝐰𝐬𝐞𝐫

Cursor has a browser built directly into the IDE.

1. Open any page.
2. Click on elements: buttons, inputs, dropdowns, etc.
3. Ask Cursor to extract the best locators for your test automation.

Hunting for locators manually is one of the most tedious parts of UI testing. This feature cuts that work significantly.
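What does "the best locators" mean in practice? Playwright's own guidance prefers user-facing attributes (role, test id, label) over brittle CSS paths. The sketch below is my own toy illustration of that preference order, not Cursor's actual extraction logic:

```typescript
// Toy sketch (not Cursor's real algorithm): rank candidate locators the way
// Playwright's guidance suggests, user-facing attributes over brittle CSS.
type Strategy = "role" | "testId" | "label" | "text" | "css";

const PREFERENCE: Record<Strategy, number> = {
  role: 5,   // getByRole('button', { name: 'Log in' }): survives refactors
  testId: 4, // getByTestId('login-submit'): stable if your app sets test ids
  label: 3,  // getByLabel('Email'): good for form inputs
  text: 2,   // getByText('Log in'): breaks when the copy changes
  css: 1,    // 'form > button:nth-child(3)': breaks on any layout change
};

// Pick the most resilient candidate from whatever the page offers.
function bestLocator(candidates: { strategy: Strategy; value: string }[]) {
  return [...candidates].sort(
    (a, b) => PREFERENCE[b.strategy] - PREFERENCE[a.strategy]
  )[0];
}

const picked = bestLocator([
  { strategy: "css", value: "form > button:nth-child(3)" },
  { strategy: "text", value: "Log in" },
  { strategy: "role", value: "button, name 'Log in'" },
]);
console.log(picked.strategy); // → role
```

The point is not the scoring itself but the habit: when an AI hands you a locator, check where it sits on this ladder before accepting it.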
(New Members Start Here) Welcome to AI & QA Accelerator!
👋 Hey there! 𝐖𝐞𝐥𝐜𝐨𝐦𝐞 𝐭𝐨 𝐀𝐈 & 𝐐𝐀 𝐀𝐜𝐜𝐞𝐥𝐞𝐫𝐚𝐭𝐨𝐫.

AI is changing software development, and it is changing QA with it. QA Engineers who know how to use AI will:

⬩ Deliver in days what used to take two weeks
⬩ Do work that used to require deep expertise. With AI, basic knowledge can produce senior-level results
⬩ Get instant AI feedback on tests, code, and debugging decisions

The same applies to software developers. AI multiplies their delivery speed, so QA becomes the bottleneck. That's why companies are fighting to hire QA Engineers who can match that speed.

💡 In fact, as of early 2026, many companies have started adding AI coding tasks to their interview process.

QA Engineers who ignore AI won't just fall behind; they risk losing their careers entirely. That's not doomsaying. In 2026, tech companies laid off 55,775 people (https://www.trueup.io/layoffs). So, are those layoffs because AI is replacing people? No. AI is not replacing anyone. People who use AI are replacing people who don't.

Unlike the transition from manual testing to QA automation, which took a decade, this shift is happening fast. Capable AI coding agents only became real in late 2025. Just a few months later, the entire tech world had changed.

That's what this community is about. It's for people who see this shift and understand that right now is not just a pivotal moment for them. It's a short golden window to become one of the first truly AI-powered QA Automation Engineers / SDETs and set yourself up for a long, safe, and extremely high-paying QA career.

────────────────────────────────────────

𝐀𝐛𝐨𝐮𝐭 𝐌𝐞, 𝐚𝐧𝐝 𝐰𝐡𝐲 𝐈 𝐚𝐦 𝐛𝐮𝐢𝐥𝐝𝐢𝐧𝐠 𝐭𝐡𝐢𝐬 𝐂𝐨𝐦𝐦𝐮𝐧𝐢𝐭𝐲

I'm 𝐌𝐚𝐭𝐯𝐢𝐲, a Vegas-based 𝐏𝐫𝐢𝐧𝐜𝐢𝐩𝐚𝐥 𝐒𝐃𝐄𝐓 with 𝟏𝟎+ 𝐲𝐞𝐚𝐫𝐬 𝐨𝐟 𝐞𝐱𝐩𝐞𝐫𝐢𝐞𝐧𝐜𝐞. I've worked across startups and large enterprises, building QA automation frameworks and testing infrastructure across pretty much all modern stacks and tools. In 2025 I introduced AI coding agents into my team's QA automation workflows. The team adopted them. Management noticed.
AI Coding Agents for QA: Part 2 — Types of AI Coding Agents
In Part 1 I promised to tell you which tools actually work. Let's start by ruling one category out.

────────────────────────────────────────

🚫 𝐒𝐭𝐨𝐩 𝐔𝐬𝐢𝐧𝐠 𝐂𝐡𝐚𝐭 𝐀𝐩𝐩𝐬 𝐟𝐨𝐫 𝐂𝐨𝐝𝐢𝐧𝐠

ChatGPT, Claude.ai, Gemini — these are not coding tools.

I know. You can paste code into them. You can ask questions. It feels like it should work. But here's the problem: these tools were trained to answer everything. Recipes. Health advice. Legal questions. Your Playwright test suite. A coding task gets treated exactly the same as all the rest.

They also have zero access to your repo. They don't know your folder structure, your test helpers, your naming conventions — nothing. So every answer is generic. It could fit any codebase, anywhere.

Generic = useless for real coding work.

────────────────────────────────────────

✦︎ 𝐂𝐋𝐈 𝐯𝐬 𝐈𝐃𝐄: 𝐖𝐡𝐚𝐭'𝐬 𝐭𝐡𝐞 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞?

Coding-specific tools split into two types:

► CLI — you run them from the terminal, inside your repo
► IDE — they live inside your editor (Cursor, VS Code, etc.)

CLI means Command Line Interface. You open your terminal, go to your project, and run something like:

`claude -p "add a login test to the checkout suite"`

The agent reads your actual code, understands your project, and does the work.

────────────────────────────────────────

✦︎ 𝐓𝐡𝐞 𝟒 𝐂𝐋𝐈 𝐓𝐨𝐨𝐥𝐬 𝐘𝐨𝐮 𝐍𝐞𝐞𝐝 𝐭𝐨 𝐊𝐧𝐨𝐰

🔹 𝐂𝐥𝐚𝐮𝐝𝐞 𝐂𝐨𝐝𝐞, built by Anthropic

It has three models for three use cases:
- Opus — the most powerful. Complex refactors, hard bugs, architecture decisions. Expensive.
- Sonnet — the daily driver. Fast, accurate, handles most coding tasks and documentation well.
- Haiku — fast and cheap. Good only for small jobs: renaming files, adding a helper, generating a fixture.

Pricing works on a "window" system. You buy a plan ($20 / $100 / $200 per month) and each plan comes with a usage limit. That limit resets every 5 hours, with an additional weekly cap. In practice: burn through your limit at 2pm, wait until 7pm for the reset. It sounds annoying. Once you learn to match the model to the task, you rarely hit the cap.
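"Match the model to the task" can be made concrete as a routing rule. The sketch below is my own heuristic for illustration, not an Anthropic feature; the task categories are taken from the three use cases listed above:

```typescript
// My own illustration of "match the model to the task" (not an official
// Anthropic feature): route small mechanical jobs to the cheap model so
// they don't eat your 5-hour usage window.
type Task =
  | "complex-refactor"
  | "hard-bug"
  | "architecture"
  | "daily-coding"
  | "rename-files"
  | "generate-fixture";

function pickModel(task: Task): "opus" | "sonnet" | "haiku" {
  switch (task) {
    case "complex-refactor":
    case "hard-bug":
    case "architecture":
      return "opus"; // most powerful, most expensive: save it for hard work
    case "rename-files":
    case "generate-fixture":
      return "haiku"; // fast and cheap: small mechanical jobs only
    default:
      return "sonnet"; // the daily driver
  }
}

console.log(pickModel("hard-bug")); // → opus
console.log(pickModel("rename-files")); // → haiku
```

Running Opus on a file rename burns the same window that a hard bug will need at 4pm. The routing habit matters more than the exact table.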
AI Coding Agents for QA: Part 1 — What They Are and Why It Matters
AI is everywhere, and it's easy to feel overwhelmed. Codex. Claude Code. Cursor. Windsurf. Copilot. New names every week, new hype every day. But they all describe the same concept: AI coding agents.

────────────────────────────────────────

𝐖𝐡𝐚𝐭 𝐈𝐬 𝐚𝐧 𝐀𝐈 𝐂𝐨𝐝𝐢𝐧𝐠 𝐀𝐠𝐞𝐧𝐭?

Simple: it's a tool that interacts with AI and generates code. That's it. But like any tool in a QA engineer's kit, not all of them are equal. Some are great for specific tasks, some are poor at most things, and some are solid generalists you can use anywhere and get good results.

I spent over $3,000 testing them so you don't have to. In this series of posts I'll share exactly what I found. Today, we start with the fundamentals.

────────────────────────────────────────

🧠 𝐖𝐡𝐚𝐭 𝐈𝐬 𝐚𝐧 𝐋𝐋𝐌?

LLM stands for Large Language Model, the brain powering every AI coding agent. But here's the key thing to understand: you never talk to the LLM directly. There's always a tool sitting in between:

► YOU ► Tool (Cursor / Copilot / Claude Code) ► LLM (GPT-5 / Claude / Gemini)

The same pattern applies when you use AI chat apps, except the interface is built for conversation, not code.

────────────────────────────────────────

⚡ 𝐖𝐡𝐲 𝐓𝐡𝐢𝐬 𝐌𝐚𝐭𝐭𝐞𝐫𝐬 𝐟𝐨𝐫 𝐘𝐨𝐮

The tool you pick (Cursor, etc.) is responsible for roughly 50% of your results. Here's why: the tool reads your code, decides what information to send to the LLM, and therefore determines how much the AI actually understands about your project and how well it can write code for it.

Different tools. Different developers. Different quality. Same LLM, wildly different output. This is exactly why the same engineer, using the same LLM through a different tool, can get completely different results. For example, the exact same GPT model used through Cursor versus Copilot on the same task will produce very different quality output.

────────────────────────────────────────

📌 𝐊𝐞𝐲 𝐓𝐚𝐤𝐞𝐚𝐰𝐚𝐲𝐬

- LLM = the brain. You can't access it directly.
- Tools (Cursor, Copilot, Claude Code) sit between you and the LLM.
- The tool accounts for ~50% of the quality you get.
- Different tools mean different quality and different output, even with the same LLM underneath.
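The YOU ► Tool ► LLM chain can be sketched in a few lines. This is my own illustration, not any real tool's code; it only shows why the tool layer accounts for so much of the result: two "tools" forward the same request to the same model, but only one packs repo context into the prompt.

```typescript
// My own illustration (not any real tool's implementation): the tool layer
// decides what context reaches the LLM. Same request, very different prompts.
type RepoFile = { path: string; content: string };

// A chat app: forwards the request with no repo context at all.
function chatAppPrompt(request: string): string {
  return request;
}

// An IDE agent: bundles relevant repo files in front of the request.
function ideAgentPrompt(request: string, repo: RepoFile[]): string {
  const context = repo
    .filter((f) => f.path.endsWith(".spec.ts") || f.path.includes("fixtures"))
    .map((f) => `// ${f.path}\n${f.content}`)
    .join("\n\n");
  return `Project context:\n${context}\n\nTask: ${request}`;
}

const repo: RepoFile[] = [
  { path: "fixtures/auth.ts", content: "export const loggedIn = /* ... */ null;" },
  { path: "tests/checkout.spec.ts", content: "// test('checkout', ...)" },
  { path: "README.md", content: "..." },
];

const request = "add a login test";
console.log(chatAppPrompt(request).includes("fixtures/auth.ts")); // false
console.log(ideAgentPrompt(request, repo).includes("fixtures/auth.ts")); // true
```

The model sees only the prompt string. Whether your fixtures and naming conventions are in that string is entirely the tool's decision, which is the whole argument of this post.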