AI Coding Agents for QA, Part 1: What They Are and Why It Matters
AI is everywhere, and it's easy to feel overwhelmed. Codex. Claude Code. Cursor. Windsurf. Copilot. New names every week, new hype every day. But they all describe the same concept: AI coding agents.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

What Is an AI Coding Agent?

Simple: it's a tool that talks to an AI model and generates code. That's it.

But like any tool in a QA engineer's kit, not all of them are equal. Some are great for specific tasks, some are poor at most things, and some are solid generalists you can use anywhere and get good results.

I spent over $3,000 testing them so you don't have to. In this series of posts I'll share exactly what I found. Today, we start with the fundamentals.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🧠 What Is an LLM?

LLM stands for Large Language Model, the brain powering every AI coding agent. But here's the key thing to understand: you never talk to the LLM directly. There's always a tool sitting in between:

► YOU
► Tool (Cursor / Copilot / Claude Code)
► LLM (GPT-5 / Claude / Gemini)

The same pattern applies when you use AI chat apps, except the interface is built for conversation, not code.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⚡ Why This Matters for You

The tool you pick (Cursor, Copilot, and so on) is responsible for roughly 50% of your results. Here's why: the tool reads your code, decides what information to send to the LLM, and so determines how much the AI actually understands about your project and how well it can write the code.

Different tools. Different developers. Different quality. Same LLM. Wildly different output.

This is exactly why the same engineer, using the same LLM but a different tool, can get completely different results. For example, running the exact same GPT model in Cursor versus Copilot on the same task can produce very different output quality.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Key Takeaways

- LLM = the brain. You can't access it directly.
- Tools (Cursor, Copilot, Claude Code) sit between you and the LLM.
- The tool accounts for ~50% of the quality you get.
- Different tools, different quality, different output even with the same LLM underneath.
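
To make the "same LLM, different tool" idea concrete, here is a minimal Python sketch. Treat it as an illustration under loud assumptions: `fake_llm`, `naive_tool`, `context_aware_tool`, and the toy `PROJECT` are all made up for this post, and none of this reflects how Cursor, Copilot, or Claude Code actually select context. The only point is that the tool, not the model, decides what the model gets to see.

```python
# Hypothetical sketch: two toy "tools" sit between you and the same stand-in LLM.
# Nothing here calls a real model or mirrors a real product's internals.

# A toy project: file name -> file contents.
PROJECT = {
    "login_page.py": "class LoginPage:\n    def submit(self, user, password): ...",
    "test_login.py": "def test_valid_login(): ...",
    "README.md": "Internal QA automation suite.",
}

TASK = "Add a test for login with an empty password"


def naive_tool(task, project):
    """Forward only the task; the model gets no project context."""
    return task


def context_aware_tool(task, project):
    """Forward the task plus every file whose name contains a task keyword."""
    # Ignore short words such as "a" or "for" so they do not match every file.
    keywords = {word.lower() for word in task.split() if len(word) > 3}
    relevant = [
        f"# {name}\n{body}"
        for name, body in sorted(project.items())
        if any(word in name.lower() for word in keywords)
    ]
    return "\n\n".join([task, *relevant])


def fake_llm(prompt):
    """Stand-in for the model: reports how many project files the prompt included."""
    seen = [name for name in PROJECT if f"# {name}" in prompt]
    return f"model saw {len(seen)} project file(s)"


if __name__ == "__main__":
    for tool in (naive_tool, context_aware_tool):
        print(f"{tool.__name__}: {fake_llm(tool(TASK, PROJECT))}")
```

Both calls hit the same `fake_llm`, yet the first prompt carries zero project files and the second carries two (`login_page.py` and `test_login.py`). A real model can only be as good as the context it receives, which is the gap between tools described above.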