In Part 3 I introduced Cursor and why IDE tools beat CLI for QA automation.
But before we go deeper into Cursor features, there is a bigger question worth answering.
────────────────────────────────────────
𝐓𝐰𝐨 𝐄𝐧𝐠𝐢𝐧𝐞𝐞𝐫𝐬. 𝐒𝐚𝐦𝐞 𝐌𝐨𝐝𝐞𝐥. 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐭 𝐑𝐞𝐬𝐮𝐥𝐭𝐬.
Engineer A asks GPT-5.4 to write a login test.
Gets back: a clean, structured test. Uses the right fixtures. Follows the team's naming convention. Works on first run.
Engineer B does the same thing. Same model. Same task.
Gets back: a generic, broken test. Hardcoded credentials. No page objects. Fails immediately.
────────────────────────────────────────
🚫 𝐌𝐨𝐬𝐭 𝐏𝐞𝐨𝐩𝐥𝐞 𝐁𝐥𝐚𝐦𝐞 𝐭𝐡𝐞 𝐌𝐨𝐝𝐞𝐥
"GPT is bad at tests." "GPT doesn't understand Playwright." "I need a better model."
That is the wrong diagnosis.
The model is not the problem. All modern models can code really well.
Three other things determine quality.
────────────────────────────────────────
⚙️ 𝐋𝐚𝐲𝐞𝐫 𝟏: 𝐓𝐡𝐞 𝐓𝐨𝐨𝐥
As covered in Part 1, you never talk to the model directly.
► You ► Tool ► Model
The tool decides what to send to the model. What context. What files. What history.
Cursor sends your repo structure, open files, and recent edits. A plain chat app sends only the text you paste in.
Same model. Different tool. Completely different output.
────────────────────────────────────────
📁 𝐋𝐚𝐲𝐞𝐫 𝟐: 𝐑𝐞𝐩𝐨 𝐐𝐮𝐚𝐥𝐢𝐭𝐲
AI agents amplify whatever already exists in your project.
Good framework? The agent writes tests that slot right in.
No page objects, no fixtures, no structure? The agent writes whatever it can. Which is usually a mess.
This is the hard truth: AI cannot rescue a bad codebase. It makes it worse, faster.
The model is only as good as what it can see. If your repo has:
∙ Clear fixture files
∙ Consistent naming
∙ Reusable page objects
∙ Good test examples
The agent pattern-matches against all of that and writes code that fits.
If it sees nothing, it invents everything. Pure lottery.
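What does "something to pattern-match against" look like in practice? Here is a minimal sketch in Python; the names (LoginPage, get_test_user, the selectors) are illustrative assumptions, not from any real framework:

```python
# Hypothetical sketch of the repo patterns an agent can copy.
# LoginPage and get_test_user are made-up names for illustration.

class LoginPage:
    """Page object: selectors live here, never inside the tests."""

    USERNAME = "#username"
    PASSWORD = "#password"
    SUBMIT = "button[type=submit]"

    def __init__(self, page):
        # `page` is any driver exposing fill() and click(),
        # e.g. a Playwright Page in a real suite.
        self.page = page

    def login(self, username: str, password: str) -> None:
        self.page.fill(self.USERNAME, username)
        self.page.fill(self.PASSWORD, password)
        self.page.click(self.SUBMIT)


def get_test_user() -> dict:
    """Fixture-style helper: credentials come from one place,
    never hardcoded into individual tests."""
    return {"username": "qa_user", "password": "from-secret-store"}
```

With even one file like this in the repo, the agent tends to reuse LoginPage and the credentials helper instead of inventing raw selectors and hardcoded passwords.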
────────────────────────────────────────
📝 𝐋𝐚𝐲𝐞𝐫 𝟑: 𝐓𝐡𝐞 𝐓𝐚𝐬𝐤 𝐒𝐩𝐞𝐜
"Write a login test" is not a task spec. It is a hint.
A real spec tells the agent:
∙ What the test should verify
∙ What user state to start from
∙ What fixture to use for credentials
∙ Where to place the new file
∙ What to do when it fails
The more constraints you give, the less the agent has to guess. Guessing is where everything breaks.
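For illustration, a spec covering those five points might look like this; the file paths and the `test_user` fixture name are made-up placeholders, not a recommendation of any specific layout:

```
Task: add a login smoke test.

∙ Verify: after a valid login, the dashboard header shows the user's name.
∙ Start state: logged-out browser context, seeded test database.
∙ Credentials: use the test_user fixture from tests/fixtures/users.py — never hardcode them.
∙ Location: put the new file at tests/auth/test_login.py, next to the other auth specs.
∙ On failure: capture a screenshot and fail the run; do not retry silently.
```

Five lines of constraints, and the agent's guessing space collapses to almost nothing.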
────────────────────────────────────────
Do you know programming, understand how test automation works, and live in the USA?
I'm opening a live, small-cohort workshop on AI Coding Agents for Test Automation.
➜ You will learn how to structure your repo to make it AI-ready, write real task specs for AI Coding Agents, and get agents to produce high quality tests. Not generic garbage.
➜ This is not just theory. It includes hands-on practice, plus reusable prompts, skills, and AI materials you can apply immediately in your own repo.
➜ Because this is a live workshop, I'm keeping it extremely small: only 1–3 people max.
Access is not open publicly. To keep the group high quality, entry is only available after a short interview in DM.
👉 If you're interested, DM me.