In Part 1 I promised to tell you which tools actually work.
Let's start by ruling one category out.
────────────────────────────────────────
🚫 𝐒𝐭𝐨𝐩 𝐔𝐬𝐢𝐧𝐠 𝐂𝐡𝐚𝐭 𝐀𝐩𝐩𝐬 𝐟𝐨𝐫 𝐂𝐨𝐝𝐢𝐧𝐠
ChatGPT, Claude.ai, Gemini — these are not coding tools.
I know. You can paste code into them. You can ask questions. It feels like it should work.
But here's the problem: these tools were trained to answer everything. Recipes. Health advice. Legal questions. Your Playwright test suite. To them, a coding task is just one more question, handled the same way as all the others.
They also have zero access to your repo. They don't know your folder structure, your test helpers, your naming conventions — nothing. So every answer is generic. It could fit any codebase, anywhere.
Generic = useless for real coding work.
────────────────────────────────────────
✦︎ 𝐂𝐋𝐈 𝐯𝐬 𝐈𝐃𝐄: 𝐖𝐡𝐚𝐭'𝐬 𝐭𝐡𝐞 𝐃𝐢𝐟𝐟𝐞𝐫𝐞𝐧𝐜𝐞?
Coding-specific tools split into two types:
► CLI — you run them from the terminal, inside your repo
► IDE — they live inside your editor (Cursor, VS Code, etc.)
CLI means Command Line Interface. You open your terminal, go to your project, and run something like:
`>_ claude -p "add a login test to the checkout suite"`
The agent reads your actual code, understands your project, and does the work.
────────────────────────────────────────
✦︎ 𝐓𝐡𝐞 𝟒 𝐂𝐋𝐈 𝐓𝐨𝐨𝐥𝐬 𝐘𝐨𝐮 𝐍𝐞𝐞𝐝 𝐭𝐨 𝐊𝐧𝐨𝐰
🔹 𝐂𝐥𝐚𝐮𝐝𝐞 𝐂𝐨𝐝𝐞 built by Anthropic
It has three models for three use cases:
- Opus — the most powerful. Complex refactors, hard bugs, architecture decisions. Expensive.
- Sonnet — the daily driver. Fast, accurate, handles most coding tasks and documentation well.
- Haiku — fast and cheap. Good for the small jobs only: renaming files, adding a helper, generating a fixture.
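To make "match the model to the task" concrete, here is a minimal Python sketch of a lookup you might keep in your own scripts. The task labels are hypothetical, taken from the list above. It also assumes your Claude Code version accepts `opus`, `sonnet` and `haiku` as aliases for its `--model` flag (check `claude --help` to confirm):

```python
# Hypothetical task labels mapped to the model tiers described above.
MODEL_FOR_TASK = {
    "refactor": "opus",      # complex refactors
    "hard_bug": "opus",      # hard bugs
    "architecture": "opus",  # architecture decisions
    "feature": "sonnet",     # everyday coding tasks
    "docs": "sonnet",        # documentation
    "rename": "haiku",       # renaming files
    "helper": "haiku",       # adding a small helper
    "fixture": "haiku",      # generating a fixture
}


def pick_model(task: str) -> str:
    """Return the model alias for a task, falling back to the daily driver."""
    return MODEL_FOR_TASK.get(task, "sonnet")


def build_command(task: str, prompt: str) -> list[str]:
    """Build the argv for a one-shot run: --model picks the model, -p prints the answer and exits."""
    return ["claude", "--model", pick_model(task), "-p", prompt]


if __name__ == "__main__":
    print(build_command("fixture", "generate a user fixture for the checkout tests"))
```

Defaulting unknown tasks to Sonnet mirrors the advice above: reach for Opus only when the task is genuinely hard, and hand the small jobs to Haiku.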
Pricing works on a "window" system. You buy a plan ($20 / $100 / $200 per month), and each plan comes with two usage limits: one resets every 5 hours, the other every week.
In practice: start a session at 2pm, hit the limit at 4:30pm, and you wait until 7pm for the reset. It sounds annoying, but once you learn to match the model to the task, you rarely hit the cap.
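The window arithmetic is simple enough to sketch. This is a minimal Python example that assumes the short window opens with your first request and lasts 5 hours. The helper names and the example times are hypothetical:

```python
from datetime import datetime, timedelta

# Assumption: the short window opens with your first request and lasts 5 hours.
SESSION_WINDOW = timedelta(hours=5)


def next_reset(window_start: datetime, window: timedelta = SESSION_WINDOW) -> datetime:
    """Return the moment a usage window that opened at window_start resets."""
    return window_start + window


def wait_time(now: datetime, window_start: datetime,
              window: timedelta = SESSION_WINDOW) -> timedelta:
    """Return how long you wait after hitting the cap at `now` (zero if already reset)."""
    return max(next_reset(window_start, window) - now, timedelta(0))


if __name__ == "__main__":
    start = datetime(2025, 1, 6, 14, 0)    # first request at 2pm
    capped = datetime(2025, 1, 6, 16, 30)  # limit hit at 4:30pm
    print(next_reset(start))               # 2025-01-06 19:00:00
    print(wait_time(capped, start))        # 2:30:00
```

For the weekly limit you would pass `timedelta(days=7)` as the window, assuming it also runs from a fixed start point.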
🔹 𝐂𝐨𝐝𝐞𝐱 𝐂𝐋𝐈 built by OpenAI
It has two model families:
- GPT models — general purpose. Good for reading code, summarizing, writing docs. Not great at actually writing code.
- Codex models — purpose-built for coding. They can't hold a good conversation and won't explain things well. But they write code extremely well.
The mistake most beginners make: they use GPT models and wonder why the output is mediocre. Use Codex models for coding. That's what they exist for.
Pricing: same window structure as Anthropic. Plans at $20 / $200.
🔹 𝐎𝐩𝐞𝐧𝐜𝐨𝐝𝐞 open source
This one is worth knowing about. Opencode bundles free open-source models plus optional Anthropic and OpenAI access. Download it, point it at your repo, start using it. No account. No card. No cost.
The free models are average quality. But average quality for zero dollars is a fair trade when you're just getting started. If you want to try CLI agents before spending any money — start here.
🔹 𝐆𝐞𝐦𝐢𝐧𝐢 𝐂𝐋𝐈 built by Google
I tested it. It's bad. Skip it.
────────────────────────────────────────
📌 𝐖𝐡𝐚𝐭 𝐒𝐡𝐨𝐮𝐥𝐝 𝐘𝐨𝐮 𝐀𝐜𝐭𝐮𝐚𝐥𝐥𝐲 𝐔𝐬𝐞?
It depends on your goals and your background. For learning and small tasks, Opencode can be a good fit.
That said, CLI tools aren’t the best choice for QA automation, especially if you’re also doing UI testing.
So, this post covered CLI tools. IDE AI agents are coming in Part 3, and that's where things get genuinely interesting for QA Automation Engineers.
────────────────────────────────────────
To use AI coding agents effectively, you must know programming and understand proper test framework architecture. Start learning today and get ready for the next stage of QA: AI Automation Engineers.