SkillOpt — Has Anyone Looked at This?

The CEO of an MSP I've been doing one-on-one ICM consulting with sent me this article this morning: "Microsoft's SkillOpt automatically upgrades AI agent skills without touching model weights."

Repo: https://github.com/microsoft/SkillOpt

The short version: instead of fine-tuning model weights, it treats your markdown skill files as the trainable parameter. It runs your agent against benchmark tasks, analyzes what went wrong, proposes bounded edits to the skill doc, and only accepts changes if held-out validation strictly improves. The deployed artifact is a single best_skill.md file — no extra model calls at inference.
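To make the loop concrete for myself, here's a minimal Python sketch of the accept/reject logic as I read it from that description. It is not SkillOpt's code or API. optimize_skill, run_agent, propose_edit, changed_lines and max_changed_lines are all hypothetical names. The agent and the edit proposer are stand-ins you'd supply (in practice, LLM calls). "Bounded" is approximated here as a cap on how many lines an edit may change.

```python
import difflib
from typing import Callable, Sequence

# Hypothetical signatures (NOT SkillOpt's API):
#   run_agent(skill_text, task) -> True if the agent solved the task with that skill
#   propose_edit(skill_text, failed_tasks) -> a revised skill_text
RunAgent = Callable[[str, str], bool]
ProposeEdit = Callable[[str, Sequence[str]], str]


def pass_rate(skill: str, tasks: Sequence[str], run_agent: RunAgent) -> float:
    """Fraction of tasks the agent solves with this skill text."""
    return sum(run_agent(skill, t) for t in tasks) / len(tasks)


def changed_lines(old: str, new: str) -> int:
    """Count added/removed lines between two versions of the skill doc."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def optimize_skill(
    skill: str,
    train_tasks: Sequence[str],
    val_tasks: Sequence[str],
    run_agent: RunAgent,
    propose_edit: ProposeEdit,
    rounds: int = 10,
    max_changed_lines: int = 20,
) -> tuple[str, float]:
    best_val = pass_rate(skill, val_tasks, run_agent)
    for _ in range(rounds):
        # 1. Run the current skill on training tasks and collect failures.
        failures = [t for t in train_tasks if not run_agent(skill, t)]
        if not failures:
            break
        # 2. Ask for an edit to the skill doc based on what went wrong.
        candidate = propose_edit(skill, failures)
        # 3. Bounded edit (assumed here as a line cap): skip big rewrites.
        if changed_lines(skill, candidate) > max_changed_lines:
            continue
        # 4. Keep the edit only if held-out validation STRICTLY improves.
        cand_val = pass_rate(candidate, val_tasks, run_agent)
        if cand_val > best_val:
            skill, best_val = candidate, cand_val
    return skill, best_val


# Toy stand-ins: the "agent" succeeds only if the skill mentions the task
# keyword, and the "proposer" appends a checklist line for the first failure.
def toy_agent(skill: str, task: str) -> bool:
    return task in skill


def toy_proposer(skill: str, failures: Sequence[str]) -> str:
    return skill + f"\n- Always check {failures[0]}"


best, score = optimize_skill(
    "# Review skill", ["tests", "types"], ["tests", "types", "docs"],
    toy_agent, toy_proposer,
)
print(best)
print(score)
```

In the toy run the skill picks up two checklist lines and validation goes from 0 to 2 of 3. "docs" never gets fixed because it only appears in the held-out set, which is the point of holding it out. Whatever text survives at the end is what you'd save as best_skill.md.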

They're reporting +19.1 points on Claude Code benchmarks. That's output quality — the agent getting the right answer more often — not token savings.

When I first started building skills in my ICM, they were bloated. Long, unstructured, burning tokens. If you're letting Claude build your skills for you (which is the natural thing to do), you're not necessarily getting an optimized artifact — you're getting whatever Claude thought was thorough at the time.

The question SkillOpt is trying to answer: is the skill actually performing, or is it just big?

Where I'm less sure it translates: most of the benchmarks are coding tasks, where "correct" is binary. I have three copywriters — Cash, Clyde, and Wradley. Scoring whether a piece of copy is better is a different problem. Harder to define, harder to gate automatically.
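Here's a rough Python sketch of why the gate gets harder. With pass/fail tasks, "strictly improves" just means a higher pass count. With copy, you need some graded score per brief, from a rubric, an LLM judge, or a human. That is my assumption, not something SkillOpt specifies. A bare "higher average" can also be tripped by scorer noise. So the hypothetical graded_gate below pairs scores on the same briefs and demands a minimum average gain. The function names and the 0.5 margin are illustrative, not tuned values.

```python
from statistics import mean
from typing import Sequence


def binary_gate(old: Sequence[bool], new: Sequence[bool]) -> bool:
    """Coding-style gate: accept if more held-out tasks pass."""
    return sum(new) > sum(old)


def naive_graded_gate(old: Sequence[float], new: Sequence[float]) -> bool:
    """Accept on any increase in the average rubric score."""
    return mean(new) > mean(old)


def graded_gate(old: Sequence[float], new: Sequence[float], min_margin: float = 0.5) -> bool:
    """Copy-style gate: compare scores brief by brief (same briefs, same order)
    and require the average gain to clear a margin meant to absorb scorer noise."""
    if len(old) != len(new):
        raise ValueError("score lists must cover the same briefs")
    gains = [n - o for o, n in zip(old, new)]
    return mean(gains) > min_margin


# Hypothetical 1-10 rubric scores for four briefs, before and after a skill edit.
before = [6, 7, 5, 8]
after = [7, 7, 6, 8]
print(binary_gate([True, False, True], [True, True, True]))
print(naive_graded_gate(before, after))
print(graded_gate(before, after))
```

The naive gate accepts this edit because the average moves from 6.5 to 7.0. The margin gate rejects it, because a half-point gain across four briefs could plausibly just be the judge wobbling. Picking that margin, and trusting the scorer in the first place, is the part that's hard to define for copy.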

Is this a layer worth putting on top of nuanced specialist work? Has anyone here dug into it?
