Anthropic just made Claude Skills worth taking seriously

If you've built Claude Code skills, or downloaded any from GitHub, there's an update this week worth knowing about.

Anthropic released an update to Skill Creator on 3 March. The short version: you can now test whether your skills actually work, and whether they still need to exist at all.

Skill Creator now has four modes: Create, Eval, Improve, and Benchmark. The new piece is the eval pipeline. You define test cases, describe what good output looks like, and Skill Creator tells you whether the skill passes. It also runs blind A/B comparisons between skill versions, so you can see whether a change actually improved anything rather than guessing.
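To make the idea concrete, here is a minimal sketch of what "define test cases, check the output, compare versions blind" amounts to. It is not Skill Creator's actual format or API: every name here (TestCase, pass_rate, blind_ab, the stand-in skills and the judge) is hypothetical, and the "skills" are plain functions rather than real Claude calls.

```python
import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class TestCase:
    prompt: str
    # Hypothetical check that encodes "what good output looks like".
    is_good: Callable[[str], bool]


def pass_rate(skill: Callable[[str], str], cases: list[TestCase]) -> float:
    """Fraction of test cases whose output satisfies its check."""
    passed = sum(1 for case in cases if case.is_good(skill(case.prompt)))
    return passed / len(cases)


def blind_ab(skill_a, skill_b, cases, judge, rng) -> dict[str, int]:
    """Count wins per version; the judge never sees which version is which."""
    wins = {"a": 0, "b": 0}
    for case in cases:
        outputs = [("a", skill_a(case.prompt)), ("b", skill_b(case.prompt))]
        rng.shuffle(outputs)  # hide the version order from the judge
        pick = judge(case.prompt, outputs[0][1], outputs[1][1])  # 0 or 1
        wins[outputs[pick][0]] += 1
    return wins


# Stand-in skill versions (assumptions for illustration only).
def skill_v1(prompt: str) -> str:
    return prompt.capitalize()


def skill_v2(prompt: str) -> str:
    return prompt.capitalize() + "."


# Stand-in judge: prefers the longer of the two unlabeled outputs.
def prefer_longer(prompt: str, first: str, second: str) -> int:
    return 0 if len(first) >= len(second) else 1


cases = [
    TestCase("summarise the report", lambda out: out.endswith(".")),
    TestCase("draft the follow-up email", lambda out: out.endswith(".")),
]

rate_v1 = pass_rate(skill_v1, cases)
rate_v2 = pass_rate(skill_v2, cases)
ab_result = blind_ab(skill_v1, skill_v2, cases, prefer_longer, random.Random(0))
```

In this toy setup the second version passes both checks and wins both blind comparisons. The point is the shape of the process: the verdict comes from the test cases and an unlabeled comparison, not from which version you expected to be better.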

This matters if you're not an engineer. Most people building skills know their workflows but have had no way to verify that a skill does what they think it does. This update closes that gap without requiring any code.

The more useful question is what to do about your existing skills. Anthropic draws a helpful distinction between two kinds. Capability uplift skills teach Claude something the base model couldn't do consistently. These can become redundant as models improve, and until now you'd have had no way of knowing. Encoded preference skills lock in your specific workflow or standards. These tend to last longer, but only if they accurately reflect how you actually work. The eval framework now tells you which skills are earning their place and which are dead weight.
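One way to think about "earning its place" for a capability uplift skill: run the same test cases with and without the skill and see whether the skill still adds anything. The sketch below is an assumption about how you might read such results, not how Skill Creator scores them. The function name, the labels, and the 0.05 threshold are all hypothetical.

```python
def skill_verdict(pass_rate_with: float, pass_rate_without: float,
                  min_uplift: float = 0.05) -> str:
    """Label a skill by how much it improves the pass rate over the bare model."""
    uplift = pass_rate_with - pass_rate_without
    # If the base model now does about as well alone, the skill may be redundant.
    return "keep" if uplift >= min_uplift else "review for retirement"


still_useful = skill_verdict(pass_rate_with=0.9, pass_rate_without=0.6)
caught_up = skill_verdict(pass_rate_with=0.9, pass_rate_without=0.9)
```

An encoded preference skill needs a different reading: the base model isn't expected to know your standards, so the question is whether the test cases still describe how you actually work.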

The update is available now in Claude.ai and Cowork, and as a Claude Code plugin. Install the updated Skill Creator, run /skill-creator, and ask it to evaluate your most-used skill first.

This is most relevant if you're already using Claude Code. If you're not, it's a reasonable excuse to take a look.
