When does a prompt become a skill?
Every catalogue of an AI workflow stack lists the same parts in a slightly different order. Prompts, skills, connectors, MCPs, hooks, scripts, plugins. Useful vocabulary. None of those lists answer the question a working operator carries into Monday morning: "I have run this prompt four times this week. Is it a skill yet, or am I overbuilding?"
That is the promotion question, and the vocabulary does not solve it. Naming the seven layers tells you nothing about which one the thing on your screen right now belongs in. The threshold decisions are the work. The taxonomy is the easy part.
The dividing line under every promotion decision
There is one axis that runs underneath every layer of the stack: deterministic versus probabilistic. Scripts compute. Hooks fire on events. Connectors pass structured data. Prompts and skills guess inside a band of plausible outputs.
Every promotion decision sits on that axis. The question to ask before moving any unit of work up one level is whether the work needs a right answer or a good one. A price band is good. A tax number is right. A caption is good. A file path is right. The promotion direction follows. Probabilistic work climbs toward skills and plugins. Deterministic work climbs toward scripts and hooks. Mixing the two in one layer is the first sign a piece of work is in the wrong layer.
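The right-answer versus good-answer test is mechanical enough to write down. A minimal sketch, assuming a single boolean input (the function name and flag are illustrative, not an API from any tool):

```python
def route_layer(needs_exact_answer: bool) -> str:
    """Route one unit of work along the deterministic/probabilistic
    axis, per the right-answer vs. good-answer test."""
    if needs_exact_answer:
        # Tax numbers and file paths: compute them. Scripts and hooks.
        return "deterministic"
    # Price bands and captions: a good answer is enough. Prompts and skills.
    return "probabilistic"

assert route_layer(True) == "deterministic"    # a tax number is right
assert route_layer(False) == "probabilistic"   # a caption is good
```

The whole trick is that the input is a judgment call and the output is not: once you decide whether the work needs a right answer, the layer follows automatically.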
Prompt to skill: the trigger is fidelity, not volume
A prompt earns promotion to a skill when one of two things is true. Either you have run it three or more times, or forgetting one of its rules would produce a wrong-feeling output rather than a wrong one.
Three runs is the lower bound because anything you have done three times you will do thirty times if it stays useful. The cost of writing it as a skill once is repaid on run four. The wrong-feeling test is the upper bound. If the output is technically correct but reads off — wrong register, missing a refusal, breaking a voice rule the operator could not name on demand — then the rules live in the operator’s head, and a fresh session does not have access to them. A skill is the place those rules become loadable.
What does not trigger promotion is complexity or length. A 600-word prompt that gets run once is still a prompt. A 30-word instruction that gets run weekly and has to match a brand voice is a skill. The trigger is fidelity to a standard, not size.
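The gate reduces to a two-input predicate. A hedged sketch (names are mine, not from any framework):

```python
def should_promote_to_skill(run_count: int,
                            breaks_standard_if_forgotten: bool) -> bool:
    """The two triggers from the text: three or more runs, or rules
    whose omission produces wrong-feeling (not wrong) output."""
    return run_count >= 3 or breaks_standard_if_forgotten

# A 600-word prompt run once is still a prompt:
assert not should_promote_to_skill(run_count=1,
                                   breaks_standard_if_forgotten=False)
# A 30-word instruction that must match a brand voice is a skill:
assert should_promote_to_skill(run_count=1,
                               breaks_standard_if_forgotten=True)
```

Note what is absent from the signature: prompt length, complexity, token count. Size never enters the decision.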
Skill to plugin: the trigger is portability or composition
A skill becomes a plugin when it has to travel. Cross-machine, cross-collaborator, cross-time-zone, cross-project. The moment a second person needs to invoke the same behavior with the same guardrails, the skill earns versioning, a distribution path, and a name that does not collide.
The other trigger is composition. A skill that bundles deterministic checks alongside its probabilistic instructions, with hooks that fire and scripts that compute and a connector that fetches structured input, is no longer one skill. It is a small system, and a plugin is the package that holds a small system together. Same rule as before: size is not the trigger. A two-file plugin that has to travel is a plugin. A 2,000-line skill that lives in one operator’s home directory is still a skill.
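The second gate has the same shape as the first: two triggers, size excluded. A sketch with illustrative names:

```python
def should_promote_to_plugin(must_travel: bool,
                             bundles_deterministic_parts: bool) -> bool:
    """Skill-to-plugin gate: portability or composition.
    Size never enters the predicate."""
    return must_travel or bundles_deterministic_parts

# A two-file plugin that has to travel is a plugin:
assert should_promote_to_plugin(True, False)
# A 2,000-line skill in one operator's home directory stays a skill:
assert not should_promote_to_plugin(False, False)
```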
What this looks like on disk
Below is one real skill from this sandbox, with names sanitized. Note the shape. Frontmatter that tells the loader when to engage. Body that holds the probabilistic instructions. A decision rule near the bottom that names the boundary explicitly.
```markdown
---
name: verify-handoff
description: Cross-check a worker's shipped output against the
  dispatching brief's Inputs / Outputs / Verify sections. Use when
  the operator says "verify [slug]" or "check the handoff."
---

# verify-handoff

Confirm that what the worker shipped matches what the brief asked
for. Closes the gap between "file exists" (cheap, misleading) and
"file is correct" (expensive, useful).

## What to check

1. Every output exists. Parse the brief's Returns-with section.
2. Output is recent. mtime within 24h of the breadcrumb timestamp.
3. Inputs were readable. Each path in the Inputs table exists.
4. Slug consistency. Breadcrumb slug matches brief filename.

## Verdict rule

- CLEAN — all always-checks PASS, no FAIL in brief-Verify.
- DEFECTS — at least one always-check or format-check FAILED with
  concrete evidence.
- NEEDS-MANUAL — checks ran but require human judgment to close.

Do not fix what you find. The skill diagnoses. The operator
dispatches a redo or accepts. Auto-fixing crosses the worker
boundary.
```
The frontmatter loads the rules. The body holds the probabilistic instructions. The verdict rule at the bottom is the decision boundary in plain language. A reader can copy this shape into their own domain in an afternoon. The work that does not transfer is knowing which behavior to skill in the first place.
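The always-checks in that skill are deterministic, which means they could just as well live in a script. A minimal, hypothetical Python version (the function name, argument names, and the breadcrumb-timestamp parameter are illustrative, not taken from the skill):

```python
from pathlib import Path

def verdict(outputs, inputs, breadcrumb_ts, needs_judgment=False):
    """Illustrative sketch of the verdict rule: CLEAN / DEFECTS /
    NEEDS-MANUAL. Diagnoses only; never fixes what it finds."""
    failures = []
    for out in outputs:
        p = Path(out)
        if not p.exists():
            failures.append(f"missing output: {out}")
        elif abs(p.stat().st_mtime - breadcrumb_ts) > 24 * 3600:
            failures.append(f"stale output: {out}")  # mtime outside 24h
    for inp in inputs:
        if not Path(inp).exists():
            failures.append(f"unreadable input: {inp}")
    if failures:
        return "DEFECTS", failures       # concrete evidence attached
    if needs_judgment:
        return "NEEDS-MANUAL", failures  # checks ran, a human closes
    return "CLEAN", failures
```

That the checks port to twenty lines of Python is the point of the axis: the deterministic parts of a skill are candidates for scripts and hooks, while the probabilistic instructions stay in the skill body.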
What most teams actually have
Most teams do not have an AI problem. They have a promotion problem. They keep prompting work that should have been packaged three runs ago, and skilling work that should have been a one-off prompt. The cost is invisible because each run feels cheap. The compounding cost is real: every un-promoted prompt is a small redecision tax paid every time the work runs, and every over-promoted skill is a maintenance tax paid every time the rules drift.
The promotion question is asked in the moment. Is this a skill yet? The answer is rule-driven, not taste-driven. Three runs, or fidelity-to-standard. Anything past that lives one layer up.
Clief Notes · skool.com/cliefnotes · Jake Van Clief, giving you the Cliff notes on the new AI age.