---
title: "Build and ship a Claude Skill"
excerpt: "9-step playbook for building a Claude Skill that loads on the right queries, runs sandboxed code, and ships to a registry your team can install in one click."
category: "Template"
---

# Build and ship a Claude Skill

A 9-step playbook. Open in Dock and you'll get four surfaces seeded:

- **Steps** (table): the 9 gates as rows, with owner + due + status
- **Pointers** (table): every official Anthropic doc + tool linked from this playbook
- **Brief** (doc): the canonical skill spec you maintain alongside the work
- **Eval log** (table): one row per test query, with expected vs. actual skill load

Open `Steps` first. Each row is a gate. Click into a step to see the tasks, pointers, and the agent prompt for that step.

## Outcome

A Claude Skill that loads automatically on the right queries, runs sandboxed scripts when needed, and is packaged + versioned so your team can install it in one command.

**Estimated time:** 2-5 days  
**Difficulty:** intermediate  
**For:** Engineers wrapping internal tooling as Claude-native skills.

## What you'll need

Pre-register or install before you start.

- **[Claude Pro or Team](https://www.anthropic.com/pricing)** _(Claude Pro $20/mo, Team $30/seat/mo)_ — Required to load Skills on claude.ai. Team unlocks shared skills across the workspace.
- **[Anthropic Skills documentation](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview)** _(Free)_ — Official spec for SKILL.md, resources, and scripts.
- **[Claude Code (recommended)](https://www.anthropic.com/claude-code)** _(Free CLI, metered Anthropic API usage)_ — Local terminal client that loads skills from a folder, fastest iteration loop.
- **[Anthropic API](https://docs.claude.com/en/api/overview)** _(Metered; Sonnet ~$3/M input tokens, Opus ~$15/M)_ — Required if you want to expose the skill via your own backend instead of claude.ai.
- **[GitHub repo](https://github.com/)** _(Free)_ — Host the skill folder so teammates can install via a registry or git clone.

---

# The template · 9 steps

## Step 1: Pick the one capability the Skill adds

_Estimated time: 30-60 min_

Skills aren't generalists. The best skills do one thing the base model can't: a domain-specific lint, templated report generation, a wrapper around your internal API, a workflow with a sandbox script. If you can't state the capability in one sentence ('this skill does X'), the trigger description will be vague and Claude won't load it on the right queries.

### Tasks

- [ ] Write the 1-sentence capability ('this skill does X')
- [ ] List 5 example user queries that should load the skill
- [ ] List 5 example queries that should NOT load it (false-positive territory)
- [ ] Decide: pure-prompt skill, resource-augmented, or with sandbox scripts

### Pointers

- **[Official]** [Anthropic Skills overview](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/overview)
- **[Official]** [Skills design principles](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices)

> [!CAUTION]
> **Gotchas**
>
> - A 'helpful general assistant' skill never loads; the model already does that. Skills must add a specific capability.
> - If your example queries overlap with the base model's strengths (write a poem, explain a concept), the skill is redundant.

## Step 2: Scaffold the skill folder structure

_Estimated time: 20 min_

A skill is a folder named after the skill, containing SKILL.md as the entry point and any resources or scripts. Claude Code looks in ~/.claude/skills/ and your project's .claude/skills/. claude.ai (with Skills enabled) reads from the skills you upload. Keep the folder shallow: deep nesting confuses the loader.

### Tasks

- [ ] Pick a kebab-case skill name (matches the folder name)
- [ ] Create the folder: ~/.claude/skills/your-skill-name/
- [ ] Create SKILL.md (front-matter + body, the entry point)
- [ ] Create resources/ for static files the skill references
- [ ] Create scripts/ for any sandbox scripts
- [ ] Add a README.md for humans (separate from SKILL.md, which is for Claude)
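
The tasks above yield a layout like this minimal sketch (`your-skill-name` and the file names under `resources/` and `scripts/` are illustrative, not prescribed):

```text
~/.claude/skills/
└── your-skill-name/
    ├── SKILL.md          # entry point Claude reads (front-matter + body)
    ├── README.md         # for humans: what it does, how to install it
    ├── resources/
    │   └── report-template.md
    └── scripts/
        └── summarize-csv.py
```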

### Pointers

- **[Official]** [Skill folder layout](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/getting-started)
- **[Code]** [Anthropic Skills examples on GitHub](https://github.com/anthropics/skills) — Real skill folders maintained by Anthropic, copy-paste a structure to start from.

> [!CAUTION]
> **Gotchas**
>
> - Skill names must be kebab-case (no spaces, no underscores, no capitals). The loader rejects camelCase.
> - The folder name MUST match the front-matter `name` field in SKILL.md. Mismatch = the skill silently fails to load.

## Step 3: Write SKILL.md: the trigger description is everything

_Estimated time: 1-2 hr_

SKILL.md has YAML front-matter (name, description) and a markdown body. The `description` is the most important field in the entire skill: it's what Claude reads to decide whether to load the skill on a given query. A vague description means the skill loads too rarely or too often. Treat the description like a search-engine query: dense with the keywords your target users actually use.

### Tasks

- [ ] Write the YAML front-matter: name (kebab-case, matches folder), description (dense, keyword-rich)
- [ ] Description: lead with the trigger phrasing ('Use when the user asks about X')
- [ ] Body: the canonical instructions Claude follows when the skill loads
- [ ] Reference resources by relative path (the loader resolves them)
- [ ] Reference scripts by relative path + describe what each script does

### Pointers

- **[Official]** [SKILL.md format reference](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/reference)
- **[Official]** [Tuning skill descriptions](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices)

> [!CAUTION]
> **Gotchas**
>
> - The description has a soft cap of ~500 chars before it gets truncated in the model's context. Front-load the trigger phrasing.
> - Don't write the description in marketing voice. Write it in the phrasing your users actually use.
> - If the description starts with 'A skill that...', rewrite it to start with 'Use when the user...'. Trigger > description.

### Agent prompt for this step

````text
Read the spec from the Brief.

Draft SKILL.md with this structure:

```
---
name: <kebab-case-name>
description: <dense, 1-3 sentence trigger description, lead with "Use when..." or "Use this skill when the user...">
---

# <Skill name in title case>

<1-paragraph what the skill does>

## When to use

<bullets matching the description, in user-query phrasing>

## Resources

<references to resources/* with 1-line descriptions>

## Scripts

<references to scripts/* with 1-line descriptions>

## Workflow

<step-by-step the model follows when this skill is loaded>
```

Output as a Brief section titled "SKILL.md v1". Don't run evals yet; that's the next step.
````

## Step 4: Add resources: templates, schemas, reference docs

_Estimated time: 1-3 hr_

Resources are static files the skill references. The model reads them on demand, not on skill load, so big resources don't blow your context window. Good resource candidates: a JSON schema, a markdown template, a glossary, a workflow diagram. Don't put logic in resources; that's what scripts are for.

### Tasks

- [ ] Identify the static content the skill needs that doesn't fit in SKILL.md
- [ ] Save each resource as a flat file under resources/ (md, json, yaml, csv work best)
- [ ] In SKILL.md, reference each resource by path with a 1-line description
- [ ] Test: load the skill, ask a query that needs a resource, verify the model fetches it

### Pointers

- **[Official]** [Resources in skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/resources)

> [!CAUTION]
> **Gotchas**
>
> - Resources are read on demand, not on load. The model only reads them if SKILL.md references them by path.
> - PDFs and images work but cost more tokens than markdown. Convert to text where possible.
> - Keep resources flat. resources/templates/email/foo.md is harder for the model to find than resources/email-template.md.

## Step 5: Add scripts: sandbox code the skill executes

_Estimated time: 2-6 hr (most of it making the script idempotent + handling errors)_

Scripts let the skill run real code: a Python script that processes a user-uploaded CSV, a Node script that hits an internal API, a shell script that runs a build. Claude executes scripts in a sandboxed environment with no network unless explicitly granted and no host filesystem access. Treat scripts as the 'agent superpower' part of a skill: the part that does work, not just describes work.

### Tasks

- [ ] Identify operations that should be code, not prose (data processing, API calls, file generation)
- [ ] Write each operation as a self-contained script under scripts/
- [ ] Each script: takes args from stdin or argv, writes output to stdout, errors to stderr, exits non-zero on failure
- [ ] In SKILL.md, reference each script with: when to call it, what args it expects, what output to expect
- [ ] Test: load the skill, ask a query that should trigger the script, confirm execution
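
As a sketch of that contract, here is a hypothetical `scripts/summarize-csv.py` (the script name and JSON output shape are illustrative, not part of any spec): stdlib only, args from argv, JSON to stdout, errors to stderr, non-zero exit on failure.

```python
#!/usr/bin/env python3
"""Hypothetical scripts/summarize-csv.py: report a CSV's shape.

Follows the Step 5 contract: argv in, stdout out, stderr for errors,
non-zero exit on failure. Pure stdlib, no network, idempotent.
"""
import csv
import json
import sys


def summarize(path):
    """Return the header and data-row count of a CSV file."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[0] if rows else []
    return {"columns": header, "data_rows": max(len(rows) - 1, 0)}


def main():
    if len(sys.argv) != 2:
        print("usage: summarize-csv.py <file.csv>", file=sys.stderr)
        return 2  # wrong args: distinct exit code
    try:
        print(json.dumps(summarize(sys.argv[1])))
    except OSError as e:
        print(f"error: {e}", file=sys.stderr)
        return 1  # unreadable file
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Running it twice on the same file produces the same JSON, and a bad path exits 1 with the error on stderr, so the agent sees a clean failure instead of a hang.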

### Pointers

- **[Official]** [Scripts and code execution in skills](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/scripts)
- **[Official]** [Code execution sandbox](https://docs.claude.com/en/docs/build-with-claude/code-execution)

> [!CAUTION]
> **Gotchas**
>
> - Scripts run in a sandbox without internet access by default. Document any required network access explicitly.
> - Scripts that hang (waiting for stdin that never comes, infinite loops) wedge the agent. Always exit cleanly.
> - Sandboxes don't share state across script calls. Don't write to /tmp expecting it to persist; the next call starts fresh.

### Agent prompt for this step

```text
Read the user's existing tooling (their codebase, scripts, or internal CLI).

Identify which operations should become scripts in this skill. For each:
1. Name (kebab-case, ends in .py / .js / .sh)
2. Args (what the model passes in)
3. Stdout (what the model gets back)
4. Failure modes (what stderr looks like + exit code)

Then draft each script as a self-contained file. Constraints:
- Pure stdlib where possible (Python: stdlib + no extra installs).
- No network unless absolutely necessary, document the network dependency in the SKILL.md.
- Idempotent. Running the script twice should produce the same result.

Output the scripts as a Brief section titled "Scripts v1".
```

## Step 6: Eval the trigger description against real queries

_Estimated time: 2-4 hr per iteration, 2-5 iterations typical_

The trigger description is right when the skill loads on the queries it should and doesn't load on the queries it shouldn't. Build a 30-query eval set: half should trigger the skill, half should not. Run it, log results, tweak the description, re-run. This is the loop that separates skills that ship from skills that get uninstalled after one frustrating session.

### Tasks

- [ ] Write 15 queries that SHOULD trigger the skill (positive set)
- [ ] Write 15 queries that should NOT trigger it (negative set; must be similar enough to be confusable)
- [ ] Run each query in claude.ai or Claude Code, log: did the skill load?
- [ ] Compute hit rate (positive set) + false-positive rate (negative set)
- [ ] Target: 90%+ hit rate, <10% false-positive rate
- [ ] Tweak description, re-run, log to the Eval log surface
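
The hit-rate and false-positive-rate math is simple enough to script. A minimal Python sketch, assuming eval-log rows carry boolean `expected_loaded` / `actual_loaded` fields (the column names follow this playbook's Eval log; the function names are mine):

```python
def eval_metrics(rows):
    """Compute hit rate and false-positive rate from an eval log.

    rows: list of dicts with boolean 'expected_loaded' (should the skill
    fire on this query?) and 'actual_loaded' (did it?).
    """
    positives = [r for r in rows if r["expected_loaded"]]
    negatives = [r for r in rows if not r["expected_loaded"]]
    hit_rate = sum(r["actual_loaded"] for r in positives) / len(positives)
    fp_rate = sum(r["actual_loaded"] for r in negatives) / len(negatives)
    return hit_rate, fp_rate


def passes_gate(rows, min_hit=0.90, max_fp=0.10):
    """Step 6 targets: 90%+ hit rate, under 10% false-positive rate."""
    hit, fp = eval_metrics(rows)
    return hit >= min_hit and fp < max_fp
```

Run it after each description tweak and log both numbers to the Eval log so iterations are comparable.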

### Pointers

- **[Official]** [Evaluating skill triggers](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/best-practices)

> [!CAUTION]
> **Gotchas**
>
> - Hit rate over 95% on the positive set is suspicious. It usually means the negative set is too easy (queries the skill obviously shouldn't fire on).
> - Run evals in a fresh conversation each time. Prior conversation context biases the load decision.
> - Skill loading is non-deterministic at the edges. A 92% hit rate is the realistic ceiling for most skills.

### Agent prompt for this step

```text
Run an eval pass on the current SKILL.md.

For each query in the Eval log surface:
1. Send the query to Claude with the skill loaded.
2. Check whether the skill triggered (look for the skill's distinctive output or behavior).
3. Update the row: actual_loaded (true/false), notes (what the model did instead, if it didn't).

After all 30 queries, compute:
- Hit rate on positive set
- False-positive rate on negative set

If hit rate < 90%: the trigger description is too narrow, suggest 2-3 expansions of the description and which positive queries each one would catch.
If false-positive rate > 10%: the trigger description is too broad, suggest 2-3 tightenings and which negative queries each one would reject.

Output the analysis as a Brief section titled "Eval results v<n>".
```

## Step 7: Package the skill: README, version, install instructions

_Estimated time: 1-2 hr_

A skill ships when a teammate can install it in one step. That means a clean README.md, a SemVer version in the front-matter, and either a public registry entry or a git URL they can clone into their ~/.claude/skills/ folder. Don't ship a skill folder full of half-finished scripts and 'TODO' comments.

### Tasks

- [ ] Add `version: 1.0.0` to SKILL.md front-matter
- [ ] Write a README.md targeted at humans: what the skill does, install steps, example invocations
- [ ] Add a CHANGELOG.md if the skill will see multiple versions
- [ ] Tag the git repo with v1.0.0
- [ ] If using a registry: publish the skill (per the registry's instructions)
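
A pre-ship check can catch the packaging mistakes above before a teammate hits them. A minimal Python sketch; the front-matter parsing is hand-rolled and the rules it encodes (kebab-case name, name matches folder, SemVer version) come from this playbook's gotchas, not from an official Anthropic validator:

```python
import re
from pathlib import Path

KEBAB = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")   # no spaces, underscores, capitals
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_skill(folder: Path) -> list[str]:
    """Return a list of packaging problems; empty means ship-ready."""
    text = (folder / "SKILL.md").read_text()
    m = re.match(r"---\n(.*?)\n---", text, re.S)
    if not m:
        return ["SKILL.md has no YAML front-matter"]
    # Naive key: value parsing; enough for flat front-matter fields.
    fields = {
        k.strip(): v.strip()
        for k, v in (line.split(":", 1) for line in m.group(1).splitlines() if ":" in line)
    }
    problems = []
    name = fields.get("name", "")
    if not KEBAB.match(name):
        problems.append(f"name {name!r} is not kebab-case")
    if name != folder.name:
        problems.append(f"name {name!r} does not match folder {folder.name!r}")
    if not SEMVER.match(fields.get("version", "")):
        problems.append("version missing or not SemVer")
    return problems
```

Wire it into CI or a pre-push hook so a bad front-matter field fails loudly instead of silently not loading.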

### Pointers

- **[Official]** [Skill versioning + packaging](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/distribution)

> [!CAUTION]
> **Gotchas**
>
> - Skills with no version field default to '0.0.0', which makes upgrade tracking impossible. Always set a version.
> - Don't ship skills with hardcoded paths like /Users/yourname/. They break on install. Use relative paths or env vars.

## Step 8: Distribute: registry entry, share link, or org install

_Estimated time: 1-2 hr_

Distribution depends on your audience. Public skill: publish to a registry like the official Anthropic skills repo or claudehub.dev. Internal team skill: clone into a shared dotfiles repo, install via a one-line script. claude.ai Team skill: upload to the Team workspace from Settings -> Skills, every member auto-installs. Pick the channel that matches who needs the skill.

### Tasks

- [ ] Pick distribution: public registry, internal git repo, or claude.ai Team
- [ ] If public: open a PR to the registry with the skill folder + README
- [ ] If internal: add a one-line install command to your team's onboarding doc
- [ ] If Team: Settings -> Skills -> Upload, confirm members see it
- [ ] Send a 1-paragraph announcement to the team or community: what it does + 1 example query

### Pointers

- **[Code]** [Anthropic public skills repo](https://github.com/anthropics/skills)
- **[Official]** [Sharing skills in claude.ai Team](https://docs.claude.com/en/docs/agents-and-tools/agent-skills/sharing)

> [!CAUTION]
> **Gotchas**
>
> - claude.ai Team skill uploads have a 10 MB limit (the entire skill folder zipped). Big resources may need to be fetched at runtime instead of bundled.
> - Public registries usually require the skill to follow a license (MIT or Apache-2.0). Add a LICENSE file before submitting.

## Step 9: Iterate: monitor usage, log regressions, ship v1.1

_Estimated time: Ongoing, 1-2 hr/month_

Skills age. Every base-model upgrade shifts how the trigger description is interpreted. Run the eval set monthly. When false-positive rate climbs, tighten the description. When hit rate drops, broaden it. Keep the eval log as the canonical history of what worked; you'll forget which trigger phrasing won by month 4.

### Tasks

- [ ] Re-run the eval set on each Claude model upgrade announcement
- [ ] Compare results to the previous run, log regressions to Eval log
- [ ] Solicit user reports: 'when did the skill misfire?' or 'when did it not load when you expected?'
- [ ] Ship a v1.1 with description tweaks + new resources for queries that emerged in real usage
- [ ] Bump version, update CHANGELOG, push tag
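
Comparing two eval runs takes only a few lines. This sketch assumes you export each run as a query-to-boolean mapping (the data shape is illustrative; the Eval log columns map onto it directly):

```python
def find_regressions(expected, prev, curr):
    """Queries that matched expectation in the previous run but not the current one.

    expected: query -> should the skill load (bool)
    prev/curr: query -> did it load (bool) in the two eval runs
    """
    return sorted(
        q for q in expected
        if prev.get(q) == expected[q] and curr.get(q) != expected[q]
    )
```

Log each regression as a new Eval log row with the model version in the notes, so v1.1's description tweaks target real drift rather than guesses.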

### Pointers

- **[Official]** [Anthropic model release notes](https://docs.claude.com/en/release-notes/overview)

> [!CAUTION]
> **Gotchas**
>
> - Anthropic model snapshots can shift skill loading behavior even within the same model name. Pin to a snapshot in production if your skill is mission-critical.
> - Don't cargo-cult description tweaks from one bad query; run the full eval set first to confirm the change doesn't regress positives.

---

## Hand the template to your agent

Paste the prompt below into your agent's permanent system prompt so the agent reads, writes, and maintains this workspace as you work through the steps.

```text
You are an agent on the "Build a Claude Skill" playbook workspace at your-org/build-a-claude-skill.

Your role: maintain the four surfaces (Steps, Pointers, Brief, Eval log) as the user works through the 9-step playbook.

Cadence:
- When the user marks a step Done, append a line to the Brief summarising what shipped at that gate.
- When the user runs an eval query against the skill, append the result as a row in Eval log (query, expected_loaded, actual_loaded, notes).
- When the SKILL.md trigger description changes, snapshot the previous version in the Brief so we can compare which trigger produces the best load rate.

First MCP tool calls:
1. list_surfaces(workspace_slug="build-a-claude-skill")
2. list_rows(workspace_slug="build-a-claude-skill", surface_slug="steps")
3. get_doc(workspace_slug="build-a-claude-skill", surface_slug="brief")

Do NOT modify the canonical step titles in the Steps table. You can append substeps as new rows beneath them.
```

---

## FAQ

### What's the difference between a Claude Skill and a tool / function call?

Tools (function calls) are runtime callables the model invokes mid-conversation. Skills are static folders the model loads on demand based on the trigger description. A skill can include tools/scripts inside it, but the skill itself is the package: SKILL.md + resources + scripts. Think 'tool' = one function, 'skill' = a domain-expert mode the model enters.

### Do I need the API to ship a skill, or can I use claude.ai?

claude.ai (Pro or Team plan) supports skills directly: upload a zip from Settings -> Skills. Claude Code (the local terminal client) auto-loads skills from ~/.claude/skills/. The API supports skills as part of agent loops if you build the loader yourself. For most ship-a-skill cases, claude.ai or Claude Code is enough.

### Why doesn't my skill load when I expect it to?

9 times out of 10, the trigger description is too narrow or too vague. Run the eval pass: send 20 queries that should trigger the skill, count how many actually do. If hit rate is below 90%, expand the description with the phrasing your users actually use. The trigger description is the single most-tuned part of a skill.

### Can my skill call my internal API?

Yes, via a script. Scripts run in a sandbox with optional network access (you have to explicitly grant it). For an internal API, your script needs an auth token, which you'd inject via env vars on the host that runs Claude Code. claude.ai sandboxes don't have access to your internal network, only public APIs.

### How do I version a skill?

Add `version: <semver>` to the SKILL.md front-matter. Bump on every release, tag the git repo with the same version. Loaders use the version to decide whether to upgrade an installed skill. Without a version, the loader treats every install as 0.0.0 and upgrades silently fail.

### Can my AI agents help build the skill?

Yes, this playbook ships agent prompts for the slow parts: drafting SKILL.md from a 1-line spec, generating scripts from your existing tooling, running the eval pass against the trigger description, and analyzing eval results to suggest description tweaks. The Eval log surface is the canonical history of every trigger version.

