/Skills

Claude Code skills I built because I kept needing them.

Five skills for Claude Code—Anthropic’s command-line coding agent—that do the thinking-heavy work I lean on daily: stress-test a piece of real work, make it leaner without breaking it, rewrite it stronger, set an autonomous-loop goal that actually stops, and fact-check the AI advice that goes stale a week after someone posts it. Every one runs on the same stubborn rule: don’t let the model grade its own homework. The heavy ones spin up separate processes to do it, so all that critique runs off to the side and never crowds your own session. Open source under the MIT license—take them, fork them, make them yours.

The pairing I reach for most: pressure-test → improve

These are the two halves of one pipeline, and the reason this set exists. A model that just helped you build something will defend its own reasoning the moment you ask it to critique that work, so these skills hand the feedback to a fresh process that never saw how the sausage got made.

Finds & prescribes

/pressure-test

Spins up a panel of fresh-context critics, each handed only your artifact and what it’s supposed to do. None of them watched you build it, so they critique the thing instead of the story behind it—strengths, weaknesses, the assumptions you smuggled in without noticing—then hand back a specific fix for each one. It checks prior art too: has someone already hit this and solved it? It doesn’t touch your work.

Executes

/improve

A fresh critic finds what’s weak, the work gets revised, repeat until it stops improving—then an independent verifier reads the rewrite against the original and flags any spot where the meaning shifted or your voice got sanded down to house style. Hand it pressure-test’s findings and it folds them in as the first pass instead of re-deriving them.

Run them in order: pressure-test to find the problems, improve to fix them—and you don’t have to stop at one round. Pressure-test the improved version, feed the new findings back to improve, and keep alternating for as many turns as the work earns. They’re built for that loop, and built for Claude Code—where the fresh-eyes processes and enforced gates actually run. They’ll work on the claude.ai web app too, in a reduced single-context form, and each tells you up front when it can’t run the full path.

All five skills

Each ships with its operating SKILL.md; the two flagship skills also include a plain-language EXPLAINER.md that walks through what they are and why they’re built this way—no jargon required.

pressure-test /pressure-test

Stress-tests real work—files, frameworks, contracts, prompts, public content, decisions—and hands back a specific fix for each weakness. It checks prior art along the way, so you’re not reinventing a fix someone already shipped. In Claude Code a deterministic script does the parts you can’t leave to a model’s memory: it auto-escalates anything security-, credential-, or personal-data-shaped to a deeper, web-firewalled pass, fans out fresh-context critics with locked-down output, runs an independent verifier, and emits a checked findings list. Finds and prescribes; built to pair with improve, which executes the fixes.

View on GitHub →

improve /improve

Takes an artifact, an idea, a problem statement, or a half-formed chain of reasoning and hands back a genuinely stronger version, plus a straight account of what changed and why. Bounded so it can’t loop forever—a goal you confirm up front, the evaluator (never the reviser) owning the call to stop, and a hard pass cap. It works within your voice instead of flattening it, and writes a .orig copy first so the original is always one step back. Hand it pressure-test’s findings and it folds them straight in as the first pass. Produces the result.

View on GitHub →

optimize-efficiency /optimize-efficiency

Reviews a target—code, a script, a process, a workflow, a prompt, a config, a plan—for waste across four axes: resource use, code, speed, and token usage. Every candidate fix runs through a hard no-regression gate first, because efficiency bought with correctness, safety, or security isn’t a win—it’s a regression in a costume. Clean wins get ranked by leverage; real trade-offs get surfaced with both sides named; corner-cutting gets declined, and listed as declined so you can see it was considered and rejected on purpose. Location-anchored, evidence-bearing. Recommends; pair it with improve to execute.
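The triage described above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual logic: the `Candidate` fields, the `triage` function, and the "leverage = saving / effort" ratio are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    location: str       # where the waste was found, e.g. "build.sh:12" (hypothetical format)
    saving: float       # estimated benefit, arbitrary units
    effort: float       # estimated cost to apply, same units, must be > 0
    regresses: bool     # would it weaken correctness, safety, or security?
    has_tradeoff: bool  # does it cost something else worth naming?


def triage(candidates):
    """Split candidates into ranked wins, surfaced trade-offs, and declined items."""
    wins, tradeoffs, declined = [], [], []
    for c in candidates:
        if c.regresses:
            declined.append(c)   # the gate runs first: never ranked, but still listed
        elif c.has_tradeoff:
            tradeoffs.append(c)  # surfaced for a human decision
        else:
            wins.append(c)
    # Assumed leverage metric: benefit per unit of effort, highest first.
    wins.sort(key=lambda c: c.saving / c.effort, reverse=True)
    return wins, tradeoffs, declined
```

The point of returning `declined` instead of dropping it is the one the text makes: a rejected shortcut stays visible, so you can see it was considered.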

View on GitHub →

set-goal /set-goal

Writes a finish line for Claude Code’s built-in /goal autonomous loop, then hands it back ready to paste. The model that decides “done” sees only the conversation—not your files—so the condition has to be provable from what Claude actually surfaces. This makes that finish line something the evaluator can check, so the loop stops instead of running all night.
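To make "provable from the conversation" concrete, here is a small sketch of a finish line that can be checked from transcript text alone. The real evaluator is a model, not a regex, and the "N passed, M failed" summary format and the `finish_line_met` name are hypothetical; the only thing this is meant to show is that a condition nobody surfaced cannot be confirmed.

```python
import re


def finish_line_met(transcript: str) -> bool:
    """True only if the latest test summary shown in the transcript reports zero failures."""
    # Assumed summary format, e.g. "14 passed, 0 failed".
    summaries = re.findall(r"(\d+) passed, (\d+) failed", transcript)
    if not summaries:
        return False  # nothing was surfaced, so nothing can be proven
    _, failed = summaries[-1]  # only the most recent run counts
    return int(failed) == 0
```

A claim like "tests should pass now" fails this check on purpose: it is an assertion, not evidence in the conversation.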

View on GitHub →

evaluate-ai-merits auto / direct

Fact-checks the content you just read about Claude, Anthropic products, Claude Code, MCP (Model Context Protocol) servers, AI agents, or prompt engineering. It routes claims back to primary sources and flags the parts that date fastest—a built-in feature confused with a community add-on, a model-version tip passed off as universal, one surface’s features pinned on another, a preview sold as shipped. AI tooling changes weekly; this catches the advice that quietly went stale.

View on GitHub →

Why they’re built this way

The design choices are the same across the set, and they all come back to one refusal: don’t let the model grade its own homework.

Fresh context, not self-review

A model that just helped build something carries that work’s reasoning into any critique of it—it restates and defends instead of diagnosing. So pressure-test and improve push the critique to separate processes handed only the artifact and the goal. Outside eyes, in seconds.

The heavy lifting stays out of your context

That same split has a second payoff: the critics, the verifier, and the prior-art search all run outside your main conversation, so their back-and-forth never lands in your window. You get a short, checked result handed back, not a transcript of five critics thinking out loud. Pressure-test a long document and your session barely grows; run the same work inline and it buries the thread. The payoff is strongest in pressure-test, where the critics are genuinely separate processes and only the findings come back. That is one of the real reasons these are built for Claude Code rather than a single chat window.

Enforcement where forgetting is expensive

A security document reviewed at shallow depth reads exactly like a clean one—the failure is invisible. So pressure-test’s sensitivity escalation, panel size, quote-existence check, and web firewall are owned by a deterministic script. The model does judgment inside the gates; it doesn’t get to skip them. What’s enforced versus what’s left to judgment is written down honestly—the skill never claims more rigor than it has.
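Two of those gates are easy to picture as plain code. The sketch below is an illustration in Python under loud assumptions: the actual script is PowerShell, and the patterns, function names, and the shape of a finding (a dict with a "quote" key) are all invented here. What it shows is the principle that the decision rests on deterministic checks, not on the model remembering to make it.

```python
import re

# Hypothetical patterns; a real gate would carry a longer, reviewed list.
SENSITIVE_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]"),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # shaped like a US Social Security number
]


def review_depth(artifact: str) -> str:
    """Escalate on a pattern match alone; the model is never asked whether to."""
    if any(p.search(artifact) for p in SENSITIVE_PATTERNS):
        return "deep"
    return "standard"


def _squash(text: str) -> str:
    """Collapse whitespace so line wrapping does not cause false misses."""
    return re.sub(r"\s+", " ", text).strip()


def check_quotes(artifact: str, findings: list) -> tuple:
    """Keep only findings whose quoted text really appears in the artifact."""
    haystack = _squash(artifact)
    kept, rejected = [], []
    for finding in findings:
        quote = _squash(finding.get("quote", ""))
        if quote and quote in haystack:
            kept.append(finding)
        else:
            rejected.append(finding)  # a critic cited text that is not there
    return kept, rejected
```

A finding with a missing or invented quote lands in `rejected` regardless of how persuasive the critique reads, which is the kind of check a model cannot be trusted to run on itself.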

Loops that stop

improve’s loop is anchored to a goal you confirm up front, the evaluator (never the reviser) owns the call to stop, and there’s a hard pass cap—so it can’t polish forever. set-goal exists to make the outer /goal loop stop too, by forcing a finish line the evaluator can verify from the conversation alone.
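The three stopping rules fit in a few lines. This is a minimal sketch of the shape of such a loop, not improve's implementation: `improve_loop`, the `evaluate` and `revise` callables, and the default cap of 3 are placeholders chosen for the example.

```python
from typing import Callable, Tuple


def improve_loop(
    draft: str,
    goal: str,  # confirmed by the user before the loop starts
    evaluate: Callable[[str, str], Tuple[bool, str]],  # (draft, goal) -> (done, critique)
    revise: Callable[[str, str], str],                 # (draft, critique) -> new draft
    max_passes: int = 3,                               # assumed cap, for illustration
) -> Tuple[str, int, str]:
    """Revise until the evaluator says stop or the pass cap is hit."""
    for passes in range(max_passes):
        done, critique = evaluate(draft, goal)  # only the evaluator decides to stop
        if done:
            return draft, passes, "evaluator stopped"
        draft = revise(draft, critique)         # the reviser has no stop switch
    return draft, max_passes, "pass cap reached"
```

Because `revise` only ever returns a new draft, it has no way to declare itself finished, and the `for` loop guarantees termination even if the evaluator never says done.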

Honest degradation

On a surface with no shell or subagents—the claude.ai web app, say—the enforced machinery can’t run. Each skill notices, runs the same logic in one context, and leads with the disclosure instead of pretending it ran the full path. Falsely reassuring you is the exact failure these were built to prevent.
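A disclosure-first fallback might look something like this. It is a sketch only: it uses the PowerShell 7 (pwsh) requirement of pressure-test's enforced mode as the example condition, and the `build_report` function and the disclosure wording are assumptions, not text from the skills.

```python
import shutil

# Hypothetical wording; the skills' real disclosure text may differ.
DISCLOSURE = "Note: pwsh was not found, so the enforced gate did not run. This is an inline pass."


def build_report(findings, which=shutil.which):
    """Return the findings as text, led by a disclosure when the enforced path is unavailable."""
    lines = []
    if which("pwsh") is None:
        lines.append(DISCLOSURE)  # the disclosure goes first, before any finding
    lines.extend(f"- {finding}" for finding in findings)
    return "\n".join(lines)
```

The `which` parameter exists only so the check can be swapped out in a test; by default it is the standard library's `shutil.which`.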

Install

Claude Code loads skills from ~/.claude/skills/ (personal, every project) or a project’s .claude/skills/ folder. Clone the repo and drop in the ones you want.

git clone https://github.com/robmcquade/claude-skills.git
cd claude-skills
mkdir -p ~/.claude/skills
cp -r pressure-test improve optimize-efficiency set-goal evaluate-ai-merits ~/.claude/skills/

Start a new Claude Code session and invoke with /pressure-test, /improve, /optimize-efficiency, or /set-goal. pressure-test’s enforced mode uses PowerShell 7 (pwsh), which runs on macOS and Linux too—without it, the skill falls back to an inline pass and tells you so. Full instructions, requirements, and a note on the security test fixtures are in the repo README.
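If you want to confirm the copy landed before starting a session, a quick check along these lines works. It assumes only what is stated here: a skill is a directory containing a SKILL.md file, and the personal folder is ~/.claude/skills/. The `installed_skills` helper is a made-up name for this example.

```python
from pathlib import Path


def installed_skills(root: Path = Path.home() / ".claude" / "skills") -> list:
    """List directories under root that contain a SKILL.md file."""
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if (p / "SKILL.md").is_file())


if __name__ == "__main__":
    print(installed_skills())
```

Pass a project's `.claude/skills` path as `root` to check a project-level install instead.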

Get the skills on GitHub

Frequently asked questions

What are Claude Code skills?

Skills are reusable instruction packages that extend Claude Code, Anthropic’s command-line coding agent. Each is a directory with a SKILL.md file (plus any bundled scripts and schemas) that Claude loads when it’s relevant. You invoke most of these with a slash command, like /pressure-test.

How do I install these skills?

Clone the repo and copy the skill directories into your personal skills folder at ~/.claude/skills/, or a project’s .claude/skills/ folder. Start a new Claude Code session and they show up as slash commands.

What’s the difference between pressure-test and improve?

They’re two halves of one pipeline. pressure-test finds the problems and prescribes fixes but doesn’t touch your work. improve does the rewrite, and an independent check catches any meaning-drift before you see it. Run pressure-test first, then hand its recommendations to improve.

Do these skills require PowerShell?

Only pressure-test’s enforced mode uses PowerShell 7 (pwsh), which runs on macOS and Linux as well as Windows. Without it, pressure-test still runs: it falls back to an inline pass and tells you the enforced gate didn’t run. The other four skills need no shell.

Are the skills free to use?

Yes. They’re open source under the MIT license—use, fork, and adapt them freely. A link back is appreciated, not required.

Feedback & contact

These got sharper every time someone poked a hole in them, so if you’ve got a genuinely good catch or a sharper idea, I want it. Open an issue for a bug or a concrete suggestion, or start a discussion for anything more open-ended—both live on the GitHub repo. And if you want to reach me about something else entirely, the door’s open.

Send me a message