/prompt-eval-runbook - Prompt Eval Runbook Slash Command
Slash command runbook for designing and running prompt evaluations: define tasks, success criteria, golden outputs, regression checks, and privacy-safe reporting using Anthropic test-and-evaluate guidance.
Open the source and read safety notes before installing.
Safety notes
- Eval fixtures must not contain live credentials or customer PII.
- Treat model outputs as draft until human reviewers sign off on regressions.
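One way to enforce the first safety note is a pre-run scan of every fixture. The sketch below is illustrative only: the `SUSPECT_PATTERNS` table and the `scan_fixture` helper are hypothetical names, and the regular expressions are rough assumptions about what credentials and email addresses look like, not a complete detector.

```python
import re

# Hypothetical patterns; extend them for the secret formats your team actually uses.
SUSPECT_PATTERNS = {
    "api_key": re.compile(r"\bsk-[A-Za-z0-9_-]{16,}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def scan_fixture(text: str) -> list[tuple[str, int]]:
    """Return (pattern_name, line_number) for every suspicious fixture line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                findings.append((name, lineno))
    return findings


if __name__ == "__main__":
    sample = "user asks for a refund\ncontact: jane@example.com\n"
    for name, lineno in scan_fixture(sample):
        print(f"line {lineno}: possible {name}")
```

A clean scan does not prove a fixture is safe; it only catches the obvious cases before a human review.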
Privacy notes
- Redact proprietary prompts before exporting eval results outside the team.
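As a rough illustration of the redaction step, the following sketch replaces prompt text in a result record with a short fingerprint before export. The function name `redact_for_export` and the field names `prompt` and `system_prompt` are assumptions about how a team might store results; they are not part of the command itself.

```python
import hashlib


def redact_for_export(record: dict, private_fields=("prompt", "system_prompt")) -> dict:
    """Copy a result record, replacing private fields with a short fingerprint."""
    exported = dict(record)  # shallow copy so the original record stays intact
    for field in private_fields:
        if field in exported:
            raw = str(exported[field]).encode("utf-8")
            digest = hashlib.sha256(raw).hexdigest()[:12]
            exported[field] = f"[redacted sha256:{digest}]"
    return exported


if __name__ == "__main__":
    result = {"task": "refund", "prompt": "internal prompt text", "score": "pass"}
    print(redact_for_export(result))
```

The fingerprint lets reviewers tell two prompt versions apart without seeing either one. Very short prompts could in principle be guessed from a hash, so dropping the field entirely may be the safer choice for external reports.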
Prerequisites
- Representative user tasks and expected outcomes for the prompt under test.
- Staging environment without production secrets in eval fixtures.
Schema details
- Install type: cli
- Troubleshooting: No
- Command syntax: /prompt-eval-runbook <feature-or-prompt-name>
Full copyable content
/prompt-eval-runbook <feature-or-prompt-name>
About this resource
The /prompt-eval-runbook command turns Anthropic test-and-evaluate guidance into a
repeatable Claude Code session checklist for prompt or skill changes.
Usage
/prompt-eval-runbook <feature-or-prompt-name>
What it does
- Define tasks. List 5–10 representative user tasks with clear success criteria.
- Draft fixtures. Create minimal inputs without production secrets.
- Establish baselines. Capture current outputs for comparison after prompt edits.
- Run regression pass. Re-run tasks after changes; flag quality, safety, or format regressions.
- Score results. Use pass/partial/fail per task with one-line evidence.
- Document rollback. Record the prior prompt version or git ref to revert to if the eval fails.
- Publish summary. Produce a privacy-safe eval report for reviewers.
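The steps above can be sketched as a small script. This is a minimal illustration under stated assumptions, not how the command works internally: `EvalTask`, `score`, and `regression_pass` are hypothetical names, and substring matching stands in for whatever success criteria a team actually defines. Baseline and candidate outputs are assumed to have been captured already as plain strings keyed by task name.

```python
from dataclasses import dataclass


@dataclass
class EvalTask:
    name: str
    fixture: str               # minimal input, free of production secrets
    must_include: list[str]    # success criteria, simplified to required substrings


def score(task: EvalTask, output: str) -> tuple[str, str]:
    """Return (pass|partial|fail, one-line evidence) for a single output."""
    text = output.lower()
    missing = [s for s in task.must_include if s.lower() not in text]
    if not missing:
        return "pass", "all criteria met"
    if len(missing) < len(task.must_include):
        return "partial", f"missing: {', '.join(missing)}"
    return "fail", "no criteria met"


RANK = {"fail": 0, "partial": 1, "pass": 2}


def regression_pass(tasks, baseline: dict, candidate: dict) -> list[dict]:
    """Compare baseline and candidate outputs task by task."""
    rows = []
    for task in tasks:
        base_score, _ = score(task, baseline[task.name])
        cand_score, evidence = score(task, candidate[task.name])
        rows.append({
            "task": task.name,
            "baseline": base_score,
            "candidate": cand_score,
            "evidence": evidence,
            "regressed": RANK[cand_score] < RANK[base_score],
        })
    return rows


if __name__ == "__main__":
    tasks = [EvalTask("refund", "Customer asks for a refund", ["refund policy", "30 days"])]
    baseline = {"refund": "Our refund policy allows returns within 30 days."}
    candidate = {"refund": "See our refund policy."}
    for row in regression_pass(tasks, baseline, candidate):
        print(row)
```

Automated scores like these are only a first filter; flagged regressions still need a human reviewer before any ship decision.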
Output format
- Task list and success criteria
- Baseline vs candidate comparison table
- Regression findings with severity
- Ship/no-ship recommendation
- Rollback reference
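A report with these sections might be assembled as follows. This is a sketch under assumptions: `build_report` is a hypothetical helper, each row is assumed to be a dict with `task`, `baseline`, `candidate`, `evidence`, and `regressed` keys, the severity rule (a drop to fail is high, anything else medium) is an illustrative choice, and treating any regression as no-ship is one possible policy rather than a rule stated by the command.

```python
def build_report(rows: list[dict], rollback_ref: str) -> str:
    """Render a comparison table, regression findings, and a draft recommendation."""
    lines = [
        "| Task | Baseline | Candidate | Evidence |",
        "|---|---|---|---|",
    ]
    for row in rows:
        lines.append(
            f"| {row['task']} | {row['baseline']} | {row['candidate']} | {row['evidence']} |"
        )

    # Assumed severity rule: dropping all the way to fail is high, otherwise medium.
    findings = [
        f"{row['task']} ({'high' if row['candidate'] == 'fail' else 'medium'})"
        for row in rows
        if row["regressed"]
    ]
    verdict = "no-ship" if findings else "ship"

    lines.append("")
    lines.append(f"Regressions: {', '.join(findings) or 'none'}")
    lines.append(f"Recommendation: {verdict}")
    lines.append(f"Rollback reference: {rollback_ref}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [{
        "task": "refund",
        "baseline": "pass",
        "candidate": "partial",
        "evidence": "missing: 30 days",
        "regressed": True,
    }]
    # "abc1234" is a placeholder git ref, not a real commit.
    print(build_report(rows, "abc1234"))
```

The recommendation line is a draft for reviewers, who still approve or reject the change before it reaches production users.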
Requirements
- Access to a staging Claude Code or Agent SDK environment for running tasks.
- Reviewer approval before promoting prompt changes to production users.
Source citations
Add this badge to your README
Show that /prompt-eval-runbook - Prompt Eval Runbook Slash Command is listed on HeyClaude. Paste this Markdown into your README — it renders the badge and links back to this page.
[](https://heyclau.de/entry/commands/prompt-eval-runbook)
How it compares
/prompt-eval-runbook - Prompt Eval Runbook Slash Command is shown side by side with 3 alternatives on trust, install, platform support, and disclosed safety notes. All values come from reviewed registry metadata.
| Field | /prompt-eval-runbook | /targeted-test-generation | /documentation-refresh | /frontend-visual-qa |
|---|---|---|---|---|
| Name | Prompt Eval Runbook Slash Command | Targeted Test Generation Runbook | Documentation Refresh Runbook | Chrome Design Verification Runbook |
| Description | Slash command runbook for designing and running prompt evaluations: define tasks, success criteria, golden outputs, regression checks, and privacy-safe reporting using Anthropic test-and-evaluate guidance. | Community slash command runbook for adding minimal automated tests around a changed module: inspect the git diff, mirror repository test conventions, and draft focused unit or integration tests using Anthropic develop-tests guidance. | Community slash command runbook to refresh stale project documentation after code changes: use git history to find affected docs, compare README commands to package scripts, and flag broken internal links before opening a docs PR. | Community slash command runbook for frontend visual QA using documented Claude Code Chrome integration workflows: enable /chrome, open a local page, read console messages, and follow the design verification checklist from the Chrome integration guide. |
| Trust | | | | |
| Install risk | Review first | Review first | Review first | Review first |
| Notes | Safety ✓ Privacy ✓ | Safety ✓ Privacy ✓ | Safety ✓ Privacy ✓ | Safety ✓ Privacy ✓ |
| Category | commands | commands | commands | commands |
| Source | source-backed | source-backed | source-backed | source-backed |
| Author | kiannidev | kiannidev | kiannidev | kiannidev |
| Added | 2026-06-16 | 2026-06-16 | 2026-06-16 | 2026-06-16 |
| Platforms | Claude Code | Claude Code | Claude Code | Claude Code |
| Source repo | — | — | — | — |
| Safety notes | ✓ Eval fixtures must not contain live credentials or customer PII. Treat model outputs as draft until human reviewers sign off on regressions. | ✓ Generated tests are drafts; run the project's test command locally before committing. Do not add network calls, production credentials, or destructive setup to proposed tests. | ✓ Read-only git history inspection unless the operator approves doc edits. Validate git refs before interpolating them into shell commands. | ✓ Chrome integration runs in a visible browser with your logged-in session; avoid production admin flows. Handle login pages and CAPTCHAs manually when the integration pauses. |
| Privacy notes | ✓ Redact proprietary prompts before exporting eval results outside the team. | ✓ Use synthetic fixtures only; do not copy production logs or customer records into tests. | ✓ Commit messages and doc drafts enter model context; scrub internal-only details first. | ✓ Console logs and screenshots may include staging data; redact before external sharing. |
| Prerequisites | | | | |
| Install | | | | |
| Config | — | — | — | — |
| Citations | | | | |
| Claim | Unclaimed | Unclaimed | Unclaimed | Unclaimed |