Category: commands · Source-backed · Review first · Notes: Safety, Privacy

/prompt-eval-runbook - Prompt Eval Runbook Slash Command

Slash command runbook for designing and running prompt evaluations: define tasks, success criteria, golden outputs, regression checks, and privacy-safe reporting using Anthropic test-and-evaluate guidance.

by kiannidev · added 2026-06-16
Harness: Claude Code
Invocation: /prompt-eval-runbook <feature-or-prompt-name>
Review first: review before installing

Open the source and read safety notes before installing.

Safety notes

  • Eval fixtures must not contain live credentials or customer PII.
  • Treat model outputs as draft until human reviewers sign off on regressions.
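
One way to enforce the first safety note is to scan each fixture before it enters an eval run. The sketch below is an illustration only: `scan_fixture` and the pattern set are hypothetical names introduced here, and the patterns are a small, non-exhaustive sample that will miss many kinds of secrets and PII.

```python
import re

# Hypothetical, deliberately small pattern set (an assumption for illustration).
# A real check would need broader coverage and human review.
SUSPICIOUS_PATTERNS = {
    "email address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "AWS-style access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "bearer token": re.compile(r"\bBearer\s+[A-Za-z0-9._~+/-]{20,}"),
}


def scan_fixture(text: str) -> list[str]:
    """Return the label of every suspicious pattern found in a fixture."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(text)
    ]
```

An empty result means only that none of these sample patterns matched. It does not show that the fixture is clean.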

Privacy notes

  • Redact proprietary prompts before exporting eval results outside the team.
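
A minimal sketch of that redaction step, assuming each eval result is a dict with a `prompt` field (the record shape and the function names are assumptions, not part of the command). The prompt text is replaced with a short SHA-256 digest, so reviewers can still tell whether two results used the same prompt.

```python
import hashlib


def redact_prompt(prompt: str) -> str:
    """Replace prompt text with a placeholder carrying a short digest."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    return f"[REDACTED PROMPT sha256:{digest}]"


def redact_record(record: dict) -> dict:
    """Return a copy of an eval record with its 'prompt' field redacted.

    Assumes the hypothetical record shape {"task", "prompt", "score", ...};
    all other fields are copied unchanged.
    """
    redacted = dict(record)
    if "prompt" in redacted:
        redacted["prompt"] = redact_prompt(redacted["prompt"])
    return redacted
```

This covers only the prompt field. Model outputs and evidence lines can quote the prompt too, so they would need their own review before export.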

Prerequisites

  • Representative user tasks and expected outcomes for the prompt under test.
  • Staging environment without production secrets in eval fixtures.

Schema details

Install type: cli
Troubleshooting: No

Runtime and command metadata

Command syntax: /prompt-eval-runbook <feature-or-prompt-name>
Full copyable content: /prompt-eval-runbook <feature-or-prompt-name>

About this resource

The /prompt-eval-runbook command turns Anthropic test-and-evaluate guidance into a repeatable Claude Code session checklist for prompt or skill changes.

Usage

/prompt-eval-runbook <feature-or-prompt-name>

What it does

  1. Define tasks. List 5–10 representative user tasks, each with clear success criteria.
  2. Draft fixtures. Create minimal inputs that contain no production secrets.
  3. Establish baselines. Capture the current outputs for comparison after prompt edits.
  4. Run a regression pass. Re-run the tasks after changes; flag quality, safety, or format regressions.
  5. Score results. Mark each task pass, partial, or fail, with one line of evidence.
  6. Document rollback. Record the prior prompt version or git ref so the change can be reverted if the eval fails.
  7. Publish summary. Produce a privacy-safe eval report for reviewers.
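
The scoring and regression steps above can be sketched as a small comparison of baseline and candidate results. This is an illustration under stated assumptions, not the command's implementation: `TaskResult`, `SCORE_ORDER`, and `find_regressions` are hypothetical names, and treating "partial" as ranking between "fail" and "pass" is an assumed ordering.

```python
from dataclasses import dataclass

# Assumed ranking of the three scores named in step 5.
SCORE_ORDER = {"fail": 0, "partial": 1, "pass": 2}


@dataclass
class TaskResult:
    task: str      # task name from step 1
    score: str     # "pass", "partial", or "fail"
    evidence: str  # one-line justification from step 5


def find_regressions(
    baseline: list[TaskResult], candidate: list[TaskResult]
) -> list[str]:
    """Return names of tasks whose candidate score ranks below the baseline."""
    before = {result.task: result.score for result in baseline}
    return [
        result.task
        for result in candidate
        if result.task in before
        and SCORE_ORDER[result.score] < SCORE_ORDER[before[result.task]]
    ]
```

Tasks that appear only in the candidate run are ignored here, because they have no baseline to regress from.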

Output format

  • Task list and success criteria
  • Baseline vs candidate comparison table
  • Regression findings with severity
  • Ship/no-ship recommendation
  • Rollback reference
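
As a rough illustration of how these pieces could fit into one plain-text report, the sketch below renders a comparison table, a recommendation, and the rollback reference. The function names and row shape are hypothetical, and the rule "any regression means no-ship" is an assumption made for the example; the command itself does not define a decision rule.

```python
# Assumed ranking: fail < partial < pass.
_ORDER = {"fail": 0, "partial": 1, "pass": 2}


def recommend(rows: list[tuple[str, str, str]]) -> str:
    """Rows are (task, baseline_score, candidate_score).

    Assumed policy: any task scoring lower than its baseline blocks shipping.
    """
    regressed = [task for task, base, cand in rows if _ORDER[cand] < _ORDER[base]]
    return "no-ship" if regressed else "ship"


def render_report(rows: list[tuple[str, str, str]], rollback_ref: str) -> str:
    """Render the comparison table, recommendation, and rollback reference."""
    lines = [f"{'Task':<20}{'Baseline':<10}{'Candidate':<10}"]
    for task, base, cand in rows:
        lines.append(f"{task:<20}{base:<10}{cand:<10}")
    lines.append(f"Recommendation: {recommend(rows)}")
    lines.append(f"Rollback reference: {rollback_ref}")
    return "\n".join(lines)
```

A reviewer still makes the final call. The recommendation line is a starting point for that sign-off.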

Requirements

  • Access to a staging Claude Code or Agent SDK environment for running tasks.
  • Reviewer approval before promoting prompt changes to production users.

Add this badge to your README

Show that /prompt-eval-runbook is listed on HeyClaude. Paste this Markdown into your README; it renders the badge and links back to this page.

[![Listed on HeyClaude](https://heyclau.de/badge/commands/prompt-eval-runbook.svg)](https://heyclau.de/entry/commands/prompt-eval-runbook)

How it compares

/prompt-eval-runbook compared with 3 alternatives on trust, install, platform support, and disclosed safety notes, all from reviewed registry metadata.

Trust and platform fields (identical for all four entries)

  • Install risk: Review first
  • Notes: Safety, Privacy
  • Category: commands
  • Source: source-backed
  • Author: kiannidev
  • Added: 2026-06-16
  • Platforms: Claude Code
  • Claim: Unclaimed

/prompt-eval-runbook - Prompt Eval Runbook Slash Command

Slash command runbook for designing and running prompt evaluations: define tasks, success criteria, golden outputs, regression checks, and privacy-safe reporting using Anthropic test-and-evaluate guidance.

  • Safety notes: Eval fixtures must not contain live credentials or customer PII. Treat model outputs as draft until human reviewers sign off on regressions.
  • Privacy notes: Redact proprietary prompts before exporting eval results outside the team.
  • Prerequisites: Representative user tasks and expected outcomes for the prompt under test. Staging environment without production secrets in eval fixtures.
  • Install: /prompt-eval-runbook <feature-or-prompt-name>

/targeted-test-generation - Targeted Test Generation Runbook

Community slash command runbook for adding minimal automated tests around a changed module: inspect the git diff, mirror repository test conventions, and draft focused unit or integration tests using Anthropic develop-tests guidance.

  • Safety notes: Generated tests are drafts; run the project's test command locally before committing. Do not add network calls, production credentials, or destructive setup to proposed tests.
  • Privacy notes: Use synthetic fixtures only; do not copy production logs or customer records into tests.
  • Prerequisites: Git repository with an existing test runner configured in package scripts or docs. A concrete code change or diff hunk to cover with new or updated tests.
  • Install: /targeted-test-generation <file-or-symbol>

/documentation-refresh - Documentation Refresh Runbook

Community slash command runbook to refresh stale project documentation after code changes: use git history to find affected docs, compare README commands to package scripts, and flag broken internal links before opening a docs PR.

  • Safety notes: Read-only git history inspection unless the operator approves doc edits. Validate git refs before interpolating them into shell commands.
  • Privacy notes: Commit messages and doc drafts enter model context; scrub internal-only details first.
  • Prerequisites: Git repository with commits and documentation files such as README.md or docs/. Lockfiles or package manifests when comparing documented install commands.
  • Install: /documentation-refresh [since-ref]

/frontend-visual-qa - Chrome Design Verification Runbook

Community slash command runbook for frontend visual QA using documented Claude Code Chrome integration workflows: enable /chrome, open a local page, read console messages, and follow the design verification checklist from the Chrome integration guide.

  • Safety notes: Chrome integration runs in a visible browser with your logged-in session; avoid production admin flows. Handle login pages and CAPTCHAs manually when the integration pauses.
  • Privacy notes: Console logs and screenshots may include staging data; redact before external sharing.
  • Prerequisites: Claude Code 2.0.73+ and Claude in Chrome extension 1.0.36+ on Chrome or Edge. Local dev server reachable from the operator browser session.
  • Install: /frontend-visual-qa <route-or-host>
