Review AI-Generated Code Before Merge

A source-backed review workflow for pull requests that include AI-generated code. Treat generated diffs as untrusted implementation work, verify behavior in CI, inspect security-sensitive paths first, and merge only after a reviewer-owned checklist passes.

by MkDev11 · added 2026-06-04


## TL;DR

Review AI-generated code like any other untrusted implementation: inspect the
diff, rebuild the branch, test the changed behavior, check security-sensitive
paths, and merge only after a reviewer can explain why the code is correct.

The useful mental model is simple: **AI can propose code, but the reviewer owns
the merge decision.** The pull request should contain enough evidence for a
human maintainer to verify the change without relying on generated confidence.
Even when an AI reviewer provides comments or suggested changes, validate that
feedback carefully and supplement it with human review before merging.

## Prerequisites & Requirements

- [ ] **Isolated checkout**: You can rebuild the branch in a disposable sandbox or container and reproduce the test results.
- [ ] **Project test commands**: You know the focused tests, linters, and type checks for the touched code.
- [ ] **Security scanning**: Code scanning, secret scanning, dependency review, or local equivalents are available for risky diffs.
- [ ] **Reviewer authority**: You can request changes when the PR lacks tests, source links, or a clear rollback path.

## Core Concepts Explained

### AI output is an implementation, not evidence

Generated code may be useful, but the explanation around it is not proof that
the change is correct. Treat claims such as "all edge cases are handled" or "no
security impact" as review prompts. Ask for the command, test, trace, design
reference, or code path that proves the claim.

### Review behavior before style

Generated diffs often look polished. Start with behavior: what input changed,
what output changed, which permissions changed, what data moves, and what can
fail. Style cleanups can wait until the reviewer understands the actual runtime
effect.

### Separate diff-size risk from feature risk

A small generated diff can change authorization logic. A large generated diff
can be mostly generated fixtures. Assess risk by the code path and blast radius,
not by the size of the diff or whether the author used an AI tool.
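
One way to make this concrete is to order changed files by how sensitive their paths look and to ignore line counts when ordering. The sketch below is a minimal illustration, not a project standard: the keyword list and the `risk_score` and `rank_by_risk` helpers are hypothetical, and a real repository would substitute the paths that guard its own trust boundaries.

```python
# Hypothetical keyword list; replace with the paths that guard trust
# boundaries in your own repository.
SENSITIVE_KEYWORDS = ("auth", "permission", "secret", "payment", "migration", "deploy")


def risk_score(path: str) -> int:
    """Count how many sensitive keywords appear in a changed path."""
    lowered = path.lower()
    return sum(1 for keyword in SENSITIVE_KEYWORDS if keyword in lowered)


def rank_by_risk(changes: dict[str, int]) -> list[str]:
    """Order changed paths by risk score first, then by path name.

    `changes` maps each changed path to its number of changed lines. The
    line count is deliberately ignored for ordering, so a two-line edit to
    an authorization check outranks a thousand-line fixture.
    """
    return sorted(changes, key=lambda path: (-risk_score(path), path))


if __name__ == "__main__":
    changes = {
        "tests/fixtures/users.json": 1200,
        "src/auth/permissions.py": 2,
        "src/ui/button.py": 40,
    }
    for path in rank_by_risk(changes):
        print(path)
```

Keyword matching is crude and only suggests a reading order; the reviewer still decides which paths cross a trust boundary.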

## Step-by-Step Implementation Guide

1. **Freeze the PR scope.** Ask the author to state which files were generated,
   which were manually edited, and what user-visible behavior should change. If
   the PR mixes unrelated refactors, generated rewrites, and feature work,
   request a smaller branch before reviewing.

2. **Preflight dependency and tooling changes.** Before running install, build,
   or test commands from the branch, inspect changes to package manifests,
   lockfiles, package-manager configuration, install scripts, container images,
   GitHub Actions, and third-party SDKs. Confirm why each new dependency or
   lifecycle script is required.

3. **Rebuild in an isolated environment.** Pull the branch into a disposable
   sandbox, container, or isolated development environment. Install
   dependencies using the repository's documented workflow, and keep
   package-manager lifecycle scripts disabled unless the changed scripts and
   packages have been reviewed and approved. Then run the focused checks for
   the touched package. Do not rely only on screenshots, generated summaries,
   or copied terminal output in the PR body.

4. **Review security-sensitive paths first.** Start with authentication,
   authorization, secrets, payments, migrations, data deletion, networking,
   deserialization, sandbox escapes, release automation, and permission changes.
   These paths get review priority because a plausible-looking generated patch
   can still change a trust boundary.

5. **Check secrets and dependency changes.** Run secret scanning or a local
   equivalent before merge. Re-check dependency diffs after installing so
   generated or refreshed lockfiles still match the reviewed dependency set.

6. **Read the diff in small slices.** Review one behavior path at a time. For
   each slice, ask: what invariant should stay true, what test proves it, what
   user data is touched, and what happens when the new code fails?

7. **Require focused tests for changed behavior.** Unit tests are usually enough
   for pure functions. Integration tests are better for permission checks,
   persistence, API clients, migrations, and concurrency. If a test would be too
   expensive, ask for a documented manual verification command.

8. **Verify AI-written comments and docs.** Generated comments can drift from the
   code they describe. Check that public docs, migration notes, and inline
   comments match the implementation and do not promise unsupported behavior.

9. **Review agent-created PRs as production code.** If an AI coding agent opened
   the pull request, inspect the final diff thoroughly before merging. Confirm
   that the agent's summary matches the files, that requested review comments
   were actually addressed, and that no unrelated generated changes slipped in.

10. **Merge only after the reviewer-owned checklist passes.** The reviewer should
    be able to summarize the change, name the highest-risk file, point to the
    verification evidence, and explain the rollback plan.
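
The post-install re-check in step 5 can be approximated by hashing lockfiles before and after the install and comparing the digests. This is a minimal sketch under stated assumptions: the `LOCKFILES` names are examples only, and `snapshot` and `drifted` are hypothetical helpers, not part of any package manager.

```python
import hashlib
from pathlib import Path

# Hypothetical default set; list the lockfiles your package manager uses.
LOCKFILES = ("package-lock.json", "pnpm-lock.yaml", "poetry.lock", "Cargo.lock")


def snapshot(root: Path, names: tuple[str, ...] = LOCKFILES) -> dict[str, str]:
    """Return a SHA-256 digest for every listed lockfile that exists in root."""
    digests = {}
    for name in names:
        path = root / name
        if path.is_file():
            digests[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def drifted(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List lockfiles that the install added, removed, or rewrote."""
    names = sorted(set(before) | set(after))
    return [name for name in names if before.get(name) != after.get(name)]
```

Take the first snapshot once the dependency diff has been reviewed, run the install in the sandbox, then snapshot again. A non-empty `drifted` result means the installed dependency set no longer matches what was reviewed.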

## Reviewing With Claude Code

When the diff was produced by Claude Code, you can use the same tool to add an
independent review step. The key idea from the Claude Code best-practices guide
is that a fresh context reviews better than the one that wrote the change: a
reviewer running in a separate [subagent](https://code.claude.com/docs/en/sub-agents)
context "sees only the diff and the criteria you give it, not the reasoning that
produced the change, so it evaluates the result on its own terms." The session
that implemented the work receives the gaps directly and can fix and re-review
without you copying findings between windows.

Claude Code ships a bundled `/code-review` skill that "reviews the current diff
for bugs in a fresh subagent and returns findings to the session." Run it before
treating a change as done:

```text
/code-review
```

To check a diff against your own plan or requirements instead of generic bug
hunting, write the review prompt yourself and tell Claude to delegate it. Name
the work to check, what to check it against, and what counts as a finding:

```text
Use a subagent to review the rate limiter diff against PLAN.md. Check that
every requirement is implemented, the listed edge cases have tests, and
nothing outside the task's scope changed. Report gaps, not style preferences.
```

For a reusable reviewer, define a subagent in `.claude/agents/`. Subagents are
Markdown files with YAML frontmatter where only `name` and `description` are
required; the body becomes the subagent's system prompt, and `tools` restricts
what it can do (omit it to inherit all tools). The following restricts the
reviewer to read-only tools so it cannot edit files while reviewing:

```markdown
---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Glob, Grep
model: sonnet
---

You are a senior reviewer. Review the current diff for:
- Correctness against the stated requirements
- Security-sensitive paths: auth, secrets, data deletion, networking
- Missing tests for changed behavior

Report only gaps that affect correctness or the stated requirements.
Provide specific line references. Do not edit files.
```

Then invoke it explicitly by name, for example: `Use the code-reviewer subagent
to review this code for security issues.` Because subagents run in their own
context window with their own allowed tools, the review does not clutter the
implementing conversation.
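
Because omitting `tools` silently grants every tool, it can help to lint reviewer definitions before trusting them as read-only. The following is a rough sketch that assumes flat, single-line frontmatter fields like those in the example above. `parse_frontmatter` and `lint_reviewer` are hypothetical helpers, and the parser is intentionally not a full YAML parser.

```python
READ_ONLY_TOOLS = {"Read", "Glob", "Grep"}  # the tools used in the example above
REQUIRED_FIELDS = ("name", "description")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` frontmatter between the first two `---` lines.

    Handles only simple single-line fields; it is not a YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def lint_reviewer(text: str) -> list[str]:
    """Return problems that make the reviewer invalid or able to write."""
    fields = parse_frontmatter(text)
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not fields.get(name)
    ]
    if "tools" not in fields:
        problems.append("tools omitted: the subagent inherits all tools")
    else:
        tools = {tool.strip() for tool in fields["tools"].split(",") if tool.strip()}
        extra = sorted(tools - READ_ONLY_TOOLS)
        if extra:
            problems.append(f"tools outside the read-only set: {', '.join(extra)}")
    return problems
```

An empty list means the definition names itself, describes itself, and stays within the read-only tool set.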

You can also run review and verification non-interactively, which is how Claude
Code integrates into CI pipelines and pre-commit hooks. Running `claude -p` with
a prompt executes that single prompt without an interactive session, and
`--output-format json` returns structured output a script can parse:

```bash
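# Run a focused review of the branch diff in CI.
# Assumes origin/main is the merge target; adjust for your repository.
git diff origin/main...HEAD | claude -p "Review this diff for security issues and missing tests. Report gaps only." \
  --output-format json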
```
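
A script consuming that output might look like the sketch below. It rests on two labeled assumptions: that the JSON printed by the CLI is a top-level object with `result` and `is_error` fields (confirm against the output of your installed version), and that your review prompt asks the reviewer to reply with a fixed sentinel line when it finds nothing. `NO_GAPS_SENTINEL`, `review_text`, and `gate` are hypothetical names.

```python
import json

# Hypothetical convention: the review prompt asks the reviewer to reply with
# exactly this line when it finds nothing that affects correctness.
NO_GAPS_SENTINEL = "NO GAPS"


def review_text(raw: str) -> str:
    """Extract the reviewer's reply from the JSON printed by the CLI.

    Assumes a top-level object with `result` and `is_error` fields; confirm
    the field names against the output of your installed version.
    """
    payload = json.loads(raw)
    if payload.get("is_error"):
        raise RuntimeError("review run reported an error")
    result = payload.get("result")
    if not isinstance(result, str):
        raise ValueError("no text result in review output")
    return result.strip()


def gate(raw: str) -> int:
    """Return a process exit code: 0 when no gaps were reported, 1 otherwise."""
    return 0 if review_text(raw) == NO_GAPS_SENTINEL else 1
```

A non-zero exit here should route the findings to a human reviewer; it does not make the merge decision for them.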

The best-practices guide notes that a reviewer asked to find gaps will usually
report some even when the work is sound. Tell the reviewer to flag only gaps
that affect correctness or the stated requirements, and treat everything else
as optional. Otherwise the review pushes the change toward over-engineering:
needless defensive code and tests for cases that cannot happen.

### Review Dimensions Reference

Use these dimensions when prompting a Claude Code review subagent or running
`/code-review`. They map the source guidance onto a concrete checklist.

| Dimension | What to check | How Claude Code helps |
| --- | --- | --- |
| Independent context | Reviewer did not write the change | Run the review in a fresh subagent or a separate session so the model is not biased toward code it just wrote |
| Correctness vs. plan | Every requirement implemented; nothing out of scope changed | Prompt a subagent to review the diff against `PLAN.md` and report gaps |
| Security-sensitive paths | Auth, secrets, data deletion, networking, deserialization | Use a read-only `code-reviewer` subagent (`tools: Read, Glob, Grep`) focused on these paths |
| Verification evidence | Tests, build exit code, or a script that produces a pass/fail | "Have Claude show evidence rather than asserting success": run the check and read the result |
| Test coverage for changes | Focused tests for the changed behavior exist | Ask the reviewer to flag changed behavior that lacks a test |
| Finding discipline | Findings are real gaps, not style nits | Tell the reviewer to report only gaps affecting correctness or requirements |
| Automated gate | Review runs without a human present | `claude -p "..." --output-format json` in CI or a pre-commit hook |

## Reviewer Checklist

- [ ] **Scope is narrow**: The PR changes one behavior or one coherent workflow.
- [ ] **Generated files are identified**: The author says which parts came from an AI coding tool.
- [ ] **Dependency preflight passes**: Manifest, lockfile, package-manager, and install-script changes are reviewed before install.
- [ ] **Isolated build succeeds**: The branch installs and checks from a disposable sandbox or container with lifecycle scripts disabled unless approved.
- [ ] **Tests cover the changed behavior**: Focused tests or a manual verification command are visible to reviewers.
- [ ] **Security-sensitive paths are inspected**: Auth, permissions, data movement, secrets, dependencies, and automation changes are reviewed first.
- [ ] **Scanners are clean or triaged**: Code scanning, secret scanning, and dependency alerts are resolved or explicitly accepted.
- [ ] **Claims have evidence**: Generated explanations are backed by code, tests, logs, docs, or maintainer reasoning.
- [ ] **Rollback is understandable**: The team knows how to revert or disable the change if production behavior regresses.
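
If the checklist is pasted into the PR description as a Markdown task list, a small script can report which items are still open. This is only a sketch: `unchecked_items` and `checklist_passes` are hypothetical helpers, and how the PR body reaches the script is left to your tooling.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
TASK_ITEM = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(markdown: str) -> list[str]:
    """Return the text of every task-list item that is still unchecked."""
    open_items = []
    for line in markdown.splitlines():
        match = TASK_ITEM.match(line)
        if match and match.group(1) == " ":
            open_items.append(match.group(2).strip())
    return open_items


def checklist_passes(markdown: str) -> bool:
    """True only when the text has at least one task item and none are open."""
    has_items = any(TASK_ITEM.match(line) for line in markdown.splitlines())
    return has_items and not unchecked_items(markdown)
```

A ticked box records that the reviewer made a judgment; the script cannot verify that the judgment was sound.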

## When to Block the Merge

Block or request changes when the PR:

- Adds a dependency without a reason, version pin, lockfile update, or license/security check.
- Changes authentication, authorization, payment, data deletion, or release automation without focused tests.
- Includes generated code that the author cannot explain.
- Relies on AI-written claims instead of reproducible verification.
- Moves secret handling, logging, or telemetry into a new path without privacy review.
- Makes sweeping style or architecture rewrites while claiming to fix a small bug.
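
Two of these conditions can be pre-screened from the list of changed paths alone. The sketch below is illustrative and assumes a conventional layout: the directory names in `SENSITIVE_PARTS`, the manifest-to-lockfile pairs, and the `block_reasons` helper are all hypothetical and need adapting to your repository.

```python
from pathlib import PurePosixPath

# Hypothetical markers; use the directory names your project relies on.
SENSITIVE_PARTS = {"auth", "payments", "migrations", "release"}
TEST_PARTS = {"test", "tests"}
# Hypothetical manifest-to-lockfile pairs; adjust for your package manager.
MANIFEST_TO_LOCKFILE = {
    "package.json": "package-lock.json",
    "pyproject.toml": "poetry.lock",
    "Cargo.toml": "Cargo.lock",
}


def is_test(path: str) -> bool:
    """Treat files under test directories or named test_* as tests."""
    pure = PurePosixPath(path)
    return bool(TEST_PARTS & set(pure.parts)) or pure.name.startswith("test_")


def is_sensitive(path: str) -> bool:
    """A non-test file that lives under a sensitive directory."""
    return bool(SENSITIVE_PARTS & set(PurePosixPath(path).parts)) and not is_test(path)


def block_reasons(changed: list[str]) -> list[str]:
    """Return reasons to request changes, based only on changed paths."""
    reasons = []
    changed_set = set(changed)
    has_test_change = any(is_test(path) for path in changed)
    for path in sorted(changed_set):
        pure = PurePosixPath(path)
        if is_sensitive(path) and not has_test_change:
            reasons.append(f"{path}: sensitive path changed without a test change")
        lockfile = MANIFEST_TO_LOCKFILE.get(pure.name)
        if lockfile and str(pure.with_name(lockfile)) not in changed_set:
            reasons.append(f"{path}: manifest changed without {lockfile}")
    return reasons
```

A path-only check cannot tell whether the changed tests actually exercise the sensitive code, so an empty result is a prompt to keep reviewing, not an approval.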

## Troubleshooting

- **The PR is too large to review**: ask for a smaller PR that separates generated refactors from behavior changes.
- **CI is green but the risk still feels high**: add a targeted test around the risky path before merge.
- **The author says the AI tool verified it**: ask for the actual command output, test case, or source link a maintainer can inspect.
- **A scanner reports a warning the team accepts**: document the rationale in the PR so future reviewers can see the decision.

## Duplicate Check

This guide is intentionally about the maintainer workflow for reviewing
AI-generated pull requests before merge. Adjacent entries in the repository cover
code review tools, security scanners, and review-oriented agents, but they do not
provide a source-backed guide for the human merge decision on generated code.

## References

- Claude Code Docs: Best practices (adversarial review, verification, `/code-review`) - https://code.claude.com/docs/en/best-practices
- Claude Code Docs: Create custom subagents - https://code.claude.com/docs/en/sub-agents
- GitHub Docs: Reviewing proposed changes in a pull request - https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/reviewing-proposed-changes-in-a-pull-request
- GitHub Docs: Responsible use of GitHub Copilot code review - https://docs.github.com/en/copilot/responsible-use/code-review
- GitHub Docs: Review output from Copilot - https://docs.github.com/en/copilot/how-tos/copilot-on-github/use-copilot-agents/review-copilot-output
- GitHub Docs: About code scanning - https://docs.github.com/en/code-security/concepts/code-scanning/about-code-scanning
- GitHub Docs: About secret scanning - https://docs.github.com/en/code-security/concepts/secret-security/about-secret-scanning
- GitHub Docs: About dependency review - https://docs.github.com/en/code-security/concepts/supply-chain-security/about-dependency-review
- NIST SP 800-218: Secure Software Development Framework - https://csrc.nist.gov/pubs/sp/800/218/final


Citation facts

Canonical URL: https://heyclau.de/entry/guides/review-ai-generated-code-before-merge
Source URLs: https://code.claude.com/docs/en/best-practices, https://github.com/JSONbored/awesome-claude/blob/main/content/guides/review-ai-generated-code-before-merge.mdx
Safety notes:
- Treat AI-generated changes as untrusted code until a human reviewer verifies behavior, security impact, and rollback risk.
- Block merge when the PR changes authentication, authorization, data deletion, payment, networking, serialization, or release automation without focused tests.
- Do not accept generated explanations as proof; require CI output, reproducible commands, or links to authoritative project docs.
- Inspect package manifests, lockfiles, package-manager configuration, and dependency choices before installing from an untrusted branch.
- Run install, build, and test commands for untrusted PRs in a disposable sandbox or container, with package-manager lifecycle scripts disabled unless the changed scripts and packages have been reviewed and approved.

Privacy notes:
- Do not paste private code, secrets, customer data, logs, or incident details into external AI review tools unless your organization has approved that workflow.
- Keep review notes in the pull request or internal tracker so security decisions remain auditable.
Author: MkDev11
Submitted by: MkDev11
Claim status: unclaimed
Last verified: 2026-06-04


Prerequisites

Access to the pull request diff and the branch's CI results.
Permission to request changes or block merge when evidence is missing.
Project-specific test commands for the touched package or service.
Secret scanning, code scanning, or equivalent local checks for risky repositories.
A disposable sandbox, container, or isolated development environment for running untrusted PR commands.

Schema details

Install type: copy
Reading time: 8 min
Difficulty score: 58
Troubleshooting: Yes
Breaking changes: No

Skill and platform metadata

Retrieval sources

https://code.claude.com/docs/en/best-practiceshttps://code.claude.com/docs/en/sub-agents

Full copyable content

## TL;DR

Review AI-generated code like any other untrusted implementation: inspect the
diff, rebuild it, test the changed behavior, check security-sensitive paths, and
only merge after a reviewer can explain why the code is correct.

The useful mental model is simple: **AI can propose code, but the reviewer owns
the merge decision.** The pull request should contain enough evidence for a
human maintainer to verify the change without relying on generated confidence.
Even when an AI reviewer provides comments or suggested changes, validate that
feedback carefully and supplement it with human review before merging.

## Prerequisites & Requirements

- [ ] {"task": "Isolated checkout", "description": "You can rebuild the branch in a disposable sandbox or container and reproduce the test results"}
- [ ] {"task": "Project test commands", "description": "You know the focused tests, linters, and type checks for the touched code"}
- [ ] {"task": "Security scanning", "description": "Code scanning, secret scanning, dependency review, or local equivalents are available for risky diffs"}
- [ ] {"task": "Reviewer authority", "description": "You can request changes when the PR lacks tests, source links, or a clear rollback path"}

## Core Concepts Explained

### AI output is an implementation, not evidence

Generated code may be useful, but the explanation around it is not proof that
the change is correct. Treat claims such as "all edge cases are handled" or "no
security impact" as review prompts. Ask for the command, test, trace, design
reference, or code path that proves the claim.

### Review behavior before style

Generated diffs often look polished. Start with behavior: what input changed,
what output changed, which permissions changed, what data moves, and what can
fail. Style cleanups can wait until the reviewer understands the actual runtime
effect.

### Separate generated-size risk from feature risk

A small generated diff can change authorization logic. A large generated diff
can be mostly generated fixtures. Review risk by the code path and blast radius,
not by whether the author used an AI tool.

## Step-by-Step Implementation Guide

1. **Freeze the PR scope.** Ask the author to state which files were generated,
   which were manually edited, and what user-visible behavior should change. If
   the PR mixes unrelated refactors, generated rewrites, and feature work,
   request a smaller branch before reviewing.

2. **Preflight dependency and tooling changes.** Before running install, build,
   or test commands from the branch, inspect changes to package manifests,
   lockfiles, package-manager configuration, install scripts, container images,
   GitHub Actions, and third-party SDKs. Confirm why each new dependency or
   lifecycle script is required.

3. **Rebuild in an isolated environment.** Pull the branch into a disposable
   sandbox, container, or isolated development environment. Install
   dependencies using the repository's documented workflow with package-manager
   lifecycle scripts disabled unless the changed scripts and packages have been
   reviewed and approved, then run the focused checks for the touched package.
   Do not rely only on screenshots, generated summaries, or copied terminal
   output in the PR body.

4. **Review security-sensitive paths first.** Start with authentication,
   authorization, secrets, payments, migrations, data deletion, networking,
   deserialization, sandbox escapes, release automation, and permission changes.
   These paths get review priority because a plausible-looking generated patch
   can still change a trust boundary.

5. **Check secrets and dependency changes.** Run secret scanning or a local
   equivalent before merge. Re-check dependency diffs after installing so
   generated or refreshed lockfiles still match the reviewed dependency set.

6. **Read the diff in small slices.** Review one behavior path at a time. For
   each slice, ask: what invariant should stay true, what test proves it, what
   user data is touched, and what happens when the new code fails?

7. **Require focused tests for changed behavior.** Unit tests are usually enough
   for pure functions. Integration tests are better for permission checks,
   persistence, API clients, migrations, and concurrency. If a test would be too
   expensive, ask for a documented manual verification command.

8. **Verify AI-written comments and docs.** Generated comments can drift from the
   code they describe. Check that public docs, migration notes, and inline
   comments match the implementation and do not promise unsupported behavior.

9. **Review agent-created PRs as production code.** If an AI coding agent opened
   the pull request, inspect the final diff thoroughly before merging. Confirm
   that the agent's summary matches the files, that requested review comments
   were actually addressed, and that no unrelated generated changes slipped in.

10. **Merge only after the reviewer-owned checklist passes.** The reviewer should
   be able to summarize the change, name the highest-risk file, point to the
   verification evidence, and explain the rollback plan.

## Reviewing With Claude Code

When the diff was produced by Claude Code, you can use the same tool to add an
independent review step. The key idea from the Claude Code best-practices guide
is that a fresh context reviews better than the one that wrote the change: a
reviewer running in a separate [subagent](https://code.claude.com/docs/en/sub-agents)
context "sees only the diff and the criteria you give it, not the reasoning that
produced the change, so it evaluates the result on its own terms." The session
that implemented the work receives the gaps directly and can fix and re-review
without you copying findings between windows.

Claude Code ships a bundled `/code-review` skill that "reviews the current diff
for bugs in a fresh subagent and returns findings to the session." Run it before
treating a change as done:

```text
/code-review
```

To check a diff against your own plan or requirements instead of generic bug
hunting, write the review prompt yourself and tell Claude to delegate it. Name
the work to check, what to check it against, and what counts as a finding:

```text
Use a subagent to review the rate limiter diff against PLAN.md. Check that
every requirement is implemented, the listed edge cases have tests, and
nothing outside the task's scope changed. Report gaps, not style preferences.
```

For a reusable reviewer, define a subagent in `.claude/agents/`. Subagents are
Markdown files with YAML frontmatter where only `name` and `description` are
required; the body becomes the subagent's system prompt, and `tools` restricts
what it can do (omit it to inherit all tools). The following restricts the
reviewer to read-only tools so it cannot edit files while reviewing:

```markdown
---
name: code-reviewer
description: Reviews code for quality and best practices
tools: Read, Glob, Grep
model: sonnet
---

You are a senior reviewer. Review the current diff for:
- Correctness against the stated requirements
- Security-sensitive paths: auth, secrets, data deletion, networking
- Missing tests for changed behavior

Report only gaps that affect correctness or the stated requirements.
Provide specific line references. Do not edit files.
```

Then invoke it explicitly, for example: `Use a subagent to review this code for
security issues.` Because subagents run in their own context window with their
own allowed tools, the review does not clutter the implementing conversation.

You can also run review and verification non-interactively, which is how Claude
Code integrates into CI pipelines and pre-commit hooks. The `claude -p` flag
runs a single prompt without a session, and `--output-format json` returns
structured output a script can parse:

```bash
# Run a focused review on the current diff in CI
claude -p "Review the staged diff for security and missing tests. Report gaps only." \
  --output-format json
```

The best-practices guide is explicit that a reviewer asked to find gaps will
usually report some even when the work is sound. Tell the reviewer to flag only
gaps that affect correctness or the stated requirements, and treat the rest as
optional, to avoid over-engineering, defensive code, and tests for cases that
cannot happen.
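One way to turn the non-interactive review into a gate is to have the prompt request a machine-checkable final line and to fail closed when that line is missing. The sketch below assumes a hypothetical `VERDICT: PASS` / `VERDICT: BLOCK` convention that you define in your own prompt. It reads the default text output so it needs no JSON parser. `review_verdict` and the `origin/main` base branch are placeholders.

```bash
# Sketch: gate a CI step on the reviewer's final verdict line.
# The VERDICT convention is an assumption defined by your own prompt;
# Claude Code does not emit it unless the prompt asks for it.
review_verdict() {
  local last
  # Keep the last non-empty line of the review text read from stdin.
  last="$(awk 'NF { line = $0 } END { print line }')"
  case "$last" in
    "VERDICT: PASS")
      return 0 ;;
    "VERDICT: BLOCK")
      echo "review reported blocking gaps" >&2
      return 1 ;;
    *)
      # No recognizable verdict: fail closed instead of assuming success.
      echo "no verdict line found; failing closed" >&2
      return 2 ;;
  esac
}

# Usage in CI (requires the claude CLI and credentials; not run here):
#   git diff origin/main...HEAD \
#     | claude -p "Review the diff on stdin for security and missing tests. \
# Report only gaps that affect correctness or the stated requirements. \
# End with a final line that is exactly VERDICT: PASS or VERDICT: BLOCK." \
#     | tee review.txt \
#     | review_verdict
```

A passing verdict is advisory evidence only. The saved `review.txt` gives the human reviewer something to inspect, and the merge decision stays with that reviewer.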

### Review Dimensions Reference

Use these dimensions when prompting a Claude Code review subagent or running
`/code-review`. They map the source guidance onto a concrete checklist.

| Dimension | What to check | How Claude Code helps |
| --- | --- | --- |
| Independent context | Reviewer did not write the change | Run the review in a fresh subagent or a separate session so the model is not biased toward code it just wrote |
| Correctness vs. plan | Every requirement implemented; nothing out of scope changed | Prompt a subagent to review the diff against `PLAN.md` and report gaps |
| Security-sensitive paths | Auth, secrets, data deletion, networking, deserialization | Use a read-only `code-reviewer` subagent (`tools: Read, Glob, Grep`) focused on these paths |
| Verification evidence | Tests, build exit code, or a script that produces a pass/fail | "Have Claude show evidence rather than asserting success": run the check and read the result |
| Test coverage for changes | Focused tests for the changed behavior exist | Ask the reviewer to flag changed behavior that lacks a test |
| Finding discipline | Findings are real gaps, not style nits | Tell the reviewer to report only gaps affecting correctness or requirements |
| Automated gate | Review runs without a human present | `claude -p "..." --output-format json` in CI or a pre-commit hook |

## Reviewer Checklist

- [ ] **Scope is narrow**: The PR changes one behavior or one coherent workflow.
- [ ] **Generated files are identified**: The author says which parts came from an AI coding tool.
- [ ] **Dependency preflight passes**: Manifest, lockfile, package-manager, and install-script changes are reviewed before install.
- [ ] **Isolated build succeeds**: The branch installs and checks from a disposable sandbox or container with lifecycle scripts disabled unless approved.
- [ ] **Tests cover the changed behavior**: Focused tests or a manual verification command are visible to reviewers.
- [ ] **Security-sensitive paths are inspected**: Auth, permissions, data movement, secrets, dependencies, and automation changes are reviewed first.
- [ ] **Scanners are clean or triaged**: Code scanning, secret scanning, and dependency alerts are resolved or explicitly accepted.
- [ ] **Claims have evidence**: Generated explanations are backed by code, tests, logs, docs, or maintainer reasoning.
- [ ] **Rollback is understandable**: The team knows how to revert or disable the change if production behavior regresses.
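
The dependency preflight item can be partly mechanized by listing the changed paths that deserve a look before anything is installed. This is a rough sketch: `dependency_preflight` is a hypothetical helper, the filename patterns are an illustrative assumption rather than a complete list, and `origin/main` stands in for your base branch.

```bash
# Sketch: print changed paths that warrant review before install.
# The pattern list is an assumption; extend it for your own stack.
dependency_preflight() {
  # Reads one path per line on stdin and prints only manifests, lockfiles,
  # package-manager config, container files, and CI workflow files.
  grep -E '(^|/)(package\.json|package-lock\.json|yarn\.lock|pnpm-lock\.yaml|\.npmrc|requirements[^/]*\.txt|pyproject\.toml|go\.mod|go\.sum|Cargo\.toml|Cargo\.lock|Dockerfile)$|^\.github/workflows/' || true
}

# Usage (inspect the listed files first, then install without lifecycle scripts):
#   git diff --name-only origin/main...HEAD | dependency_preflight
#   npm ci --ignore-scripts   # for npm projects, inside a disposable container
```

An empty result does not prove the diff is safe. It only means none of the assumed patterns matched, so the reviewer still reads the diff for vendored code or scripts that fetch dependencies another way.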

## When to Block the Merge

Block or request changes when the PR:

- Adds a dependency without a reason, version pin, lockfile update, or license/security check.
- Changes authentication, authorization, payment, data deletion, or release automation without focused tests.
- Includes generated code that the author cannot explain.
- Relies on AI-written claims instead of reproducible verification.
- Moves secret handling, logging, or telemetry into a new path without privacy review.
- Makes sweeping style or architecture rewrites while claiming to fix a small bug.
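
The second blocking condition lends itself to a coarse automated warning. The sketch below flags a diff that touches security-sensitive paths without touching any test file. `require_tests_for_sensitive_paths` is a hypothetical helper, and both the sensitive-path keywords and the test-file naming patterns are assumptions that will not fit every repository.

```bash
# Sketch: warn when sensitive paths change without any test changes.
# Keyword and test-file patterns are assumptions; tune them per repository.
require_tests_for_sensitive_paths() {
  local files sensitive
  local test_pattern='(^|/)(tests?|__tests__|spec)/|[._-](test|spec)\.'
  files="$(cat)"   # changed paths, one per line, from stdin

  # Sensitive source paths, excluding files that are themselves tests.
  sensitive="$(printf '%s\n' "$files" \
    | grep -Ei 'auth|permission|payment|billing|secret|migration|delete|release|\.github/workflows/' \
    | grep -Evi "$test_pattern" || true)"

  # Nothing sensitive changed: nothing to enforce.
  if [ -z "$sensitive" ]; then
    return 0
  fi

  # At least one test file changed alongside the sensitive paths.
  if printf '%s\n' "$files" | grep -Eqi "$test_pattern"; then
    return 0
  fi

  echo "sensitive paths changed without any test changes:" >&2
  printf '%s\n' "$sensitive" >&2
  return 1
}

# Usage: git diff --name-only origin/main...HEAD | require_tests_for_sensitive_paths
```

This only checks that some test file changed, not that the test exercises the risky path, so a pass is a prompt for the reviewer to read the test rather than a substitute for doing so.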

## Troubleshooting

- **The PR is too large to review**: ask for a smaller PR that separates generated refactors from behavior changes.
- **CI is green but the risk still feels high**: add a targeted test around the risky path before merge.
- **The author says the AI tool verified it**: ask for the actual command output, test case, or source link a maintainer can inspect.
- **A scanner reports a warning the team accepts**: document the rationale in the PR so future reviewers can see the decision.

## Duplicate Check

This guide is intentionally about the maintainer workflow for reviewing
AI-generated pull requests before merge. Adjacent entries in the repository cover
code review tools, security scanners, and review-oriented agents, but they do not
provide a source-backed guide for the human merge decision on generated code.

## References

- Claude Code Docs: Best practices (adversarial review, verification, `/code-review`) - https://code.claude.com/docs/en/best-practices
- Claude Code Docs: Create custom subagents - https://code.claude.com/docs/en/sub-agents
- GitHub Docs: Reviewing proposed changes in a pull request - https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/reviewing-changes-in-pull-requests/reviewing-proposed-changes-in-a-pull-request
- GitHub Docs: Responsible use of GitHub Copilot code review - https://docs.github.com/en/copilot/responsible-use/code-review
- GitHub Docs: Review output from Copilot - https://docs.github.com/en/copilot/how-tos/copilot-on-github/use-copilot-agents/review-copilot-output
- GitHub Docs: About code scanning - https://docs.github.com/en/code-security/concepts/code-scanning/about-code-scanning
- GitHub Docs: About secret scanning - https://docs.github.com/en/code-security/concepts/secret-security/about-secret-scanning
- GitHub Docs: About dependency review - https://docs.github.com/en/code-security/concepts/supply-chain-security/about-dependency-review
- NIST SP 800-218: Secure Software Development Framework - https://csrc.nist.gov/pubs/sp/800/218/final
