Coding Agent Test-Before-Merge Rules

Source-backed rules for requiring coding agents to provide fresh, scoped, and reviewer-visible test evidence before a pull request can be approved or merged.

by MkDev11 · added 2026-06-04

Platform: Claude Code
Harness: Claude Code

Install & copy

You are deciding whether a coding-agent pull request has enough test evidence
to merge.

Rules:
1. Map every changed behavior to a focused test, CI check, or documented
   manual verification step.
2. Require exact commands, environments, commit SHAs, and pass/fail results;
   do not accept agent summaries as proof.
3. Re-run or refresh checks after rebases, force pushes, dependency changes,
   generated artifacts, or test-fix commits.
4. Block merge when tests are skipped, stale, unrelated, flaky without
   triage, or run against unsafe production-like targets.
5. Keep logs and fixtures privacy-safe before pasting evidence into public
   pull request comments.

Trust & readiness

Trust: Review first
Source: source-backed
Safety notes: Present
Reviewed: Yes

Citation facts

Source-backed facts for citing this resource, derived directly from the registry — also available as plain text for AI assistants.

Canonical URL: https://heyclau.de/entry/rules/coding-agent-test-before-merge-rules
Source URLs: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks, https://github.com/JSONbored/awesome-claude/blob/main/content/rules/coding-agent-test-before-merge-rules.mdx
Author: MkDev11
Submitted by: MkDev11
Claim status: unclaimed
Last verified: 2026-06-04

Safety notes

Tests can mutate databases, send network requests, write files, consume cloud quota, or contact sandboxed services; coding agents must use isolated test environments and documented safe commands.
Passing one focused test does not prove unrelated risky paths are safe; require broader checks for auth, data deletion, payments, migrations, release automation, and generated artifacts.
Treat agent-generated test summaries as untrusted until the reviewer can inspect the exact command, status check, log, or reproducible manual verification evidence.

Privacy notes

Test logs, snapshots, fixtures, screenshots, coverage reports, and CI artifacts can expose secrets, customer data, internal hostnames, private package names, prompts, or file paths.
Do not paste raw private CI logs, database rows, browser traces, screenshots, or failing payloads into public PR comments without redaction.
When a test needs private credentials or data, record the approved internal verification channel rather than copying sensitive details into the public PR.

Prerequisites

A pull request, branch, or patch created or modified by a coding agent.
Access to the repository's documented test commands, CI status checks, and branch-protection rules.
Enough diff context to identify touched behavior, affected packages, generated files, and risky runtime paths.
Permission to request changes or block merge when test evidence is stale, incomplete, unsafe, or unverifiable.

Schema details

Install type: copy
Reading time: 6 min
Difficulty score: 42
Troubleshooting: Yes
Breaking changes: No

Collection metadata

Estimated setup: 15 minutes
Difficulty: intermediate

About this resource

Purpose

Use these rules when a coding agent opens, edits, or summarizes a pull request and the reviewer needs to decide whether the branch has enough test evidence to merge.

The central rule is simple: a coding agent may propose the patch, but the pull request must contain reviewer-visible evidence that the changed behavior was verified on the reviewed commit. A confident agent summary is not a substitute for tests, CI checks, command output, or a documented manual verification path.

Evidence Classes

Classify the needed evidence before approving the pull request.

Focused unit test. Pure functions, parsers, validators, reducers, and small business rules should have targeted tests that fail without the fix.
Integration test. API handlers, database access, queues, permissions, framework wiring, CLI commands, and third-party clients need tests across the boundary that changed.
End-to-end or smoke test. User flows, routing, auth, payments, browser behavior, deployment wiring, and release paths need a higher-level check or a clearly documented manual verification step.
Static check. Type checking, linting, formatting, code scanning, and generated-schema checks are useful evidence, but they do not replace runtime tests for behavior changes.
Required status check. CI status checks and protected-branch rules matter only when they ran on the reviewed head commit and cover the affected code.
Manual verification. Use manual checks only when automation is missing, expensive, or impossible; record the exact environment, steps, result, and reviewer who accepted the gap.

If the PR changes multiple behavior classes, require evidence for each class instead of accepting a single broad "tests passed" statement.
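
As a rough illustration of classifying evidence per behavior class, the sketch below maps changed file paths to the evidence classes listed above. The path patterns and the `required_evidence` helper are hypothetical and would need to be replaced with the repository's real layout.

```python
from fnmatch import fnmatch

# Hypothetical path patterns (assumption); adjust to the repository's layout.
# fnmatch's "*" also matches "/" so nested files are covered.
EVIDENCE_BY_PATTERN = [
    ("src/api/*", "integration test"),
    ("src/db/*", "integration test"),
    ("src/ui/*", "end-to-end or smoke test"),
    ("src/lib/*", "focused unit test"),
]

UNCLASSIFIED = "unclassified: reviewer decides"


def required_evidence(changed_files):
    """Return the set of evidence classes the changed files call for."""
    needed = set()
    for path in changed_files:
        matches = [ev for pattern, ev in EVIDENCE_BY_PATTERN if fnmatch(path, pattern)]
        # A file that matches no pattern is surfaced instead of silently ignored.
        needed.update(matches or [UNCLASSIFIED])
    return needed
```

A PR touching both `src/api/` and `src/lib/` would then need integration and unit evidence, not one broad "tests passed" statement.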

Required Test Evidence

A coding-agent PR should make test evidence easy to audit.

The PR names the exact commands, CI jobs, or manual verification steps used.
Test results are tied to the reviewed commit SHA, not an older push or local state.
Focused tests cover the changed behavior, failing case, or regression that motivated the patch.
CI status checks are current after rebases, force pushes, dependency updates, generated-file changes, and test-only follow-up commits.
Skipped tests, quarantined tests, retries, and flaky failures have a written explanation and owner decision.
Generated artifacts, lockfiles, migrations, clients, schemas, and snapshots have a regeneration or validation command.
Manual verification avoids production data and records enough detail for a reviewer to reproduce the result safely.

Do not approve when the evidence is only a natural-language summary from the agent. Ask for the command, job link, log excerpt, or reviewer-owned manual verification note.
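
One way to make this auditable is to treat each piece of evidence as a small record and check it against the reviewed head commit. The sketch below assumes a hypothetical `EvidenceRecord` shape; the field names are illustrative and not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class EvidenceRecord:
    # Hypothetical record shape (assumption), not a format defined by the rules.
    command: str      # exact command, CI job name, or manual step
    commit_sha: str   # commit the result was produced on
    environment: str  # for example "ci" or "local"
    passed: bool


def audit_evidence(evidence, head_sha):
    """Return a list of problems; an empty list means the record is auditable."""
    problems = []
    if not evidence.command.strip():
        problems.append("no command, CI job, or manual step named")
    if not evidence.environment.strip():
        problems.append("no environment named")
    if evidence.commit_sha != head_sha:
        problems.append("result is not tied to the reviewed commit")
    if not evidence.passed:
        problems.append("result is a failure")
    return problems
```

A natural-language summary with no command or SHA cannot be turned into such a record, which is the point of asking for the underlying evidence.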

Agent Rules

Read the repository's test documentation before choosing commands.
Prefer focused commands first, then broader package or CI checks when the touched behavior crosses boundaries.
Use framework-supported focused test options when available, such as pytest node IDs (path/to/test_file.py::test_name), Jest's --findRelatedTests option, or Vitest file and test-name filtering (-t).
Report commands exactly as run, including package manager, workspace, flags, environment name, and any skipped tests.
Stop and ask for review when tests require credentials, production-like data, paid services, destructive fixtures, or non-isolated infrastructure.
Never invent passing results, hide failing output, or convert a failed test into a "known issue" without maintainer approval.
Re-run relevant checks after every change that can alter behavior, generated output, dependency resolution, or test selection.
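
The focused-command and exact-reporting rules above can be sketched as a small helper that builds the command as an argument list and renders it exactly as it would be run. The `focused_command` and `report_line` names are hypothetical, and invoking Jest and Vitest through `npx` is an assumption about the project setup.

```python
import shlex


def focused_command(runner, path, test_name=None):
    """Build a focused test command as an argument list.

    runner is "pytest", "jest", or "vitest". Using npx for the JavaScript
    runners is an assumption; a project may use another package manager.
    """
    if runner == "pytest":
        # pytest node ID: file path, optionally followed by ::test_name
        return ["pytest", f"{path}::{test_name}" if test_name else path]
    if runner == "jest":
        # --findRelatedTests takes source files and runs the tests covering them
        cmd = ["npx", "jest"]
        if test_name:
            cmd += ["-t", test_name]
        return cmd + ["--findRelatedTests", path]
    if runner == "vitest":
        cmd = ["npx", "vitest", "run", path]
        if test_name:
            cmd += ["-t", test_name]
        return cmd
    raise ValueError(f"unknown runner: {runner}")


def report_line(cmd):
    """Render the command exactly as run, with shell quoting preserved."""
    return shlex.join(cmd)
```

Reporting the output of `report_line` instead of a paraphrase keeps flags and quoting intact for the reviewer.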

Reviewer Rules

Start by mapping files changed to behavior changed.
Compare the agent's summary with the actual diff and status checks.
Confirm that required checks ran on the latest commit.
Require a focused regression test for bug fixes unless the project lacks a practical test harness and the manual verification note explains why.
Ask for broader checks when the PR touches auth, data deletion, billing, migrations, concurrency, dependency manifests, release automation, or public API behavior.
Treat screenshots as supporting evidence, not proof that the underlying behavior is tested.
Keep public PR evidence synthetic and redacted when logs or fixtures contain private data.
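
A reviewer could approximate the "ask for broader checks" rule with a simple path scan. The keyword and manifest lists below are hypothetical examples and cover only part of the risky areas named above; concurrency and public API changes usually need a human read of the diff.

```python
from pathlib import PurePosixPath

# Hypothetical directory keywords and manifest names (assumption); extend per repo.
RISKY_KEYWORDS = {"auth", "billing", "payments", "migrations", "release"}
DEPENDENCY_MANIFESTS = {
    "package.json", "package-lock.json", "requirements.txt",
    "pyproject.toml", "go.mod", "Cargo.toml",
}


def needs_broader_checks(changed_files):
    """Return the changed files that touch a risky area, sorted."""
    flagged = []
    for path in changed_files:
        parts = PurePosixPath(path).parts
        name = parts[-1] if parts else ""
        in_risky_dir = bool(RISKY_KEYWORDS.intersection(p.lower() for p in parts))
        if name in DEPENDENCY_MANIFESTS or in_risky_dir:
            flagged.append(path)
    return sorted(flagged)
```

An empty result does not prove the change is safe; it only means the path heuristic found nothing.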

Merge Blockers

Block merge until resolved when:

the PR claims tests passed but does not name the command, CI job, or commit;
tests ran before a rebase, force push, dependency update, generated artifact, or final patch commit;
the test command is unrelated to the changed package, path, or behavior;
a risky behavior change has only lint, formatting, or type-check evidence;
failing, skipped, quarantined, or flaky tests are not triaged;
manual verification uses production data, real payments, live credentials, or destructive targets without explicit approval;
generated files, snapshots, schemas, clients, or migrations changed without a regeneration or validation command;
logs, screenshots, traces, fixtures, or coverage artifacts expose private user data or secrets.
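
Two of these blockers are mechanical enough to check in code: static-only evidence on a risky change, and untriaged tests. The sketch below is a minimal illustration; the `merge_blockers` function, its parameters, and the evidence-kind labels are hypothetical.

```python
# Evidence kinds treated as static checks (assumption: labels chosen for this sketch).
STATIC_ONLY = {"lint", "format", "type-check"}


def merge_blockers(evidence_kinds, risky_change, untriaged_tests):
    """Return blocker messages; an empty list means none of these checks fired.

    evidence_kinds: set of labels such as {"unit", "lint"}
    risky_change: True when the diff touches a risky behavior
    untriaged_tests: names of failing, skipped, or flaky tests without triage
    """
    blockers = []
    kinds = set(evidence_kinds)
    if not kinds:
        blockers.append("no test command, CI job, or commit named")
    elif risky_change and kinds <= STATIC_ONLY:
        blockers.append("risky change has only static-check evidence")
    if untriaged_tests:
        blockers.append("untriaged tests: " + ", ".join(sorted(untriaged_tests)))
    return blockers
```

The remaining blockers, such as unrelated test commands or unsafe manual verification targets, still need reviewer judgment.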

Review Checklist

Behavior mapped. Each meaningful code change is mapped to the behavior it can affect.
Commands exact. The PR names exact test commands, CI jobs, commit SHA, environment, and pass/fail results.
Checks are fresh. Tests and required status checks ran after the latest branch update.
Risk paths covered. Auth, data, payments, migrations, dependencies, release automation, and public API changes have focused evidence.
Failures triaged. Skipped, flaky, failed, or quarantined tests have an owner-approved explanation.
Privacy safe. Logs, fixtures, screenshots, and manual verification notes avoid secrets and private data.

Troubleshooting

The agent cannot run tests locally: require current CI, a maintainer-run command, or a manual verification note that explains the limitation.
Only broad CI exists: add a focused regression test when the change fixes a bug or touches a risky path; otherwise document why broad CI is adequate.
The test is flaky: do not merge by retrying until green. Capture the flaky signal, identify an owner, and decide whether the PR should fix or isolate it.
The branch changed after approval: refresh relevant checks and review the delta before merging.
Logs contain private data: redact or move evidence to an approved private channel and leave a minimal public note that verification occurred.
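
For the private-data case, a first-pass redaction step might look like the sketch below. The patterns are illustrative assumptions and will miss many secret formats, so a human should still read the excerpt before it is posted publicly.

```python
import re

# Illustrative patterns only (assumption); real logs need project-specific rules.
REDACTIONS = [
    # Email addresses
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[REDACTED_EMAIL]"),
    # key=value or key: value pairs with a sensitive-looking key
    (re.compile(r"(?i)(token|secret|password|api[_-]?key)\s*[=:]\s*\S+"), r"\1=[REDACTED]"),
    # GitHub-style token prefixes (ghp_, gho_, ghu_, ghs_, ghr_)
    (re.compile(r"\bgh[pousr]_[A-Za-z0-9]{20,}\b"), "[REDACTED_GITHUB_TOKEN]"),
]


def redact(log_text):
    """Apply each redaction pattern in order and return the scrubbed text."""
    for pattern, replacement in REDACTIONS:
        log_text = pattern.sub(replacement, log_text)
    return log_text
```

The redacted excerpt can go in the public PR, while the raw log stays in the approved private channel.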

Duplicate And History Check

This entry was checked against existing rules, guides, hooks, commands, skills, open PRs, and closed PR history for overlap with test-before-merge rules, coding-agent test evidence, AI-generated code review, TDD rules, status checks, test-runner hooks, and CI validation.

Adjacent content includes the AI-generated-code-before-merge guide, the test-driven-development enforcer rule, code-test-runner and Playwright test hooks, and high-risk code review escalation rules. This entry is distinct because it is a portable rules policy for coding-agent PRs: it decides what test evidence must exist before merge, how agents should report commands, when reviewers should refresh stale checks, and what privacy constraints apply to test logs and fixtures.

Checks vs Commit Statuses

Per GitHub's "About status checks" documentation, there are two types of status checks, checks and commit statuses, and they carry different evidence weight.

Attribute	Checks	Commit statuses
Line annotations	Provide line annotations	No line annotations
Detail level	More detailed messaging	Less detailed
Creator scope	Only available for use with GitHub Apps	Created with GitHub's API by org owners and users with push access
GitHub Actions output	GitHub Actions generates checks, not commit statuses	Not generated by GitHub Actions
Checks tab	Populates the Checks tab	Does not populate the Checks tab

When a coding agent claims "CI passed," prefer evidence from checks over bare commit statuses: when a specific line in a commit causes a check to fail, the failure detail appears next to the relevant code in the Files tab of the pull request. If status checks are required for the repository, they must pass before you can merge into the protected branch, so confirm that required checks are green on the reviewed head commit, not an earlier push.
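
To confirm that required checks are green on the reviewed head commit, a reviewer script could inspect the response of GitHub's "list check runs for a Git reference" REST endpoint. The sketch below only builds the URL and evaluates an already-parsed JSON payload; the helper names and the list of required check names are hypothetical. Accepting only the `success` conclusion is a deliberately strict choice for this sketch, so skipped or neutral runs are surfaced for triage.

```python
def check_runs_url(owner, repo, sha):
    """URL of the REST endpoint that lists check runs for a commit."""
    return f"https://api.github.com/repos/{owner}/{repo}/commits/{sha}/check-runs"


def check_problems(payload, head_sha, required_names):
    """Return problems for required checks, based on a parsed check-runs payload.

    payload is the decoded JSON response: {"total_count": ..., "check_runs": [...]}.
    required_names is a hypothetical list of check names the repository requires.
    """
    runs = {
        run["name"]: run
        for run in payload.get("check_runs", [])
        if run.get("head_sha") == head_sha  # ignore runs from earlier pushes
    }
    problems = []
    for name in required_names:
        run = runs.get(name)
        if run is None:
            problems.append(f"{name}: no check run on {head_sha}")
        elif run.get("status") != "completed":
            problems.append(f"{name}: status is {run.get('status')}")
        elif run.get("conclusion") != "success":
            problems.append(f"{name}: conclusion is {run.get('conclusion')}")
    return problems
```

Fetching the payload needs an authenticated HTTP request, which is left out here so the evaluation logic can be reviewed and tested without network access.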

Sources

GitHub Docs: About status checks - https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/collaborating-on-repositories-with-code-quality-features/about-status-checks
GitHub Docs: About protected branches - https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches
GitHub Docs: Review output from Copilot - https://docs.github.com/en/copilot/how-tos/copilot-on-github/use-copilot-agents/review-copilot-output
Google Engineering Practices: What to look for in a code review - https://google.github.io/eng-practices/review/reviewer/looking-for.html
pytest documentation: Usage and invocations - https://docs.pytest.org/en/stable/how-to/usage.html
Jest CLI options - https://jestjs.io/docs/cli
Vitest CLI guide - https://vitest.dev/guide/cli

#testing #coding-agents #pull-requests #ci #status-checks #merge-safety

Related resources

Dependency Update Review Rules

Content-Only Submission PR Gate Rules

Pull Request Triage Capability Pack Skill

Code Review Automation Capability Pack Skill

Related guides

Review AI-Generated Code Before Merge

Claude Code GitHub Actions Review Workflow

Claude Code GitLab CI/CD Workflow