Test Double Misuse Review Rules

Source-backed rules for reviewing test code for test-double misuse, covering over-mocking that decouples tests from real behavior, under-mocking that creates slow or flaky tests, mock-return-value drift, missing contract tests for faked dependencies, and keeping test data free of personal information.

by jaso0n0818 · added 2026-06-19

Harness: Claude Code

Source

You are reviewing test code for test-double misuse.

Rules:
1. Mock only what is slow, non-deterministic, or crosses a process boundary;
   do not mock the thing under test or its direct collaborators when a real
   or in-process fake is practical.
2. Keep mock return values truthful to what the real dependency returns; stale
   or invented return values let tests pass while real integration breaks.
3. Verify interactions (calls, arguments, call count) only when the interaction
   itself is the behavior under test, not as a default assertion style.
4. Back every widely faked dependency with at least one contract or integration
   test that runs against the real collaborator so drift is caught.
5. Prefer stubs and fakes over mocks for simple value-returning collaborators;
   reserve mocks for verifying side-effect interactions.
6. Keep test fixture data synthetic; do not copy production records or personal
   data into test setups.

Readiness

Trust: Review first
Source: source-backed
Safety notes: Present
Reviewed: Yes

Review first — review before installing

Open the source and read safety notes before installing.

Safety notes

Tests that mock the system under test or its direct collaborators can pass confidently while the real wiring is broken, giving false assurance that hides regressions.
Mocking a dependency that has changed its behavior causes tests to pass against a stale contract, delaying discovery of integration failures until production.
Removing a mock without adding a real dependency can make a test suite unexpectedly slow or flaky if the dependency is an external service or database.

Privacy notes

Test fixtures and mock return values sometimes copy production data including personal identifiers, emails, phone numbers, and health records.
Do not paste real user records or production database rows into test setups or mock return values; use synthetic, obviously-fake data.
Be careful with snapshot and fixture files that may contain personal data captured from a staging or production environment.
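
A minimal sketch of what "synthetic, obviously-fake data" can look like in a Python test suite. The helper names (`make_synthetic_user`, `find_suspicious_emails`) are hypothetical and not part of any library; the sketch relies on the reserved `example.com`, `example.org`, and `example.net` domains and on the 555-0100 to 555-0199 phone range set aside for fictional use. The email scan is a rough heuristic for review, not a complete personal-data detector.

```python
import re

# Domains reserved for documentation and testing; anything else is worth a look.
RESERVED_DOMAINS = {"example.com", "example.org", "example.net"}
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def make_synthetic_user(index):
    """Build an obviously fake user record for fixtures (hypothetical helper)."""
    return {
        "name": f"Test User {index}",
        "email": f"user{index}@example.com",
        # 555-0100 through 555-0199 is reserved for fictional numbers.
        "phone": f"+1-202-555-01{index % 100:02d}",
    }


def find_suspicious_emails(text):
    """Return email addresses in fixture text that do not use a reserved domain."""
    return [
        match.group(0)
        for match in EMAIL_PATTERN.finditer(text)
        if match.group(1).lower() not in RESERVED_DOMAINS
    ]
```

A reviewer could run a scan like this over changed fixture and snapshot files and ask for justification of every address it reports.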

Prerequisites

A pull request or diff that adds, edits, or removes test doubles in unit, integration, or component tests.
Knowledge of the test framework and mocking library in use, since stub, mock, spy, and fake semantics differ between libraries.
Awareness of whether a contract or integration test already covers the faked dependency, or whether the PR should add one.
Permission to block merge when a test double decouples the test from real behavior in a way that masks breakage.

Schema details

Install type: copy
Troubleshooting: Yes

Collection metadata

Estimated setup: 20 minutes
Difficulty: intermediate

About this resource

Purpose

Use these rules when a change adds or edits test doubles. The goal is to keep test doubles honest: tests should still catch real defects, and they should not give false confidence when production behavior drifts away from what the double returns.

This is a review policy, not a mocking-library tutorial. It tells reviewers what must be true about a test double's scope, return values, interaction assertions, and contract coverage before the change is safe to merge.

Review Inputs

Collect enough context to know what is being faked and why.

Subject and collaborators. Which class, function, or service is under test and which of its collaborators are being replaced by test doubles.
Double type. Whether the double is a stub (returns a value), a mock (verifies calls), a spy (records calls), a fake (simplified implementation), or a dummy (unused argument filler).
Return-value accuracy. Whether the values the stub or mock returns match what the real dependency actually returns.
Interaction assertions. Whether the test verifies calls as a proxy for the real observable behavior or because the interaction itself matters.
Contract coverage. Whether any test runs against the real dependency to catch drift in the mocked return values.

If the change cannot say what is under test and why the collaborator is faked rather than used directly, require that context before reviewing the assertions.

Scope And Selection Rules

Mock collaborators that are slow, non-deterministic, or cross a process or network boundary; do not mock the thing under test.
Use the real collaborator or an in-process fake when it is fast, deterministic, and available in the test environment.
Prefer a stub or fake over a mock for value-returning collaborators where the interaction is not the point of the test.
Reserve interaction verification (mock expectations) for side-effect behavior where the call itself is the observable outcome.
Do not mock simple utilities, pure functions, or value objects; test them directly instead.
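
The selection rules above can be sketched as follows, assuming a Python codebase. All names here (`CheckoutService`, `InMemoryOrderRepository`, `apply_discount`) are hypothetical illustrations. The subject is real, the pure function is called directly, and the database boundary is replaced by an in-process fake rather than a mock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    order_id: str
    total_cents: int


class InMemoryOrderRepository:
    """Fake: a simplified but working stand-in for a database-backed repository."""

    def __init__(self):
        self._orders = {}

    def save(self, order):
        self._orders[order.order_id] = order

    def get(self, order_id):
        return self._orders[order_id]


def apply_discount(total_cents, percent):
    """Pure function: fast and deterministic, so tests use it directly."""
    return total_cents - (total_cents * percent) // 100


class CheckoutService:
    """Subject under test; it is never replaced by a double."""

    def __init__(self, repository):
        self._repository = repository

    def checkout(self, order_id, total_cents, discount_percent):
        order = Order(order_id, apply_discount(total_cents, discount_percent))
        self._repository.save(order)
        return order


def test_checkout_persists_discounted_order():
    repository = InMemoryOrderRepository()
    service = CheckoutService(repository)

    service.checkout("order-1", 10_000, 15)

    # Assert on observable state through the fake, not on recorded calls.
    assert repository.get("order-1").total_cents == 8_500


test_checkout_persists_discounted_order()
```

Because nothing in this test is a mock, a refactor of how `CheckoutService` talks to the repository does not break it as long as the stored order is still correct.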

Return-Value Accuracy Rules

Keep stub and mock return values consistent with what the real dependency actually returns for the same input.
Update return values when the dependency's interface or behavior changes, not only when tests fail.
Use a recorded or generated fixture derived from the real response when the return value is complex, rather than a hand-invented one.
Avoid returning simplified or convenient values that the real dependency never produces, since the tests then pass on cases that would fail in production.
Flag return values that omit required fields or use types the real dependency does not produce.
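
One way to make these checks mechanical, sketched with Python's `unittest.mock.create_autospec`. The `PaymentGateway` class, the `REQUIRED_CHARGE_FIELDS` table, and both helper functions are assumptions made for illustration; a real project would derive the required fields from the dependency's actual schema or a recorded response.

```python
from unittest.mock import create_autospec


class PaymentGateway:
    """Hypothetical real dependency; in production this makes a network call."""

    def charge(self, amount_cents, currency):
        raise NotImplementedError("talks to the network in production")


# Fields the real response is assumed to always contain (hypothetical contract).
REQUIRED_CHARGE_FIELDS = {"id": str, "status": str, "amount_cents": int}


def missing_or_mistyped(payload, required):
    """Return the names of required fields that are absent or have the wrong type."""
    return sorted(
        name
        for name, expected_type in required.items()
        if not isinstance(payload.get(name), expected_type)
    )


def make_gateway_stub(response):
    """Build a stub whose return value is checked against the assumed contract."""
    problems = missing_or_mistyped(response, REQUIRED_CHARGE_FIELDS)
    if problems:
        raise ValueError(f"stub response drifts from contract: {problems}")
    # create_autospec copies the real method signatures, so a call that the
    # real gateway would reject raises TypeError on the stub as well.
    stub = create_autospec(PaymentGateway, instance=True)
    stub.charge.return_value = response
    return stub
```

The autospec guards the call signature and the field check guards the return shape; neither replaces a contract test against the real dependency.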

Interaction Assertion Rules

Assert interactions only when the call, argument, or call count is itself the behavior under test.
Do not assert every call as a default style; over-assertion makes tests fragile to refactoring without catching more real defects.
Prefer asserting observable state or return values over asserting that collaborators were called in a specific order.
Keep interaction assertions stable; if they break on every refactor, they are probably testing implementation rather than behavior.
When a side-effect interaction must be verified, use the narrowest assertion that proves the behavior.
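
A short sketch of the difference, using Python's `unittest.mock.Mock`. `SignupService` and its `send_welcome` collaborator are hypothetical. The first test asserts observable state and ignores the mailer; the second verifies the interaction because sending exactly one welcome message is the behavior under test.

```python
from unittest.mock import Mock


class SignupService:
    """Hypothetical subject: registers a user and sends one welcome message."""

    def __init__(self, mailer):
        self._mailer = mailer
        self.registered = set()

    def register(self, email):
        if email in self.registered:
            return False
        self.registered.add(email)
        self._mailer.send_welcome(email)
        return True


def test_register_records_user():
    # State assertion: the mailer is present but not asserted on.
    service = SignupService(mailer=Mock())
    assert service.register("user@example.com") is True
    assert "user@example.com" in service.registered


def test_register_sends_exactly_one_welcome():
    # Interaction assertion: the outgoing message is the observable outcome.
    mailer = Mock(spec=["send_welcome"])
    service = SignupService(mailer)
    service.register("user@example.com")
    service.register("user@example.com")  # duplicate must not send again
    mailer.send_welcome.assert_called_once_with("user@example.com")


test_register_records_user()
test_register_sends_exactly_one_welcome()
```

The second test uses a single narrow assertion on one method rather than checking every call the subject makes.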

Contract And Integration Coverage Rules

Widely faked dependencies create a gap between the test suite and production. Close it with at least one test that uses the real thing.

Add a contract or integration test for any collaborator that is faked in many test files, so interface drift is caught before it reaches production.
Run at least one integration-level test against the real database, HTTP service, or message bus when unit tests fake those boundaries.
Use consumer-driven contract tests when the dependency is an external service maintained by another team.
Update the contract test when the dependency's interface changes, not only the unit test mock.
Make the gap explicit: if no contract test exists and none can be added, record why and what manual verification happens instead.
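
One common way to keep a fake honest is to run the same contract check against both the fake and the real implementation. The sketch below assumes a hypothetical user store; `SqliteUserStore` backed by an in-memory SQLite database stands in for the real collaborator so the example stays self-contained, whereas a real suite would point the same check at the actual database or service.

```python
import sqlite3


class InMemoryUserStore:
    """Fake used by unit tests."""

    def __init__(self):
        self._names = {}

    def add(self, user_id, name):
        if user_id in self._names:
            raise KeyError(user_id)
        self._names[user_id] = name

    def get_name(self, user_id):
        return self._names.get(user_id)


class SqliteUserStore:
    """Stand-in for the real, database-backed implementation."""

    def __init__(self, connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS users (id TEXT PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, user_id, name):
        try:
            self._conn.execute(
                "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
            )
        except sqlite3.IntegrityError as exc:
            raise KeyError(user_id) from exc

    def get_name(self, user_id):
        row = self._conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


def check_user_store_contract(make_store):
    """Behavior every implementation, fake or real, must share."""
    store = make_store()
    assert store.get_name("u1") is None
    store.add("u1", "Test User")
    assert store.get_name("u1") == "Test User"
    try:
        store.add("u1", "Other Name")
    except KeyError:
        pass
    else:
        raise AssertionError("duplicate ids must raise KeyError")


check_user_store_contract(InMemoryUserStore)
check_user_store_contract(lambda: SqliteUserStore(sqlite3.connect(":memory:")))
```

If the real store changes its duplicate-handling or lookup behavior, the contract check fails for one side, which is the drift signal the unit tests alone cannot give.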

Merge Blockers

Block merge until resolved when:

the thing under test itself is mocked, since the test cannot then catch a defect in the subject;
a fast, deterministic, in-process collaborator is mocked without justification;
mock return values are obviously stale or invented values the real dependency never produces;
interaction assertions verify every call by default rather than only side effects that are the actual behavior;
a widely faked dependency has no contract or integration test and none is planned;
test fixture data contains real personal data copied from production or staging.

Review Checklist

Subject identified. The change is clear about what is under test and why each collaborator is faked.
Double type appropriate. Stub, fake, or mock is chosen for the right reason; real collaborators are used when practical.
Return values accurate. Stub and mock values match what the real dependency produces.
Assertions focused. Interaction assertions verify side-effect behavior, not every internal call.
Contract coverage. Widely faked dependencies have at least one test against the real collaborator.
Privacy safe. Fixture data is synthetic and does not include real personal data.

AI Review Rules

AI assistants can review test code, but they should show evidence.

Ask the assistant to identify what is under test and what is faked before judging the assertions.
Require it to flag any mock return value that differs from the real dependency's known behavior.
Have the assistant check whether a contract or integration test backs the faked dependency.
Do not let the assistant assume all mocks are correct because the test passes.
Re-run review after changes to mock return values, interaction assertions, or the dependency's real interface.

Troubleshooting

Tests pass but integration breaks: add a contract or integration test against the real dependency and update the mock return values.
Tests break on every refactor: replace interaction assertions with observable-state or return-value assertions.
A mock returns a value the real thing never produces: update the return value or generate it from a real response fixture.
A fast utility is mocked unnecessarily: remove the mock and use the real implementation.
Fixture data contains personal data: replace with synthetic data and scrub any snapshot files that captured real records.

Duplicate And History Check

Checked existing rules, hooks, statuslines, guides, collections, skills, open PRs, and closed PRs for mock misuse, test doubles, over-mocking, stub accuracy, contract testing, and test-quality review rules.

Adjacent content includes the test-driven-development-enforcer rule and vitest-expert rule, but no entry is a portable pre-merge review policy specifically for test-double selection, return-value accuracy, and contract coverage. This entry is distinct because it decides what must be true about a test double's scope, accuracy, and contract backing before the change can merge.

No prior closed PR for this rule was found during the duplicate/history check.

Test Double Type Reference

The types below follow the taxonomy described in the sources. Choosing the right type and using it correctly is what makes the test meaningful.

Type	What it does	When to use it
Dummy	Fills a parameter that is not used	Satisfying an interface requirement only
Stub	Returns a fixed value	Providing indirect input to the subject
Spy	Records calls for later assertion	When you need to verify a call happened
Mock	Pre-programmed with expectations	Verifying a side-effect interaction
Fake	Simplified working implementation	Replacing a real but heavy dependency

Mocks are the most powerful and the most dangerous: over-used, they verify implementation instead of behavior and allow return-value drift. Stubs and fakes give simpler, more stable tests when the interaction itself is not what matters.
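
The five types can be illustrated in a few lines of Python. This is a sketch with hypothetical names (`convert`, `FixedRates`, `InMemoryRates`, `AuditSpy`), not a prescribed structure. Note that `unittest.mock.Mock` records calls and is verified after the fact, so in Python the spy and mock roles often overlap in practice.

```python
from unittest.mock import Mock


def convert(amount, currency, rates, audit, logger):
    """Hypothetical subject: convert an amount and record an audit entry.

    The logger parameter is accepted but never used on this code path.
    """
    converted = amount * rates.rate_for(currency)
    audit.record(currency, amount)
    return converted


class FixedRates:
    """Stub: returns a fixed value as indirect input to the subject."""

    def rate_for(self, currency):
        return 2


class InMemoryRates:
    """Fake: a simplified but working rate table."""

    def __init__(self, table):
        self._table = dict(table)

    def rate_for(self, currency):
        return self._table[currency]


class AuditSpy:
    """Spy: records calls so the test can inspect them afterwards."""

    def __init__(self):
        self.calls = []

    def record(self, currency, amount):
        self.calls.append((currency, amount))


DUMMY_LOGGER = None  # Dummy: fills the unused logger parameter.

# Stub plus spy: assert on the return value, then inspect recorded calls.
spy = AuditSpy()
assert convert(10, "EUR", FixedRates(), spy, DUMMY_LOGGER) == 20
assert spy.calls == [("EUR", 10)]

# Fake: behaves like a small real implementation.
assert convert(10, "JPY", InMemoryRates({"JPY": 3}), AuditSpy(), DUMMY_LOGGER) == 30

# Mock: the audit call is the side effect being verified.
audit_mock = Mock(spec=["record"])
convert(5, "EUR", FixedRates(), audit_mock, DUMMY_LOGGER)
audit_mock.record.assert_called_once_with("EUR", 5)
```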

Sources

Martin Fowler, Test Double: https://martinfowler.com/bliki/TestDouble.html
Ham Vocke, The Practical Test Pyramid (martinfowler.com): https://martinfowler.com/articles/practical-test-pyramid.html
Toby Clemson, Testing Strategies in a Microservice Architecture (martinfowler.com): https://martinfowler.com/articles/microservice-testing/
Wikipedia, Test double: https://en.wikipedia.org/wiki/Test_double

#testing #mocks #test-doubles #reliability #code-review #test-quality
