bazel-test-hygiene
$ npx mdskill add cloudflare/workerd/bazel-test-hygiene

Enforce strict, reliable execution of all Bazel tests to prevent stale results and false positives.
- Prevents developers from trusting outdated test outcomes due to Bazel's action caching.
- Requires running the complete test suite rather than using partial or filtered test arguments.
- Ensures comprehensive validation by mandating the use of the `--nocache_test_results` flag.
- Provides clear guidelines on executing tests to guarantee accurate assessment of code changes.
SKILL.md
.github/skills/bazel-test-hygiene
---
name: bazel-test-hygiene
description: Mandatory rules for running bazel tests during development. Load this skill before running any bazel test command, especially when validating fixes or verifying regression tests. Prevents false confidence from cached results, filter flags that silently match nothing, and partial test runs that miss breakage.
---

# Bazel Test Hygiene

## The Three Rules

### 1. Always disable caching

```bash
bazel test //... --nocache_test_results
```

**Why:** Bazel's action cache can serve stale test binaries even after you edit source files. Without `--nocache_test_results`, you may be running the OLD binary and seeing OLD results. This is not hypothetical: it has caused real false-positive/false-negative confusion in this repo.

**Always include `--nocache_test_results`.** No exceptions.

### 2. Keep it simple: no filter flags

Do NOT use `--test_arg='-f'` or similar filter flags to run individual test cases.

**Why:** KJ test's `-f` flag silently passes when zero tests match. If you typo the filter or the test name changes, bazel reports "PASSED" with zero tests actually run. This gives completely false confidence.

**Run the full test target.** If you need to check a specific test, look for its name in the full output. If the full suite is too slow, run the specific test _target_ (e.g., `//src/workerd/api:streams/standard-test@`), not a filtered subset within a target.

### 3. Run the full suite before claiming done

A single test target passing does not mean you haven't broken something else. Fixes to shared code (queue.c++, standard.c++, common.h) can break tests in completely different directories.

**Before claiming any fix is complete:**

```bash
bazel test //... --nocache_test_results
```

Check the final summary line: `Executed N out of N tests: N tests pass.` All N must match. If any test fails, the fix is not done.
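The summary-line check in rule 3 can be automated. The following is a minimal sketch, not part of the skill itself: `check_summary` is a hypothetical helper name, and the line format it parses is assumed to be the `Executed N out of N tests: N tests pass.` shape quoted above (with the singular `test` also accepted). It reads captured bazel output on stdin and fails unless the executed, total, and passing counts are all equal and non-zero.

```bash
# Hypothetical helper: reads bazel output on stdin and checks the last
# "Executed N out of M tests: K tests pass" line. Assumes that line format.
check_summary() {
  local line nums
  # Keep only the last summary line; tolerate "no match" from grep.
  line=$(grep -E 'Executed [0-9]+ out of [0-9]+ tests?:' | tail -n 1) || true
  # Extract "executed total passed"; stays empty if the line has another shape.
  nums=$(printf '%s\n' "$line" | sed -n -E \
    's/.*Executed ([0-9]+) out of ([0-9]+) tests?: ([0-9]+) tests? pass.*/\1 \2 \3/p')
  if [ -z "$nums" ]; then
    echo "FAIL: no summary line with a pass count"
    return 1
  fi
  # Split the three numbers into $1 (executed), $2 (total), $3 (passed).
  set -- $nums
  if [ "$1" -eq "$2" ] && [ "$3" -eq "$2" ] && [ "$2" -gt 0 ]; then
    echo "OK: $2 of $2 tests executed and passed"
  else
    echo "FAIL: executed=$1 total=$2 passed=$3"
    return 1
  fi
}

# Usage (not run here; stderr is merged because bazel reports there):
#   bazel test //... --nocache_test_results 2>&1 | tee test.log | check_summary
```

Requiring the executed count to equal the total is deliberate: a summary in which fewer tests were executed than exist is treated as a failure, since some of the reported passes did not come from a fresh run.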

## Red-Green Verification for Regression Tests

When writing a regression test for a bug fix, you MUST verify the test actually catches the bug:

1. **Green:** Run `bazel test //... --nocache_test_results`. All tests pass (fix in place).
2. **Red:** Remove the fix, then run `bazel test //... --nocache_test_results`. The new test(s) MUST fail.
3. **Green:** Restore the fix, then run `bazel test //... --nocache_test_results`. All tests pass again.

If step 2 passes (test doesn't fail without the fix), the test is not testing what you think. Go back and fix the test.

**Do the red-green on the full suite**, not just the one target. This catches two problems at once: (a) the regression test actually detects the bug, and (b) the fix doesn't break anything else.

## Anti-Patterns

| Don't                                     | Do instead                                          |
| ----------------------------------------- | --------------------------------------------------- |
| `bazel test //target` (no cache flag)     | `bazel test //target --nocache_test_results`        |
| `--test_arg='-f' --test_arg='test name'`  | Run the full target, grep output for test name      |
| Run one target, claim fix is done         | Run `//...`, check all-pass summary                 |
| Claim "tests pass" from a previous run    | Run fresh, read fresh output                        |
| Trust filter-based "PASSED" at face value | Check that the expected test names appear in output |
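
Checking that expected test names appear in the output can also be scripted. This is a minimal sketch under stated assumptions: `require_test_names` is a hypothetical helper, the per-test output is assumed to have been captured (for example by running with Bazel's `--test_output=all`), and each test case's name is assumed to appear verbatim in that output. The test name in the usage comment is a placeholder.

```bash
# Hypothetical helper: fail unless every expected test name (given as
# arguments) appears in the captured test output read from stdin.
# Names are matched as fixed strings, not regular expressions.
require_test_names() {
  local log name missing
  missing=0
  log=$(cat)   # read the captured output once so each name can be searched
  for name in "$@"; do
    if printf '%s\n' "$log" | grep -F -- "$name" >/dev/null; then
      echo "found: $name"
    else
      echo "MISSING: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not run here; "my regression test" is a placeholder name):
#   bazel test //src/workerd/api:streams/standard-test@ \
#     --nocache_test_results --test_output=all 2>&1 \
#     | require_test_names "my regression test"
```

A non-zero exit status means at least one expected name never showed up, which is the situation a green "PASSED" can hide when a filter matches nothing or a test has been renamed.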