Skip to content

Add gcov-based test pruning with file-level coverage cache#1273

Draft
sbryngelson wants to merge 21 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning
Draft

Add gcov-based test pruning with file-level coverage cache#1273
sbryngelson wants to merge 21 commits intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning

Conversation

@sbryngelson
Copy link
Member

@sbryngelson sbryngelson commented Feb 27, 2026

Summary

Add --only-changes flag to ./mfc.sh test that uses a gcov-based file-level coverage cache to skip tests unaffected by a PR's source changes. CI automatically uses --only-changes on PRs and rebuilds the cache when needed.

How it works

  1. Cache build (./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache -j 64):

    • Phase 0: Prepare .inp files for all 555 tests
    • Phase 1: Run tests in parallel with per-test GCOV_PREFIX isolation
    • Phase 2: Collect gcov JSON to map each test → .fpp source files it exercises
    • Sanity check: rejects cache if zero tests have coverage data
  2. Test filtering (./mfc.sh test --only-changes -j 8):

    • git diff identifies changed .fpp files vs merge-base
    • Coverage cache determines which tests exercise those files
    • Only matching tests run; infrastructure/macro changes trigger full suite
    • Conservative fallbacks: stale/missing cache → full suite, git failure → full suite, unknown test UUID → include
  3. CI integration (.github/workflows/test.yml):

    • All test jobs pass --only-changes on PRs (Github runners, Phoenix, Frontier, Frontier AMD)
    • Master pushes always run the full suite (no --only-changes)
    • rebuild-cache job runs on Phoenix when triggered (PR only, not forks)
    • commit-cache job auto-commits the updated cache back to the PR branch
    • Cache rebuild failure does not block test jobs
  4. Cache rebuild triggers — the cache is rebuilt when either:

    • cases.py changes (new/modified test definitions)
    • Fortran dependency graph changes (added use, #:include, or include statements detected via diff grep)

Safety guards

  • ALWAYS_RUN_ALL: GPU macro files, CMakeLists.txt, case.fpp, toolchain/cmake/, coverage.py, toolchain definitions → full suite
  • PR-only pruning: --only-changes is gated on GITHUB_EVENT_NAME == pull_request in all CI scripts; master pushes run full suite
  • Shallow clone handling: CI fetches master with --deepen=200 for reliable git merge-base; falls back to origin/master if needed
  • Tests missing from cache (newly added) → conservatively included
  • Tests with no simulation coverage → conservatively included when simulation files change
  • Fork PRs: rebuild-cache runs but commit-cache is skipped (can't push to fork)
  • set -e in rebuild-cache.sh prevents committing a cache from a failed build
  • Missing binary warning during cache build for visibility

CMake changes for gcov builds (--gcov flag)

  • Override CMAKE_Fortran_FLAGS_RELEASE to -O1 -DNDEBUG (coverage is inaccurate at -O3)
  • Skip -march=native (AVX-512 FP16 on GNR emits instructions binutils 2.35 can't assemble)
  • Disable LTO/IPO (link failure with gcov-instrumented object files)
  • Gate --line-marker-format=gfortran5 Fypp flag behind MFC_GCov

CI-verified results

Scenario Expected Result
No .fpp changes All tests skipped (0-1s Test step) Confirmed
One .fpp file changed (m_bubbles.fpp) 118 tests run, 437 skipped Confirmed
Debug build with -% 20 23 tests (20% of 118) Confirmed
CMakeLists.txt changed Full suite (ALWAYS_RUN_ALL) Confirmed

Files changed

File Description
toolchain/mfc/test/coverage.py Core coverage logic (~640 lines)
toolchain/mfc/test/test_coverage_unit.py 51 unit tests
toolchain/mfc/test/test_coverage_cache.json.gz Pre-built cache (555 tests, 12KB)
toolchain/mfc/test/test.py Integration of --only-changes and --build-coverage-cache
toolchain/mfc/cli/commands.py CLI arguments and help text
.github/workflows/test.yml rebuild-cache, commit-cache jobs; --only-changes in test jobs; dep-change detection
.github/workflows/phoenix/rebuild-cache.sh SLURM script for cache rebuild (set -e)
.github/workflows/phoenix/test.sh --only-changes on PRs, 64-thread parallel cap
.github/workflows/frontier/test.sh --only-changes on PRs
.github/workflows/frontier_amd/test.sh --only-changes on PRs
.github/workflows/phoenix/submit.sh GNR exclusive nodes for CPU jobs
CMakeLists.txt gcov build flags (see above)
.github/file-filter.yml cases_py path filter
.gitignore Ignore raw JSON cache

Test plan

  • ./mfc.sh precheck -j 8 passes
  • 51 unit tests pass (./mfc.sh lint) — 77 total including other test suites
  • Coverage cache builds successfully on Phoenix GNR node (555/555 tests with coverage)
  • All 3 targets (pre_process, simulation, post_process) compile with --gcov
  • CI: rebuild-cache skips when cases.py unchanged and no dep changes
  • CI: test jobs proceed normally when rebuild-cache is skipped
  • CI: --only-changes correctly prunes tests (118/555 for single .fpp change)
  • CI: debug builds apply -% 20 sampling after pruning (23/118)
  • CI: master pushes run full suite (no --only-changes)
  • CI: rebuild-cache triggers on cases.py change
  • CI: rebuild-cache triggers on Fortran dependency change (use/include)
  • CI: commit-cache commits updated cache back to PR branch

🤖 Generated with Claude Code

@sbryngelson sbryngelson marked this pull request as ready for review February 27, 2026 17:32
Copilot AI review requested due to automatic review settings February 27, 2026 17:32
@sbryngelson sbryngelson marked this pull request as draft February 27, 2026 17:35
coderabbitai[bot]

This comment was marked as resolved.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the regex/JSON-based test dependency mapping with a gcov-driven, line-level coverage cache to support ./mfc.sh test --only-changes test pruning, and wires the feature into the CLI and CI workflows.

Changes:

  • Add toolchain/mfc/test/coverage.py to build/load a gzipped per-test line coverage cache and intersect it with git diff -U0 line ranges.
  • Integrate --only-changes, --changes-branch, and --build-coverage-cache into the test runner and CLI.
  • Update CMake Fypp invocation and CI workflows to support .fpp line mapping and PR-only pruning.

Reviewed changes

Copilot reviewed 12 out of 14 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
toolchain/mfc/test/coverage.py Implements coverage cache build/load, gcov JSON parsing, diff parsing, and test filtering.
toolchain/mfc/test/test.py Adds --only-changes filtering flow and --build-coverage-cache entrypoint.
toolchain/mfc/test/test_coverage_unit.py Adds offline unit tests for diff parsing, gcov parsing, and filtering logic.
toolchain/mfc/cli/commands.py Exposes new CLI flags and examples for coverage-based pruning.
CMakeLists.txt Adds Fypp line marker format to improve gcov↔.fpp line mapping.
.github/workflows/test.yml Ensures full git history and enables PR-only --only-changes invocation.
.github/workflows/phoenix/{submit.sh,test.sh} Forwards PR context into SLURM jobs and conditionally enables --only-changes.
.github/workflows/frontier/{submit.sh,test.sh} Same PR-context forwarding and conditional --only-changes.
.github/workflows/frontier_amd/{submit.sh,test.sh} Same PR-context forwarding and conditional --only-changes.
.gitignore Ignores legacy raw .json cache while keeping .json.gz trackable.

@MFlowCode MFlowCode deleted a comment from coderabbitai bot Feb 28, 2026
@MFlowCode MFlowCode deleted a comment from coderabbitai bot Feb 28, 2026
@MFlowCode MFlowCode deleted a comment from Copilot AI Feb 28, 2026
@MFlowCode MFlowCode deleted a comment from Copilot AI Feb 28, 2026
@MFlowCode MFlowCode deleted a comment from Copilot AI Feb 28, 2026
@MFlowCode MFlowCode deleted a comment from Copilot AI Feb 28, 2026
@MFlowCode MFlowCode deleted a comment from Copilot AI Feb 28, 2026
@sbryngelson sbryngelson changed the title Replace regex dependency parser with gcov line-level coverage pruning Add gcov-based test pruning with file-level coverage cache Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from codecov bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from coderabbitai bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
@MFlowCode MFlowCode deleted a comment from github-actions bot Mar 1, 2026
sbryngelson and others added 11 commits March 1, 2026 12:52
…ve dead code

- Report non-zero exit codes from post_process (and other targets) during
  cache build instead of silently swallowing them
- Gate --line-marker-format=gfortran5 behind MFC_GCov so it only applies
  to gcov instrumented builds
- Remove unused collect_coverage_from_gcda function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GCC 12.3 with LTO on Granite Rapids generates AVX-512 FP16 instructions
(vmovw) that the system assembler (binutils 2.35) cannot encode. Disabling
LTO for gcov builds avoids this and is correct since gcov instrumentation
at -O1 does not benefit from link-time optimization.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GCC 12.3 with -O3 -march=native on Granite Rapids emits AVX-512 FP16
instructions (vmovw) that binutils 2.35 cannot assemble. For gcov builds,
force -O1 via CMAKE_Fortran_FLAGS_RELEASE and skip -march=native entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove _gcda_path_to_fpp (dead code, never called in production)
- Remove 7 associated unit tests (TestGcdaPathToFpp)
- Add set -e to rebuild-cache.sh for error propagation
- Add github.repository guard to commit-cache job

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix help text to say file-level (not line-level) throughout commands.py
- Allow tests to proceed when rebuild-cache fails (not just skipped)
- Fix _run_single_test_direct docstring return type (3-tuple not 2-tuple)
- Add cache integrity validation: reject zero-coverage cache builds
- Remove bare except in _parse_gcov_json_output (let unexpected errors propagate)
- Log warnings in _prepare_test instead of silently swallowing exceptions
- Fail early with clear error when no MPI launcher found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace bare except Exception with (SubprocessError, OSError) in
  collect_coverage_for_test to let unexpected errors propagate
- Add fork PR guard to commit-cache job so it skips for external PRs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --only-changes flag to all CI test commands (Github, Phoenix,
Frontier, Frontier AMD). When the coverage cache is available, only
tests exercising changed .fpp files will run. Falls back to full
suite when cache is missing or stale.

Also fetch master ref in Github CI for merge-base diff, and add
origin/master fallback in get_changed_files for shallow clones.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: b47f748

Files changed: 14

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh (new)
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py (new, 630 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz (new binary)
  • toolchain/mfc/test/test_coverage_unit.py (new, 595 lines)

Summary

  • Adds gcov-based file-level coverage cache (coverage.py) to prune test runs to only those tests that exercise changed .fpp files.
  • New --only-changes and --build-coverage-cache CLI flags integrated into all cluster test scripts.
  • CI adds rebuild-cache + commit-cache jobs to keep the committed cache fresh when cases.py changes.
  • CMake gains a --gcov build mode that overrides -O3-O1, skips -march=native, and disables LTO.
  • 47 unit tests in test_coverage_unit.py cover the core filtering logic offline.

Findings

1. submit.sh: CPU partition change affects ALL CPU jobs, not just gcov rebuild (medium severity)

.github/workflows/phoenix/submit.sh

-#SBATCH -p cpu-small               # partition
-#SBATCH --ntasks-per-node=24       # Number of cores per node required
-#SBATCH --mem-per-cpu=2G           # Memory per core
+#SBATCH -p cpu-gnr                 # partition (full Granite Rapids node)
+#SBATCH --exclusive                # exclusive access to all cores
+#SBATCH -C graniterapids           # constrain to GNR architecture

sbatch_cpu_opts is used for every CPU-type job submitted through submit.sh, including regular CPU test runs — not just the gcov rebuild. Switching to --exclusive and removing --mem-per-cpu=2G means every CPU job now claims an entire GNR node. This is more expensive and the cpu-gnr partition may have lower availability than cpu-small, potentially increasing queue wait times for test jobs. It also silently removes the explicit memory limit, which could matter for memory-sensitive test configurations.

Consider scoping this change to rebuild-cache.sh via its own sbatch invocation rather than changing the shared opts used by all CPU jobs.

2. New .fpp files with no cache coverage silently skip all tests (documented but risky)

toolchain/mfc/test/coverage.py, test_coverage_unit.py:1564

The unit test test_new_fpp_file_no_coverage_skips explicitly documents this behavior: if a developer adds a new .fpp file and modifies it, no test covers it in the cache yet, so all tests are skipped. If the new file is compiled as part of existing modules that are already tested, the PR may silently evade coverage.

In practice this is constrained because new .fpp files must be wired into the build and called from existing code that IS covered — but it's worth flagging in --only-changes help text or documentation so developers know to expect a full suite run the first time after a cache rebuild that includes their new file.

3. commit-cache: git pull --rebase after git commit can fail mid-rebase

``.github/workflows/test.yml, commit-cache` job (lines ~266–270)

git commit -m "Regenerate gcov coverage cache..."
git pull --rebase
git push

If the rebase encounters a conflict (another commit was pushed to the PR branch between the checkout and the push step), the push step will fail and the job will error out. The commit exists locally but the push failed, leaving the Action in a failed state. The next run would re-download the artifact and retry — this is mostly self-healing — but it could be confusing. Consider a retry loop (for i in 1 2 3; do git pull --rebase && git push && break; done) or simply using git push --force-with-lease after the rebase.

4. test.py: --build-coverage-cache ignores --only and runs before __filter

toolchain/mfc/test/test.py, lines ~1121–1137

The if ARG("build_coverage_cache"): block executes before cases, skipped_cases = __filter(cases), so --only and other filters are not applied. This means ./mfc.sh test --build-coverage-cache --only 1D silently ignores --only and builds the full cache. The current help text doesn't mention this. Add a warning if --only is also specified, or explicitly document that --build-coverage-cache always processes all tests.

5. coverage.py: conservative fallback only for src/simulation/ changes, not src/pre_process/ or src/post_process/

toolchain/mfc/test/coverage.py, lines ~1022–1041

changed_sim = any(f.startswith("src/simulation/") for f in changed_fpp)
...
if changed_sim and not any(f.startswith("src/simulation/") for f in test_file_set):
    to_run.append(case)  # conservative: no sim coverage

There is a conservative include for tests that lack simulation coverage when simulation files change (catching cache-build failures for that test). An analogous guard doesn't exist for src/pre_process/ or src/post_process/. If a test's pre_process coverage was not recorded (e.g., it errored in Phase 1), and a pre_process file changes, that test would be incorrectly skipped. This is lower risk because pre_process runs faster and is more likely to have succeeded during cache build, but it's an inconsistency worth noting.


Minor observations (no action required unless desired)

  • rebuild-cache.sh has no error reporting if ./mfc.sh build --gcov or ./mfc.sh test --build-coverage-cache produce a partial result — but set -e ensures the script exits non-zero on failure, which is correct.
  • FYPP_GCOV_OPTS is only set inside the if (MFC_GCov) block but referenced unconditionally in HANDLE_SOURCES. CMake safely expands undefined variables to empty string, so this is fine.
  • The _normalize_cache function for old line-level format conversion (line ~913) is future-proofing against a format that apparently never shipped publicly; safe to keep as a no-op fallback.
  • The test_coverage_unit.py import bootstrap is necessarily complex due to relative imports in the toolchain package. The dual importlib/exec fallback is reasonable given the constraint.

This is a temporary commit to verify --only-changes works end-to-end.
Will be reverted before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: da7aa27
Files changed: 14

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh (new)
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py (new, ~630 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz (new binary)
  • toolchain/mfc/test/test_coverage_unit.py (new, 47 tests)

Summary:

  • Adds --only-changes / --build-coverage-cache CLI flags using gcov file-level coverage to skip unaffected tests in CI
  • Introduces a rebuild-cache + commit-cache GitHub Actions job pair that auto-commits an updated cache when cases.py changes
  • CMake is extended with --gcov build mode (lower optimisation, no -march=native, no LTO) for coverage builds
  • Existing Phoenix CPU tests upgraded from cpu-small 8-thread to GNR exclusive-node 64-thread
  • 47 unit tests cover the core Python filtering and gcov-parsing logic

Findings

1. BLOCKER — CMakeLists.txt deliberately commented out of ALWAYS_RUN_ALL, but unit test still asserts it triggers the full suite

toolchain/mfc/test/coverage.py line ~472:

# "CMakeLists.txt",  # TEMP: disabled to test pruning in CI — re-enable before merge

toolchain/mfc/test/test_coverage_unit.py lines 1403–1406:

def test_cmakelists_triggers_all(self):
    assert should_run_all_tests({"CMakeLists.txt"}) is True

CMakeLists.txt is not in ALWAYS_RUN_ALL, so should_run_all_tests({"CMakeLists.txt"}) returns False and this unit test fails. The PR description's "47 unit tests pass" claim cannot be current — either the lint was run before this line was commented out, or the comment was added after the final lint run.

This is a self-documented pre-merge blocker. Before merging, uncomment "CMakeLists.txt" in ALWAYS_RUN_ALL.


2. Medium — commit-cache has no retry loop around git pull --rebase && git push

.github/workflows/test.yml lines ~265–270:

git pull --rebase
git push

The comment says "rebase for concurrent updates," but a single --rebase + push will still fail if a concurrent push lands between those two commands. Two PRs touching cases.py at the same time (or a developer pushing to the branch while the bot commits) will hit a non-zero git push exit. Since the step has no continue-on-error: true, the whole commit-cache job will fail — but its result has no downstream consumers, so CI jobs will proceed with the stale cache from the repo. The failure is cosmetically noisy. Consider a small retry loop:

for attempt in 1 2 3; do
  git pull --rebase && git push && break || true
done

3. Medium — Frontier/Frontier-AMD test scripts get --only-changes but have no git fetch master

.github/workflows/frontier/test.sh and frontier_amd/test.sh:
Both scripts now pass --only-changes. The GitHub matrix job has a corresponding step:

- name: Fetch master for coverage diff
  run: git fetch origin master:master --depth=1
  continue-on-error: true

Frontier and Frontier-AMD SLURM jobs clone the repo via checkout@v4 with clean: false but have no equivalent git fetch. In a shallow clone where origin/master is not reachable, get_changed_files returns None → falls back to the full suite silently. This is safe (conservative fallback works), but the optimisation never fires on Frontier. If this is intentional for now, a comment would clarify.


4. Low — _prepare_test enqueues SYSCHECK binary; Phase 1 will log phantom failures for it

toolchain/mfc/test/coverage.py lines ~768–778:

targets = [SYSCHECK, PRE_PROCESS, SIMULATION, POST_PROCESS]

_run_single_test_direct then executes every binary in the list (line ~724–733). SYSCHECK validates Fypp/CMake availability; it does not consume the .inp file and will exit non-zero in the build/ environment when invoked directly via mpirun. This will populate all_failures for essentially every test, producing a noisy warning wall during cache builds. Removing SYSCHECK from the targets list (or handling it specially) would eliminate false-positive failure noise.


5. Low — _parse_gcov_json_output silently ignores parse errors on a per-.gcno granularity; no aggregate warning

toolchain/mfc/test/coverage.py lines ~583–588:

except (gzip.BadGzipFile, OSError):
    try:
        data = json.loads(raw_bytes)
    except (json.JSONDecodeError, ValueError):
        return set()

A silent empty-set return means a corrupted or mismatched-version gcov output goes undetected until the final "tests with coverage" sanity check. The sanity check threshold (< len(cases) // 2) is generous enough that moderate corruption will still pass. This is acceptable for now, but worth noting for future debugging.


Improvement Opportunities

  • The changes_branch argument defaults to "master" in commands.py and get_changed_files. For repos that use main as the default branch, this silently falls back to full suite. Documenting the assumption or detecting main vs master automatically would help portability.
  • _compute_gcov_prefix_strip counts path components of MFC_ROOT_DIR; on symlinked CI workspaces os.path.realpath may give a different depth than the working directory. The existing os.path.realpath call handles this correctly, but a debug log of the computed strip value (already done at line ~818) aids troubleshooting.

Overall: The feature is well-architected with solid conservative fallbacks and good unit test coverage. The primary issue is the commented-out CMakeLists.txt in ALWAYS_RUN_ALL — that must be restored before merge.

git fetch --depth=1 alone doesn't provide enough history for
merge-base to find a common ancestor. Add --deepen=200 to expand
the shallow clone so --only-changes can compute the correct diff.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: a398834
Files changed: 14

File +/-
.github/file-filter.yml +3
.github/workflows/frontier/test.sh +2/-2
.github/workflows/frontier_amd/test.sh +2/-2
.github/workflows/phoenix/rebuild-cache.sh +17
.github/workflows/phoenix/submit.sh +3/-3
.github/workflows/phoenix/test.sh +4/-2
.github/workflows/test.yml +122/-6
.gitignore +3
CMakeLists.txt +26/-11
toolchain/mfc/cli/commands.py +25
toolchain/mfc/test/coverage.py +630
toolchain/mfc/test/test.py +66/-1
toolchain/mfc/test/test_coverage_cache.json.gz binary
toolchain/mfc/test/test_coverage_unit.py +595

Summary

  • Adds --only-changes flag to ./mfc.sh test using gcov-based file-level coverage maps to prune unaffected tests on PRs — a potentially large CI time saving.
  • Coverage cache is rebuilt in a SLURM job on Phoenix when cases.py changes, then auto-committed back to the PR branch.
  • The architecture is conservative: missing cache → full suite, git failure → full suite, newly added tests → always included, tests with no simulation coverage when simulation files changed → included.
  • 47 unit tests cover the core logic. CMake gcov flag plumbing is clean.
  • Overall the design is solid and the safety guards are well-thought-out.

Findings

1. CMakeLists.txt intentionally excluded from ALWAYS_RUN_ALL — must be reverted before merge

toolchain/mfc/test/coverage.py, line 474:

# "CMakeLists.txt",  # TEMP: disabled to test pruning in CI — re-enable before merge

This is explicitly marked as a temporary workaround. CMake changes (new compile flags, LTO toggles, etc.) can affect runtime behaviour without touching any .fpp file. This entry must be uncommented before this PR merges, otherwise a CMake change could silently skip tests that would catch regressions. The unit test at line 1342–1346 also tests for CMakeLists.txt being in ALWAYS_RUN_ALL, so leaving it commented will cause that test to pass vacuously.


2. git pull --rebase + git push in commit-cache is not retry-safe

.github/workflows/test.yml, ~line 270:

git commit -m "Regenerate gcov coverage cache..."
git pull --rebase
git push
echo "pushed=true" >> "$GITHUB_OUTPUT"

The PR description says "rebase for concurrent updates," but there is no retry loop. If two rebuild-cache runs complete simultaneously (e.g., after a force-push rebasing the PR branch), git push will fail with a non-zero exit. Because the run: step defaults to failing the job on any non-zero exit, the pushed=true line is never reached and the PR comment is skipped — but more importantly the job fails, requiring a manual re-run. A simple retry loop (e.g. for i in 1 2 3; do git pull --rebase && git push && break; done) would avoid spurious failures.


3. --only-changes applied to Frontier/Frontier-AMD test scripts unconditionally (including push events on master)

.github/workflows/frontier/test.sh, line 21+24:

./mfc.sh test -v -a $rdma_opts --max-attempts 3 --only-changes ...
./mfc.sh test -v -a --max-attempts 3 --only-changes -j 32 ...

These shell scripts are invoked for both PR and push-to-master events. On master, there is no PR-base commit to diff against, so get_changed_files() will attempt git merge-base master HEAD where HEAD is master, returning an empty diff. With no .fpp changes detected, filter_tests_by_coverage returns ([], list(cases)) and all tests are skipped on master push events. The conservative fallback (no cache → full suite) does not trigger here because the cache is present — only the diff is empty. If the intent is to always run the full suite on pushes to master, --only-changes should be gated behind a check for GITHUB_EVENT_NAME == pull_request in these scripts or removed from non-PR CI paths.


4. Phase 2 coverage collection is sequential despite being independent

toolchain/mfc/test/coverage.py, lines 865–880:

for i, (uuid, test_gcda) in enumerate(sorted(test_results.items())):
    zero_gcda_files(root_dir)
    n_copied = _install_gcda_files(test_gcda, root_dir)
    ...
    coverage = collect_coverage_for_test(matching or gcno_files, root_dir, gcov_bin)
    cache[uuid] = sorted(coverage)

Phase 2 is serialised because it installs .gcda files into the shared build/ directory before running gcov. This works correctly (the GCOV_PREFIX isolation is only in Phase 1), but it is substantially slower than Phase 1's parallel execution and dominates total cache-build time for large test suites. A minor optimisation note rather than a bug — the ~8x speedup from _find_matching_gcno partially compensates.


5. Minor: _normalize_cache mutates dict while iterating

toolchain/mfc/test/coverage.py, lines 921–925:

def _normalize_cache(cache: dict) -> dict:
    for key, value in cache.items():
        if key == "_meta":
            continue
        if isinstance(value, dict):
            cache[key] = sorted(value.keys())
    return cache

Mutating a dict during dict.items() iteration is safe in CPython 3.7+ for value changes (not insertions/deletions), but is technically undefined behaviour in the language spec. Consider iterating over list(cache.items()) or collecting changes and applying them after. Low severity, but worth fixing for correctness across Python implementations.


Overall

The feature is well-designed and the safety guards (conservative fallbacks, ALWAYS_RUN_ALL, fork protection, set -e in rebuild scripts) are thoughtfully implemented. Issue 1 (commented-out CMakeLists.txt) and Issue 3 (skipping all tests on master pushes) are the two blockers worth addressing before merge. Issues 2, 4, and 5 are lower priority.

sbryngelson and others added 2 commits March 1, 2026 14:15
--only-changes must not run on master pushes: merge-base would diff
master against itself, find no changes, and skip all tests. Gate
behind GITHUB_EVENT_NAME == pull_request in all test scripts.

Add retry loop (3 attempts) for git push in commit-cache to handle
concurrent updates from simultaneous rebuild-cache completions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes one word from a comment to verify --only-changes runs ~38
bubble tests and skips the rest. Will be reverted before merge.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: 6fcb3b4c3c9053210b7a96b1975b2b26d61a4b22

Files changed: 15

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh, frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh, submit.sh, test.sh
  • .github/workflows/test.yml
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py, test.py, test_coverage_cache.json.gz, test_coverage_unit.py

Summary

  • Adds gcov-based file-level coverage cache to prune tests on PRs, skipping tests whose covered files don't overlap with changed files.
  • Three-phase cache build: prep (single-threaded), run binaries in parallel with per-test GCOV_PREFIX isolation, collect gcov JSON.
  • CI auto-rebuilds and commits the cache when cases.py changes; cache staleness is detected by SHA256 hash of cases.py.
  • Safety fallbacks: stale/missing cache → full suite; git failure → full suite; test not in cache → conservatively included.
  • 47 unit tests in test_coverage_unit.py cover the core logic.

Findings

[CRITICAL] CMakeLists.txt commented out of ALWAYS_RUN_ALL — TODO not resolved before merge

File: toolchain/mfc/test/coverage.py, line 522

# "CMakeLists.txt",  # TEMP: disabled to test pruning in CI — re-enable before merge

This has two concrete consequences:

  1. Failing unit test. test_coverage_unit.py line 1453 asserts:

    def test_cmakelists_triggers_all(self):
        assert should_run_all_tests({"CMakeLists.txt"}) is True

    But should_run_all_tests is a pure set-intersection against ALWAYS_RUN_ALL, which no longer includes CMakeLists.txt. The assertion fails. The PR description says "47 unit tests pass" — this test contradicts that.

  2. Safety regression. A PR that changes only CMakeLists.txt (e.g., enabling LTO, changing optimization flags, adding a compile definition) will have the full test suite silently skipped rather than run. The code path in test.py:
    should_run_all_testsFalse, then changed_fpp → empty (no .fpp), so cases = []. CMake changes affect all builds but leave no .fpp trace.

    Fix: Uncomment the CMakeLists.txt line before merging.


[MODERATE] Push-retry loop silently succeeds on total failure

File: .github/workflows/test.yml, commit-cache job, lines 302–307

for i in 1 2 3; do
  git pull --rebase && git push && break
  echo "Push attempt $i failed, retrying in 5s..."
  sleep 5
done
echo "pushed=true" >> "$GITHUB_OUTPUT"

If all three push attempts fail, the loop exits without error and pushed=true is set unconditionally. The subsequent PR comment step fires, falsely claiming the cache was updated. Add || { echo 'All push attempts failed'; exit 1; } after the loop, and set pushed=true only inside the break path.


[MINOR] Unrelated change in m_bubbles.fpp

File: src/simulation/m_bubbles.fpp, line 410 — a docstring comment is updated. This is a one-line cosmetic change unrelated to the coverage feature. Consider moving it to a separate commit for a cleaner history.


[MINOR] filter_tests_by_coverage skips all tests when non-.fpp critical files change

File: toolchain/mfc/test/coverage.py, line 1068

changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
    return [], list(cases)

Files in ALWAYS_RUN_ALL that are not .fpp (e.g., toolchain/mfc/run/input.py, toolchain/mfc/case_validator.py) already correctly trigger the full suite via should_run_all_tests before filter_tests_by_coverage is called, so the call ordering in test.py handles this correctly. However, this relies on the should_run_all_tests guard being complete — the CMakeLists.txt gap above is exactly where that assumption breaks.


Improvement Opportunities (if no blockers above)

  • _compute_gcov_prefix_strip: The -1 excludes root '/'" comment implies GCOV_PREFIX_STRIP counts directory names excluding the leading /`. This is correct on Linux. Consider adding a note that this is Linux-specific and will need adjustment on Windows paths (though MFC is HPC-focused).
  • rebuild-cache.sh: The script uses set -e which is good, but ./mfc.sh build --gcov -j 8 failures may produce confusing output. A pre-check that gfortran is on PATH and supports --coverage would give a clearer error message.

Re-enable CMakeLists.txt in ALWAYS_RUN_ALL and revert the trivial
comment change in m_bubbles.fpp. Both were temporary for CI testing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: 240ef0a
Files changed: 14

Files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds a gcov-based file-level coverage cache system that maps each test UUID → .fpp source files it exercises, enabling --only-changes to skip tests unaffected by a PR's diff.
  • Three-phase cache build: prepare .inp files single-threaded, run 555 tests in parallel with isolated GCOV_PREFIX directories, then collect gcov JSON sequentially.
  • Conservative fallback logic throughout: stale/missing cache → full suite; git failure → full suite; test UUID not in cache → include it.
  • CI adds rebuild-cache + commit-cache jobs that auto-commit the updated cache when cases.py changes, gated behind non-fork/non-draft checks.
  • CMake --gcov flag overrides -O3 → -O1, disables -march=native and LTO for gfortran coverage builds.

Findings

1. coverage.py is missing from ALWAYS_RUN_ALL — bootstrapping hazard

File: toolchain/mfc/test/coverage.py, line 498–510

If coverage.py itself is modified (e.g., to fix a pruning bug), running --only-changes on the very PR that changed the pruning logic will use the old binary for filtering. The fix or regression in pruning won't trigger a full suite run. Adding coverage.py to ALWAYS_RUN_ALL closes this loop:

ALWAYS_RUN_ALL = frozenset([
    ...
    "toolchain/mfc/test/coverage.py",   # <-- add this
    ...
])

2. load_coverage_cache — unguarded read_bytes() raises on missing cases.py

File: toolchain/mfc/test/coverage.py, line 981–982

cases_py = Path(root_dir) / "toolchain/mfc/test/cases.py"
current_hash = hashlib.sha256(cases_py.read_bytes()).hexdigest()

If cases.py is somehow absent (e.g., after a partial checkout or worktree), this raises an unhandled FileNotFoundError rather than returning None gracefully. Wrap in a try/except consistent with the gzip.open handling just above.

3. commit-cache retry loop silently swallows push failures

File: .github/workflows/test.yml, lines 302–308

for i in 1 2 3; do
  git pull --rebase && git push && break
  echo "Push attempt $i failed, retrying in 5s..."
  sleep 5
done
echo "pushed=true" >> "$GITHUB_OUTPUT"

echo "pushed=true" is outside the loop — it always executes even if all 3 push attempts fail. This means pushed=true is emitted on total failure and the PR comment claims the cache was updated when it wasn't. Move the echo inside the break arm, and add an exit 1 (or echo "pushed=false") after the loop:

for i in 1 2 3; do
  git pull --rebase && git push && { echo "pushed=true" >> "$GITHUB_OUTPUT"; break; }
  echo "Push attempt $i failed, retrying in 5s..."
  sleep 5
done

4. MPI launcher selection prefers mpirun over srun on SLURM systems

File: toolchain/mfc/test/coverage.py, lines 748–757

if shutil.which("mpirun"):
    mpi_cmd = ["mpirun", "--bind-to", "none", "-np", str(ppn)]
elif shutil.which("srun"):
    mpi_cmd = ["srun", "--ntasks", str(ppn)]

On Phoenix (SLURM) mpirun is likely available alongside srun, so mpirun wins. Inside a SLURM allocation, srun respects the job's resource reservation and process binding; bare mpirun --bind-to none does not. Consider checking for $SLURM_JOB_ID to prefer srun in SLURM environments, or simply reversing the preference order.

5. Missing binaries silently yield empty coverage, no warning

File: toolchain/mfc/test/coverage.py, line 763

if not os.path.isfile(bin_path):
    continue

If a target binary is absent (e.g., only simulation was built because --gcov failed for one target), the test runs with fewer phases and records no coverage data. The cache then stores [] for that UUID. No warning is emitted. The conservative heuristic at line 1077 partially mitigates this for simulation-file changes but not for pre_process/post_process-only changes. At a minimum, a cons.print(f"[yellow]Warning: binary {bin_path} not found for {uuid}[/yellow]") here would surface cache quality issues early.

6. _normalize_cache mutates its argument in-place

File: toolchain/mfc/test/coverage.py, lines 956–961

def _normalize_cache(cache: dict) -> dict:
    for key, value in cache.items():
        if key == "_meta":
            continue
        if isinstance(value, dict):
            cache[key] = sorted(value.keys())   # mutates the dict being iterated
    return cache

Mutating a dict while iterating it is safe in CPython 3.7+ when only values (not keys) are changed, but it is still surprising API. The function signature implies a transform, not in-place mutation. Low-risk in practice since the caller immediately discards the original, but worth a # note: mutates in place comment or a defensive copy.


Minor / Nits

  • test_coverage_unit.py imports coverage.py via a dual-path strategy (importlib first, then exec-based fallback). The fallback rewrites import statements as string replacements — if any import line changes format (e.g., whitespace), the fallback breaks silently. An AssertionError or explicit error on fallback failure would make test breakage obvious.
  • submit.sh diff: the removed line -#SBATCH --mem-per-cpu=2G # Memory per core\ also had a trailing \, confirming the new #SBATCH -C graniterapids # constrain to GNR architecture\ trailing backslash is intentional (shell string continuation). No action needed.

…nd missing binary warning

- case.fpp and toolchain/cmake/ are invisible to gcov (Fypp-inlined
  or non-Fortran) but affect all tests. coverage.py itself must
  trigger a full run to validate pruning logic changes.
- Fix commit-cache retry: pushed=true was emitted even when all 3
  push attempts failed.
- Warn when a target binary is missing during cache build so cache
  quality issues are visible.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: 9d90188
Files changed: 14

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh, frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh, submit.sh, test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py (new, 643 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz (binary)
  • toolchain/mfc/test/test_coverage_unit.py (new, 615 lines)

Summary

  • Adds --only-changes flag for gcov-based test pruning on PRs using a committed file-level coverage cache
  • Cache build runs all 555 tests with GCOV_PREFIX isolation via ThreadPoolExecutor, then maps each test UUID → covered .fpp files
  • Conservative fallbacks throughout: stale/missing cache, git failure, unknown UUIDs, and infrastructure file changes all fall back to full suite
  • CMake changes correctly disable -O3, -march=native, and LTO for gcov builds
  • CI integration uses artifact pass-through between rebuild-cache and test jobs, with bot commit back to the PR branch

Findings

1. test.py--build-coverage-cache sees filtered cases if --only-changes is also set

toolchain/mfc/test/test.py (around the if ARG("build_coverage_cache"): block)

The --only-changes filter runs first and narrows cases. The subsequent --build-coverage-cache block derives all_cases from that already-filtered cases variable:

all_cases = [b.to_case() for b in cases]  # cases may already be filtered

If a user (or script) passes both flags together, the resulting cache will cover only the filtered subset of tests, not all 555. The rebuild-cache.sh script is correct (never passes --only-changes), so this won't affect CI, but a defensive guard would prevent a confusing silent failure for anyone invoking manually:

if ARG("build_coverage_cache") and ARG("only_changes"):
    cons.print("[red]Error: --build-coverage-cache and --only-changes are mutually exclusive.[/red]")
    sys.exit(1)

2. commit-cache rebase retry loop — no abort on rebase failure

.github/workflows/test.yml (commit-cache job, Commit Updated Cache step)

for i in 1 2 3; do
  git pull --rebase && git push && { pushed=true; break; }
  echo "Push attempt $i failed, retrying in 5s..."
  sleep 5
done

If git pull --rebase itself fails mid-rebase (e.g., a conflict in the cache file), subsequent retry iterations hit git pull --rebase again with an in-progress rebase, which will fail with "fatal: Cannot 'rebase': You have unstaged changes." This leaves git in a bad state for the rest of the loop. Adding git rebase --abort 2>/dev/null || true before each retry would make the loop resilient:

for i in 1 2 3; do
  git pull --rebase && git push && { pushed=true; break; }
  git rebase --abort 2>/dev/null || true
  echo "Push attempt $i failed, retrying in 5s..."
  sleep 5
done

3. ALWAYS_RUN_ALL doesn't cover all of src/common/include/

toolchain/mfc/test/coverage.py:499-513

The set explicitly lists the known macro files (parallel_macros.fpp, acc_macros.fpp, etc.) and macros.fpp. If a future PR adds a new include file under src/common/include/ (e.g., a new helper macro file), it won't automatically trigger the full suite — it would be silently pruned as though it were a normal source file. A prefix-based guard would be more future-proof:

ALWAYS_RUN_ALL_PREFIXES = (
    "toolchain/cmake/",
    "src/common/include/",  # all Fypp include/macro files
)

and the explicit per-file entries in ALWAYS_RUN_ALL for include files could be removed.

4. test_coverage_unit.py import fallback is brittle

toolchain/mfc/test/test_coverage_unit.py:1329-1337

The exec() fallback rewrites imports via literal string replacement:

_src = (
    _src
    .replace("from ..printer import cons", "cons = _globals['cons']")
    ...
)
exec(compile(_src, _COVERAGE_PATH, "exec"), _globals)

If any of these import lines gain a trailing comment, extra whitespace, or a different alias (e.g., from ..printer import cons as _cons), the replacement silently fails and the fallback produces a module with broken references. This is already well-mitigated by the primary importlib path, but the fragility is worth noting if coverage.py's imports change.


Minor / Improvements (no blocking issues)

  • coverage.py:938: The "< half coverage" warning threshold len(cases) // 2 may fire spuriously on a small test selection. Consider using a lower bound (e.g., 20%) or making it configurable.
  • coverage.py:803: case.params.get("bubbles_lagrange", 'F') == 'T' uses string 'T'/'F' which matches MFC convention, but a helper constant or is_true() utility (if one exists) would be clearer.
  • submit.sh diff, line 98: The added #SBATCH -C graniterapids comment has a trailing \ — likely a copy-paste artifact; harmless since it's inside a comment.

Overall this is a well-designed feature with good safety guards. The three findings above are worth addressing before merge, particularly finding #2 (commit-cache rebase loop) which could cause the bot commit to silently fail in a concurrent-push scenario.

Add Fortran dependency graph change detection (use/include grep) to
trigger cache rebuilds. Temporarily disable CMakeLists.txt from
ALWAYS_RUN_ALL and add a benign duplicate use statement to verify
the dep-change pipeline works end-to-end.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: 0b03265a707b143b6ecbc88f6fc4338693ead8f1

Files changed: 15

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary:

  • Adds --only-changes flag that uses a gcov-based file-level coverage cache to skip tests unaffected by a PR's changed source files — significant CI speedup (verified: 118/555 tests for a single .fpp change).
  • Coverage cache is built once on Phoenix GNR nodes, stored as a gzip JSON artifact, and auto-committed back to the PR branch when cases.py or Fortran dependency statements change.
  • Conservative fallbacks: unknown UUID → include; no simulation coverage → include when simulation files change; stale/missing cache → full suite; git failure → full suite.
  • ALWAYS_RUN_ALL protects GPU macro files, toolchain definitions, and CMake infra from silent pruning.
  • 51 unit tests with thorough coverage of the filtering logic.

Findings

Critical — Must fix before merge

1. Debug artifact left in production Fortran source

src/simulation/m_bubbles.fpp, added lines:

+    use m_helper_basic         !< TEMP: test dep-change cache rebuild trigger

This is a duplicate use m_helper_basic statement (the same module is already imported two lines above). It was added only to trigger the dep-change detection mechanism during testing. It must be removed before merge — it compiles but the TEMP label is an explicit marker it should not land, and strict compilers may warn on duplicate USE.

2. CMakeLists.txt excluded from ALWAYS_RUN_ALL (TEMP comment not cleaned up)

toolchain/mfc/test/coverage.py, ALWAYS_RUN_ALL set:

    # "CMakeLists.txt",  # TEMP: disabled to test dep-change cache rebuild trigger

This was deliberately commented out for testing. With CMakeLists.txt absent from ALWAYS_RUN_ALL, a PR that changes build flags, adds compiler options, or modifies LTO/march settings will not trigger the full test suite. This is a real safety regression. This line must be uncommented before merge.


Logic gap — Dep-change detection only covers additions, not removals

.github/workflows/test.yml, dep-check step:

grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])

The regex only matches lines beginning with + (added dependency statements). Removing a use or #:include also changes the Fortran dependency graph — the removed module is no longer exercised, making the old coverage mapping stale. Consider also matching ^[-]\s*... for removed lines, or matching ^[+-]\s*... to cover both directions.


Minor — Commit message hardcoded to "cases.py changed"

.github/workflows/test.yml, commit-cache job:

git commit -m "Regenerate gcov coverage cache

Automatically rebuilt because cases.py changed."

The rebuild-cache job also triggers when dep_changed == 'true' (Fortran dependency change), but the commit message always says "cases.py changed." This is harmless functionally but may be confusing in git history. A minor fix: "cases.py or Fortran dependency graph changed."


Observation — Three test plan items still unchecked

The PR description lists three items as not yet validated in CI:

  • rebuild-cache triggered by cases.py change
  • rebuild-cache triggered by Fortran dep change
  • commit-cache successfully commits updated cache

These are the primary correctness guarantees of the cache invalidation mechanism. These should be verified or explained before merging.


Overall: The design is solid, the conservative fallbacks are well-considered, and the unit test suite is thorough. The two critical issues (TEMP use statement in Fortran source and CMakeLists.txt missing from ALWAYS_RUN_ALL) are straightforward one-line fixes. The dep-change grep gap is worth addressing to avoid a subtle correctness hole in cache invalidation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: 7e18115761eab86a80aca38b9fd3ceecaad136d9
Files changed: 15

Files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds a gcov-based file-level coverage cache so CI can prune tests to only those that exercise changed .fpp files, with a measured 118/555 reduction for a single-file change.
  • Architecture is sound: conservative fallbacks everywhere (stale cache → full suite, git failure → full suite, unknown UUID → include), and GPU macro files / build system files correctly bypass pruning via ALWAYS_RUN_ALL.
  • Three-phase cache build (prepare → run in parallel with GCOV_PREFIX isolation → sequential gcov collection) is well-designed and avoids .gcda file collisions.
  • 51 unit tests cover all the core logic functions thoroughly.
  • Two "TEMP" debugging artifacts were left in the PR and must be removed before merge. One causes a unit test to fail; the other leaves debug code in a Fortran source file.

Findings

🔴 Critical — Must fix before merge

1. Duplicate use statement in src/simulation/m_bubbles.fpp

use m_helper_basic         !< Functions to compare floating point numbers

use m_helper_basic         !< TEMP: test dep-change cache rebuild trigger

This is an explicit debugging artifact used to trigger the dep-change cache rebuild CI path. It must be removed before merge — it adds a redundant import to production Fortran code, and the comment makes clear it was never intended to stay.


2. CMakeLists.txt commented out of ALWAYS_RUN_ALL in toolchain/mfc/test/coverage.py (line 546)

ALWAYS_RUN_ALL = frozenset([
    ...
    "toolchain/mfc/test/coverage.py",
    # "CMakeLists.txt",  # TEMP: disabled to test dep-change cache rebuild trigger
])

This was commented out temporarily to test the dep-change trigger. With it absent, changing CMakeLists.txt (which controls build flags, LTO, gcov instrumentation, Fypp options, etc.) will not trigger the full test suite — a genuine correctness gap.

This also causes a direct unit test failure: test_coverage_unit.py::TestShouldRunAllTests::test_cmakelists_triggers_all asserts should_run_all_tests({"CMakeLists.txt"}) is True, which will return False with the entry commented out. The PR checklist states "51 unit tests pass" but this test cannot pass with the current code.

Fix: Uncomment the "CMakeLists.txt" entry and remove both TEMP lines.


🟡 Moderate

3. FYPP_GCOV_OPTS used before initialization in CMakeLists.txt

${FYPP_GCOV_OPTS}

This variable is only set inside if (CMAKE_Fortran_COMPILER_ID STREQUAL "GNU"). For non-GNU compilers it is never defined, so ${FYPP_GCOV_OPTS} expands silently to an empty string. This is correct behavior but will produce a warning with --warn-uninitialized. Consider adding set(FYPP_GCOV_OPTS "") before the GNU block.


4. commit-cache PR comment hardcodes "cases.py changed" even when triggered by a Fortran dep change (test.yml line 338)

--body "Coverage cache auto-updated: a bot commit was pushed to this branch because `cases.py` changed."

The cache is also rebuilt when Fortran use/include statements change (the dep_changed output from the dep-check step). The message is misleading in that case.


🟢 Minor / Observations

5. Unquoted $prune_flag in shell scripts

In frontier/test.sh, frontier_amd/test.sh, and phoenix/test.sh, $prune_flag is used unquoted. In bash, word-splitting of an empty variable produces no argument, which is the intended behavior here. This is safe but deviates from the "${var}" quoting convention used elsewhere in these scripts.

6. New .fpp files added during a PR are silently excluded

In filter_tests_by_coverage, a brand-new .fpp file that no existing test covers yet will cause all tests to be skipped for that file (no cached test covers it). The test_new_fpp_file_no_coverage_skips unit test documents this as accepted behavior, but it means adding a new source module without existing test coverage won't trigger any regression test. A warning message at runtime would help surface this case.


Verdict

The core design is solid and the fallback behavior is appropriately conservative. The two TEMP artifacts are blocking issues — remove the duplicate use from m_bubbles.fpp and uncomment CMakeLists.txt in ALWAYS_RUN_ALL. With those two fixes, the PR is ready to merge.

The self-hosted Phoenix runner retains build artifacts across jobs.
A prior --gpu mp build leaves CMake flags (e.g. -foffload=amdgcn-amdhsa)
that cause gfortran to fail when building with --gcov.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link

github-actions bot commented Mar 1, 2026

Claude Code Review

Head SHA: d5b0b36
Files changed: 15

File Purpose
toolchain/mfc/test/coverage.py Core gcov cache logic (~640 lines)
toolchain/mfc/test/test_coverage_unit.py 51 unit tests
toolchain/mfc/test/test_coverage_cache.json.gz Pre-built cache binary
toolchain/mfc/test/test.py Integration of --only-changes / --build-coverage-cache
toolchain/mfc/cli/commands.py CLI args
.github/workflows/test.yml rebuild-cache, commit-cache jobs; dep-change detection
.github/workflows/phoenix/rebuild-cache.sh SLURM rebuild script
.github/workflows/phoenix/test.sh PR pruning on Phoenix
.github/workflows/frontier/test.sh PR pruning on Frontier
.github/workflows/frontier_amd/test.sh PR pruning on Frontier AMD

Summary

  • Adds gcov-based file-level coverage cache to prune test runs on PRs to only tests that exercise changed .fpp files (confirmed 118/555 tests for a single .fpp change, 0 tests when only non-Fortran files change).
  • Three-phase build: Phase 0 prepares input files single-threaded, Phase 1 runs all tests in parallel with GCOV_PREFIX isolation, Phase 2 collects gcov coverage sequentially per test.
  • Conservative fallbacks throughout: stale/missing cache → full suite, git failure → full suite, test not in cache → include it, test with no simulation coverage when sim files change → include it.
  • CI integration with rebuild-cache (Phoenix SLURM) and commit-cache (auto-pushes to PR branch) when cases.py changes or Fortran dependency graph changes (use/include lines appear in diff).
  • The commit-cache job is gated to non-fork PRs and has push/pull_request write permissions scoped appropriately.

Findings

🔴 Must Fix — Debug Artifact Left in Source

src/simulation/m_bubbles.fpp:435

+    use m_helper_basic         !< TEMP: test dep-change cache rebuild trigger

This use statement is explicitly labeled TEMP and exists solely to trigger the dep-change detection CI job during development. It must be removed before merge. It is an unnecessary Fortran dependency that will affect compilation, tests, and the coverage cache.


🔴 Must Fix — Unit Test Will Fail (CMakeLists.txt Gap)

toolchain/mfc/test/coverage.py:550 and toolchain/mfc/test/test_coverage_unit.py:1492–1495

CMakeLists.txt is commented out of ALWAYS_RUN_ALL:

# "CMakeLists.txt",  # TEMP: disabled to test dep-change cache rebuild trigger

But the unit test still asserts it triggers a full suite:

def test_cmakelists_triggers_all(self):
    assert should_run_all_tests(
        {"CMakeLists.txt"}
    ) is True   # <-- will FAIL: CMakeLists.txt not in ALWAYS_RUN_ALL

CMakeLists.txt is also not matched by ALWAYS_RUN_ALL_PREFIXES = ("toolchain/cmake/",). As a result:

  1. The unit test will fail as written.
  2. Real-world CMakeLists.txt changes will not trigger a full suite (the dep-change use/include grep won't catch CMakeLists edits either).

Resolution: un-comment "CMakeLists.txt" in ALWAYS_RUN_ALL or, if intentionally excluded, update/remove the unit test and document the gap.


🟡 Minor — _normalize_cache Mutates Its Input

toolchain/mfc/test/coverage.py:1004–1009

def _normalize_cache(cache: dict) -> dict:
    for key, value in cache.items():
        if key == "_meta":
            continue
        if isinstance(value, dict):
            cache[key] = sorted(value.keys())  # mutates in place
    return cache

Mutating the caller's dict is a subtle side effect. Prefer result = dict(cache) at the start and modify result.


🟡 Minor — Phase 2 Coverage Collection Is Sequential

toolchain/mfc/test/coverage.py:948–963

Phase 1 runs all 555 tests in parallel. Phase 2 (gcov) iterates tests sequentially. Each iteration zeros build-tree .gcda files, installs one test's .gcda files, runs gcov over ~50 .gcno files, then cleans up. This serial design is a correctness requirement (shared build tree) but could be slow. The PR comments say this is acceptable; worth noting it as a follow-up optimization target if Phase 2 wall-time becomes a bottleneck (e.g., using per-test temp trees in parallel).


🟡 Minor — Unquoted Shell Variable

.github/workflows/frontier/test.sh:32, frontier_amd/test.sh:57, phoenix/test.sh:132

./mfc.sh test -v -a ...  -j  ...

`` is unquoted. Since it is either empty or the single word --only-changes (no spaces), this is safe in practice — but quoting it (`"$prune_flag"`) would be cleaner style. (An empty unquoted variable expands to nothing in bash, which is the desired behavior here.)


Overall

The architecture is sound: conservative fallbacks everywhere, good unit test coverage of the pruning logic, and clear CI gating (PRs only, full suite on master). The two TEMP artifacts (use m_helper_basic in m_bubbles.fpp and the commented-out CMakeLists.txt) are the blocking issues — they appear to have been left in intentionally for testing the dep-change trigger during development but must be cleaned up before merge.

🤖 Generated with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants