Add gcov-based test pruning with coverage cache by sbryngelson · Pull Request #1283 · MFlowCode/MFC

sbryngelson · 2026-03-02T00:14:17Z

Summary

Adds coverage-based test selection that skips tests unaffected by a PR's changes, reducing CI test time by ~80% for typical PRs.

Coverage cache: Pre-built map of 555 test UUIDs → exercised .fpp files (11KB gzip JSON)
--only-changes flag: Intersects changed files against the cache to run only affected tests
Cache rebuild: Automatic on master push when cases.py or Fortran dependency graph changes; also via workflow_dispatch
Safety guards: Changes to GPU macros, CMakeLists.txt, toolchain files, or Fypp includes always trigger the full suite (ALWAYS_RUN_ALL)
Parallel cache build: Phase 1 runs 555 tests with isolated GCOV_PREFIX dirs (32 workers); Phase 2 collects coverage via batched gcov calls with ThreadPoolExecutor
53 unit tests for the coverage module

CI integration

Trigger	`rebuild-cache` runs?	Cache destination
PR changing `cases.py`	Yes	Uploaded as artifact for test jobs
PR changing Fortran `use`/`include`	Yes	Uploaded as artifact for test jobs
Push to master changing `cases.py`	Yes	Committed to repo (`[skip ci]`)
`workflow_dispatch`	Yes	Committed to repo (`[skip ci]`)
Other PRs	No	Uses committed cache from master

Files changed

toolchain/mfc/test/coverage.py — Core coverage logic (~480 lines)
toolchain/mfc/test/test_coverage_unit.py — 53 unit tests
toolchain/mfc/test/test.py — --only-changes and --build-coverage-cache integration
.github/workflows/test.yml — rebuild-cache job, artifact upload/download, dep-change detection
.github/workflows/phoenix/rebuild-cache.sh — SLURM script for cache rebuild
Other CI scripts — --only-changes flag added to test invocations

Test plan

CI lint gate passes
Cache rebuild triggers on cases.py changes (verified in prior PR)
Dep-change detection triggers on use/include additions (verified in prior PR)
Github matrix tests pass with coverage cache artifact
Unit tests pass (53/53)

Replaces #1273.

🤖 Generated with Claude Code

github-actions · 2026-03-02T00:18:54Z

Claude Code Review

Head SHA: 1ed87c9
Files changed: 14

Changed files:

.github/file-filter.yml
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/rebuild-cache.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
toolchain/mfc/cli/commands.py
toolchain/mfc/test/coverage.py
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_cache.json.gz
toolchain/mfc/test/test_coverage_unit.py

Summary

Adds file-level gcov coverage cache mapping 555 test UUIDs → exercised .fpp files (gzip JSON, committed to repo)
Adds --only-changes flag that intersects PR-changed files against the cache to prune unaffected tests
Adds --build-coverage-cache flag + 3-phase parallel cache builder (prepare → run → gcov collect)
New rebuild-cache CI job runs on Phoenix via SLURM; triggers on cases.py changes or Fortran dep-graph changes in PRs
53 unit tests cover the key coverage logic functions

Findings

1. Master-push dep-change detection gap — test.yml

The dep-check step only runs on pull_request events:

if: github.event_name == 'pull_request'
id: dep-check

The rebuild-cache condition for push events only gates on cases_py:

(github.event_name == 'push' &&
 needs.file-changes.outputs.cases_py == 'true')

If a PR that adds a use m_foo or #:include statement is merged without touching cases.py, the PR run correctly rebuilds the cache (via the artifact path), but the updated cache is not committed to master (that only happens on push with cases_py == 'true'). Subsequent PRs will use the stale master cache until a workflow_dispatch or a cases.py-touching push. This is a silent correctness gap, not a crash — the fallback is running the full suite — but worth documenting or addressing.

2. _normalize_cache mutates dict during iteration — coverage.py:83

for key, value in cache.items():
    if key == "_meta":
        continue
    if isinstance(value, dict):
        cache[key] = sorted(value.keys())   # mutates value while iterating keys
return cache

Mutating values (not keys) during Python 3 dict iteration is defined behavior and safe, but changing cache[key] in-place while iterating cache.items() is a code smell. Consider building a new dict: return {k: (sorted(v.keys()) if isinstance(v, dict) and k != "_meta" else v) for k, v in cache.items()}.

3. Conservative "no-sim-coverage" include may be broader than intended — coverage.py:~312

if changed_sim and not any(f.startswith("src/simulation/") for f in test_file_set):
    to_run.append(case)
    continue

The rationale ("cache build likely failed for this test") is sound — every MFC test runs the full pre/sim/post pipeline, so a test with zero simulation coverage indicates a failed gcov run. However, the unit test test_incomplete_coverage_included_conservatively confirms this behavior is intentional. Worth noting: if a significant fraction of tests fail during cache build (e.g., first-time cache on a flaky cluster), this heuristic could substantially reduce pruning effectiveness. A counter in the output showing how many tests were included via this path would help operators detect a degraded cache.

4. src/simulation/include/ not in ALWAYS_RUN_ALL_PREFIXES — coverage.py:~54

The macro files in src/common/include/ are all explicitly listed in ALWAYS_RUN_ALL. If src/simulation/include/ contains .fpp files that are #:include-d by simulation modules (e.g., simulation-specific inline helpers), changes to those files would be handled by the coverage intersection (which is correct if Fypp line markers map them correctly) — but if gcov attributes their lines to the including file rather than the included file, the cache would silently miss them. Given --line-marker-format=gfortran5 is set in CMake, Fypp should emit correct line markers for included files, so this is likely fine. Just worth verifying empirically with one of the simulation include files.

5. Dep-change grep pattern false-positive risk — test.yml

grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'

The pattern use[\s,]+\w would also match comment lines like + ! use this approach or variable names like + use_flag = .true. if they appear at the start of a diff line and the regex engine interprets use_flag as starting with use. In practice this is very unlikely (diff lines with +use_flag at column 1 would be unusual Fortran style), but a tighter pattern like use\s+\w (requiring whitespace after use) would be safer.

Minor / improvement opportunities

datetime import is unused in coverage.py — imported but no datetime calls appear in the diff.
gcno_files variable in build_coverage_cache is used only for diagnostics (count printing). The early-failure path from find_gcno_files raises a clear error if no .gcno files exist, which is the real value. The variable itself could be replaced with just calling find_gcno_files for the side-effect and using len(...) inline.
Phase 1 progress reporting counts futures completion order (non-deterministic with as_completed). The i counter reflects completion order, not submission order — fine for progress, but the "(i+1)/total" display might be confusing if users expect it to track wall-clock ordering.

Correctness verdict

The core coverage logic (filter_tests_by_coverage, _parse_gcov_json_output, get_changed_files, should_run_all_tests) is correct and well-tested. The GCOV_PREFIX_STRIP calculation (len(parts) - 1 to exclude the root /) correctly produces a strip count that maps .gcda files to test_gcda/build/.... The CMake changes (set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG") for gcov, disabling LTO and -march=native for gcov builds) are appropriate. The conservative behavior on cache miss / no-sim-coverage / git failure all correctly fall back to running more tests, not fewer.

The master-push dep-change gap (#1 above) is the only finding that could cause a committed stale cache to persist silently across merges. All other findings are low-severity.

github-actions · 2026-03-02T00:28:52Z

Claude Code Review

Head SHA: e696936
Files changed: 15

Changed files

.github/file-filter.yml
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/rebuild-cache.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
src/simulation/m_bubbles.fpp
toolchain/mfc/cli/commands.py
toolchain/mfc/test/coverage.py
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_cache.json.gz
toolchain/mfc/test/test_coverage_unit.py

Summary

Adds a gcov-based test pruning system: pre-built coverage cache maps 555 test UUIDs → exercised .fpp files; --only-changes uses this to skip unaffected tests (~80% CI speedup claimed).
Cache rebuild is automated on cases.py changes or Fortran dependency-graph changes (new use/include lines); full suite is forced for GPU macro files, CMake, and key toolchain files.
Safety fallback: cache missing, stale, or git failure → full test suite runs.
53 unit tests exercise the core filter/parse logic offline.
CMakeLists.txt gcov changes disable -march=native and LTO; Fypp gains --line-marker-format=gfortran5 for accurate source mapping.

Findings

Bug — Duplicate `use m_helper_basic` in `src/simulation/m_bubbles.fpp`

src/simulation/m_bubbles.fpp line 16–17 (diff)

    use m_helper_basic         !< Functions to compare floating point numbers

    use m_helper_basic         ← newly added duplicate

The PR adds a second use m_helper_basic statement immediately after the existing one. Most compilers accept duplicate use statements but it is unnecessary, confusing, and likely a rebase artifact. Remove the duplicate.

Design gap — Root `CMakeLists.txt` not in `ALWAYS_RUN_ALL`

toolchain/mfc/test/coverage.py lines ~506–524

ALWAYS_RUN_ALL_PREFIXES covers toolchain/cmake/ but the root CMakeLists.txt is not in ALWAYS_RUN_ALL and does not match the prefix. The unit test test_cmakelists_does_not_trigger_all explicitly encodes this as expected behavior.

A PR that changes only CMakeLists.txt (e.g., new compiler flag, new link option, new source glob) will have its tests pruned via coverage. Build-system changes can alter codegen in ways that CPU coverage cannot detect. Recommend adding CMakeLists.txt to ALWAYS_RUN_ALL.

Design gap — New `.fpp` files added to the repo skip all coverage-filtered tests

toolchain/mfc/test/coverage.py filter_tests_by_coverage (~line 1066)

When a PR adds a brand-new .fpp source file, no entry in the cache covers it yet (the cache predates the file). All existing tests have test_file_set & changed_fpp == ∅, so the conservative-for-UUID-not-in-cache path is never reached and every test is skipped. The unit test test_new_fpp_file_no_coverage_skips documents this as intentional.

This is safe only when the dep-change detector fires (a new use m_new_module line in another file triggers a cache rebuild that includes the new file). If the new file is wired in solely through CMakeLists.txt (no new use statement in existing files), the dep-change detector will not trigger and the coverage cache will silently skip all tests for a file that has never been covered. Worth a comment in the code or a guard.

Minor — Dep-change detection is addition-only

.github/workflows/test.yml line ~165

grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['""])

github-actions · 2026-03-02T00:41:18Z

Claude Code Review

Head SHA: a77b967d0bfe59ac13331e5577c7b54b7636f2ff
Files changed: 15

Changed files

.github/file-filter.yml
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/rebuild-cache.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
src/simulation/m_bubbles.fpp
toolchain/mfc/cli/commands.py
toolchain/mfc/test/coverage.py
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_cache.json.gz
toolchain/mfc/test/test_coverage_unit.py

Summary

Adds gcov-based file-level coverage cache (555 tests → covered .fpp file map) to prune test runs on PRs to only affected tests
Adds --only-changes and --build-coverage-cache CLI flags to the test command
Adds a rebuild-cache GitHub Actions job that rebuilds on cases.py changes or Fortran dependency graph changes
Conservative safety guards: GPU macro files, Fypp includes, and build system files always trigger the full suite
53 unit tests for the coverage module with broad edge case coverage
Minor CMakeLists.txt improvements: skip -march=native and LTO for gcov builds

Findings

1. Security: fork PR can run untrusted code on the self-hosted Phoenix runner

File: .github/workflows/test.yml, around line ~200 (rebuild-cache job)

The rebuild-cache job checks out github.event.pull_request.head.sha and runs it on the gt (Phoenix) self-hosted runner with contents: write permission. For PRs from forks, this executes arbitrary contributor code on the organization's HPC cluster. This is a significant security concern and is the canonical GitHub Actions supply-chain risk for pull_request + self-hosted runners.

If forked PRs can trigger this path (i.e., the workflow runs in a pull_request context and the self-hosted runner accepts fork PRs), attackers could exfiltrate secrets or abuse cluster resources.

Mitigation options:

Add github.event.pull_request.head.repo.full_name == github.repository to the job if: condition to restrict cache rebuilds to non-fork PRs only.
Or use pull_request_target with explicit head-SHA pinning (but this has its own risks).
Or restrict rebuild-cache to push and workflow_dispatch events only, with PRs always falling back to the committed cache.

2. Correctness: `--only-changes` logic diverges from `filter_tests_by_coverage` on `.f90` files

File: toolchain/mfc/test/test.py, coverage pruning block

In test.py, the early-exit path for "no .fpp changes" uses:

changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
    ...skips all tests...

But filter_tests_by_coverage (in coverage.py) correctly includes .f90 files in its changed_fpp set. If a user modifies only src/common/m_precision_select.f90 (a real source file, not generated), test.py will incorrectly skip all tests while the underlying filter function would have run the right subset. The unit test test_f90_changes_trigger_matching_tests validates this works in filter_tests_by_coverage, but the early-exit guard in test.py will preempt it.

Fix: Use the same predicate in both places:

changed_fpp = {f for f in changed_files if f.endswith(".fpp") or f.endswith(".f90")}

3. Correctness: `CMakeLists.txt` not in `ALWAYS_RUN_ALL` but test claims it should not trigger

File: toolchain/mfc/test/test_coverage_unit.py, test_cmakelists_does_not_trigger_all

The unit test at line ~1487 explicitly asserts CMakeLists.txt does not trigger the full suite:

def test_cmakelists_does_not_trigger_all(self):
    assert should_run_all_tests({"CMakeLists.txt"}) is False

But CMakeLists.txt controls compilation flags (e.g., optimization levels, preprocessor defines like MFC_OpenACC). Changes to it can affect which code paths are compiled in and thus what tests are relevant — arguably it should trigger the full suite, or at minimum be placed in ALWAYS_RUN_ALL_PREFIXES. The comment in coverage.py notes that toolchain/cmake/ triggers all (prefix rule), but the top-level CMakeLists.txt is excluded. This may be intentional if CMake changes are assumed to be covered by CI build checks, but it is worth a deliberate decision comment.

4. Minor: `_compute_gcov_prefix_strip` off-by-one risk

File: toolchain/mfc/test/coverage.py, _compute_gcov_prefix_strip

return str(len(Path(real_root).parts) - 1)  # -1 excludes root '/'

The comment says "-1 excludes root /". On Linux, Path('/home/user/mfc').parts is ('/', 'home', 'user', 'mfc') (4 parts), so stripping 3 leaves home/user/mfc. The intent appears to be to strip the repo root prefix from compiled-in .gcda paths so the GCOV_PREFIX tree starts at build/. This is highly sensitive to the exact path structure and may break if MFC is installed at a path depth different from where the cache was built. Worth a brief integration test or a more robust approach (e.g., computing strip count from a known .gcno file's embedded path).

5. Minor: `rebuild-cache` job skips on lint failure but matrix tests proceed

File: .github/workflows/test.yml

The matrix test if: condition uses needs.rebuild-cache.result != 'cancelled', which allows the matrix to proceed even if rebuild-cache failed (result == 'failure'). This is correct by design (tests fall back to the committed cache), but if the rebuild failed mid-way and uploaded a partial artifact, the continue-on-error: true on the download step would silently use a corrupt cache. A stale-but-complete committed cache is preferred over a partial freshly-built one, and the current logic handles this correctly — the download only happens if: needs.rebuild-cache.result == 'success'. No bug here, but worth noting the flow is correct.

Improvement Opportunities

Explicit ALWAYS_RUN_ALL for .github/workflows/: Changes to CI workflow files themselves (.yml) are not in ALWAYS_RUN_ALL. A CI workflow change that adds a new test matrix entry or changes build flags would not trigger the full suite. Consider adding ".github/" as an ALWAYS_RUN_ALL_PREFIXES entry.
rebuild-cache.sh hardcodes gfortran CPU build: The cache is built with gfortran only. GPU-specific Fypp macro expansions differ by compiler and backend. The comment in ALWAYS_RUN_ALL (GPU macro files always trigger full suite) correctly compensates, but this dependency is implicit. A comment in rebuild-cache.sh explaining why GPU builds are excluded from cache generation would aid future maintainers.

🤖 Generated with Claude Code

github-actions · 2026-03-02T00:46:24Z

Claude Code Review

Head SHA: 499a3471f9d92d2ab6bda5d132c11468e179ffbb
Files changed: 15

Changed files

.github/file-filter.yml
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/rebuild-cache.sh
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
src/simulation/m_bubbles.fpp
toolchain/mfc/cli/commands.py
toolchain/mfc/test/coverage.py
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_cache.json.gz
toolchain/mfc/test/test_coverage_unit.py

Summary

Adds a gcov-based test pruning system: build MFC once with coverage instrumentation, record per-test file coverage, then skip unaffected tests on PRs.
Introduces --only-changes and --build-coverage-cache CLI flags for ./mfc.sh test.
Adds a rebuild-cache CI job on Phoenix triggered when cases.py or Fortran use/include dependencies change.
Safety guards (ALWAYS_RUN_ALL, ALWAYS_RUN_ALL_PREFIXES) force full suite for GPU macros, key toolchain files, and CMake helpers.
53 unit tests for coverage logic with good edge-case coverage.

Findings

1. Bug — Duplicate in

File: src/simulation/m_bubbles.fpp, added at line +7 in the diff (two consecutive use m_helper_basic statements):

use m_helper_basic         !< Functions to compare floating point numbers
use m_helper_basic          ← duplicate added by this PR

This is a stray duplicate use statement. Most compilers accept it (with or without a warning) but it's incorrect and should be removed. gfortran emits a warning for this; nvfortran and Cray ftn may silently accept it, so it won't block CI, but it should be cleaned up.

2. Security — job executes fork PR code on a self-hosted runner

File: .github/workflows/test.yml, rebuild-cache job (~lines 179–233 in diff)

The job:

Checks out the PR's head SHA (github.event.pull_request.head.sha) from a potentially external fork.
Submits a SLURM job that builds and runs that code on the Phoenix HPC cluster.
Holds permissions: contents: write to commit the resulting cache back to master.

For pull_request events from forks, GitHub withholds repository secrets, but the runner itself is still invoked and executes the checked-out code. A malicious PR could craft build scripts or Fortran source that executes arbitrary commands during the SLURM job. The contents: write permission then gives the resulting workflow job authority to push to master.

Mitigations to consider:

Gate the job on github.event.pull_request.head.repo.full_name == 'MFlowCode/MFC' (i.e., only for internal branches, not forks) — or restrict to pull_request_target with explicit fork allowlisting.
Alternatively, require a maintainer label before the cache-rebuild job runs.

If Phoenix runners are already restricted to repo members only (runner group ACLs), this risk is reduced but the workflow-level check is still good practice.

3. Logic gap — Root changes skip all tests

File: toolchain/mfc/test/coverage.py, ALWAYS_RUN_ALL and ALWAYS_RUN_ALL_PREFIXES

The unit test at test_coverage_unit.py explicitly confirms this:

def test_cmakelists_does_not_trigger_all(self):
    assert should_run_all_tests({"CMakeLists.txt"}) is False

Files under toolchain/cmake/ correctly trigger the full suite, but CMakeLists.txt at the repo root does not. A PR that only modifies CMakeLists.txt (e.g., changes a compile flag, adds an add_compile_options, or modifies linker flags) will result in zero tests running with --only-changes. Since this PR itself modifies CMakeLists.txt to add gcov options and disable -march=native, this scenario is realistic.

Consider adding "CMakeLists.txt" to ALWAYS_RUN_ALL, or adding "CMakeLists.txt" and similar build system roots to ALWAYS_RUN_ALL_PREFIXES (e.g., ("", "toolchain/cmake/") with a root-level glob).

4. Minor — used before conditional initialization

File: CMakeLists.txt, HANDLE_SOURCES macro usage (~line 400 in diff)

FYPP_GCOV_OPTS is only set inside the if (MFC_GCov) block within the GNU compiler branch. It is then passed unconditionally as ${FYPP_GCOV_OPTS} to every Fypp invocation. CMake silently expands unset variables as empty strings, so this works in practice, but an explicit:

set(FYPP_GCOV_OPTS )

before the compiler-detection block would make the intent clearer and prevent any surprise if the variable scope changes.

Improvement opportunities

The filter_tests_by_coverage conservative fallback for tests with no simulation coverage (lines 1112–1116 in coverage.py) is sound, but the heuristic ("no sim coverage recorded → include conservatively") would also catch tests that legitimately have no simulation phase (e.g., pre_process-only tests). Adding a comment clarifying the intent would help maintainability.
The phase-1 concurrency cap is hardcoded to 32 (min(n_jobs, 32)). A comment explaining why 32 specifically (MPI processes × ~500 MB each = OOM threshold on a given node) is already present — good. Worth also noting the assumed node memory in the comment for future reference.

- File-level gcov coverage cache maps 555 test UUIDs to exercised .fpp source files (gzip JSON, 11KB, committed to repo) - --only-changes flag prunes tests by intersecting PR-changed files against coverage cache; --build-coverage-cache builds the cache - New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or Fortran dependency graph changes (on both PRs and master pushes) - Dep-change detection greps PR/push diffs for added use/include statements that would invalidate the coverage cache - Conservative fallbacks: missing cache runs all, missing sim coverage includes test, ALWAYS_RUN_ALL files trigger full suite - Remove continue-on-error from github CI job (fixes auto-cancellation) - TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI - 53 unit tests cover core coverage logic Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-03-02T00:55:11Z

Claude Code Review

Head SHA: 8d58b93
Files changed: 15

Changed files

.github/file-filter.yml
.github/workflows/frontier/test.sh
.github/workflows/frontier_amd/test.sh
.github/workflows/phoenix/rebuild-cache.sh (new)
.github/workflows/phoenix/submit.sh
.github/workflows/phoenix/test.sh
.github/workflows/test.yml
.gitignore
CMakeLists.txt
src/simulation/m_bubbles.fpp
toolchain/mfc/cli/commands.py
toolchain/mfc/test/coverage.py (new, 659 lines)
toolchain/mfc/test/test.py
toolchain/mfc/test/test_coverage_cache.json.gz (binary)
toolchain/mfc/test/test_coverage_unit.py (new, 655 lines)

Summary

Adds gcov-based file-level coverage cache (555 test UUIDs → exercised .fpp files) to enable selective test pruning on PRs (~80% CI time reduction claimed)
New --only-changes flag intersects changed files against the cache; --build-coverage-cache populates it
rebuild-cache CI job on Phoenix triggers when cases.py or Fortran dependency graph changes; commits result to master on push
Safety fallbacks: ALWAYS_RUN_ALL set for GPU macro/toolchain files; stale-cache detection via cases.py SHA256 hash; falls back to full suite on cache miss
53 unit tests added for the coverage module

Findings

1. Bug: Duplicate `use m_helper_basic` in `m_bubbles.fpp` [Medium]

File: src/simulation/m_bubbles.fpp, lines 17–19 (in diff)

    use m_helper_basic         !< Functions to compare floating point numbers
+   use m_helper_basic         ← duplicate added by this PR

This appears to be an artifact of coverage-instrumented test runs (possibly a careless edit). While most compilers tolerate duplicate use statements (Fortran standard permits it), this adds noise, may trigger warnings with strict compilers, and could affect Fypp line-number accounting. Remove the duplicate.

2. `CMakeLists.txt` not in `ALWAYS_RUN_ALL` [Medium]

File: toolchain/mfc/test/coverage.py, lines 520–533

The ALWAYS_RUN_ALL set covers GPU macros and toolchain Python files but not CMakeLists.txt. The unit test test_cmakelists_does_not_trigger_all explicitly asserts this.

This PR itself changes CMakeLists.txt (adds -O1 override, disables -march=native for gcov, disables LTO). If a future PR changes compile flags, optimization levels, or linking options in CMakeLists.txt, the coverage cache won't trigger a full suite run — yet those changes could affect runtime behavior independently of source file coverage.

Suggest adding CMakeLists.txt to ALWAYS_RUN_ALL (or at least to ALWAYS_RUN_ALL_PREFIXES as a directory entry like "toolchain/cmake/"), and updating the corresponding unit test.

3. Dep-change detection misses `use` statement removals [Low]

File: .github/workflows/test.yml, lines 171–176

The grep pattern '^\+\s*(use[\s,]+\w|#:include\s|include\s+...)' only detects added lines (prefixed with +). Removing a use or #:include also changes the Fortran dependency graph and could make the coverage cache stale (a previously-exercised file is no longer compiled into a target, or a new code path is exposed). This is a heuristic limitation, but worth documenting in a comment.

4. `--only-changes` is applied after `--only` in `__filter` [Low]

File: toolchain/mfc/test/test.py, lines 108 → 156

If a user runs ./mfc.sh test --only Bubbles --only-changes, the --only filter narrows the list first, then --only-changes narrows it further. This is silently composable. The interaction should be mentioned in the help text or validated (e.g., warn that --only-changes overrides --only for skip decisions).

5. `rebuild-cache` job gate uses `!= 'cancelled'` but not `!= 'failure'` [Low]

File: .github/workflows/test.yml, lines 243–244

needs.rebuild-cache.result != 'cancelled'

When rebuild-cache is skipped (its if: condition is false), result is 'skipped', which satisfies != 'cancelled', so downstream jobs run — this is the intended fast-path. ✓

When rebuild-cache fails (SLURM error, build failure), result is 'failure', which also satisfies != 'cancelled', so downstream jobs still run with the old/downloaded cache (which the continue-on-error: true on the download step and stale-hash check handles). This appears intentional but is worth a comment explaining that failure is treated as a graceful fallback to the committed cache.

Improvement opportunities

The GCOV_PREFIX_STRIP calculation at coverage.py:687 uses len(Path(real_root).parts) - 1. On Linux /home/user/mfc has 4 parts so strip=3, which is correct. On paths with symlinks this may diverge from the os.path.realpath resolution used above — already handled by calling realpath first. No issue, just note for future maintainers.
The build_coverage_cache function rebuilds all unique binary slugs before running tests, but each per-test _run_single_test_direct call also checks if the binary exists and warns if not. Consider surfacing a clear error (not just warning) if zero binaries are found for a test, since silent skips here would silently corrupt the cache.

sbryngelson mentioned this pull request Mar 2, 2026

Add gcov-based test pruning with file-level coverage cache #1273

Closed

12 tasks

sbryngelson force-pushed the gcov-test-pruning-v2 branch from 499a347 to 8d58b93 Compare March 2, 2026 00:52

sbryngelson mentioned this pull request Mar 2, 2026

Add gcov-based test pruning with coverage cache #1284

Open

4 tasks

sbryngelson closed this Mar 2, 2026

sbryngelson deleted the gcov-test-pruning-v2 branch March 2, 2026 00:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add gcov-based test pruning with coverage cache#1283

Add gcov-based test pruning with coverage cache#1283
sbryngelson wants to merge 1 commit intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v2

sbryngelson commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

github-actions bot commented Mar 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Conversation

sbryngelson commented Mar 2, 2026

Summary

CI integration

Files changed

Test plan

Uh oh!

github-actions bot commented Mar 2, 2026

Claude Code Review

Summary

Findings

Minor / improvement opportunities

Correctness verdict

Uh oh!

github-actions bot commented Mar 2, 2026

Claude Code Review

Summary

Findings

Bug — Duplicate use m_helper_basic in src/simulation/m_bubbles.fpp

Design gap — Root CMakeLists.txt not in ALWAYS_RUN_ALL

Design gap — New .fpp files added to the repo skip all coverage-filtered tests

Minor — Dep-change detection is addition-only

Uh oh!

github-actions bot commented Mar 2, 2026

Claude Code Review

Summary

Findings

1. Security: fork PR can run untrusted code on the self-hosted Phoenix runner

2. Correctness: --only-changes logic diverges from filter_tests_by_coverage on .f90 files

3. Correctness: CMakeLists.txt not in ALWAYS_RUN_ALL but test claims it should not trigger

4. Minor: _compute_gcov_prefix_strip off-by-one risk

5. Minor: rebuild-cache job skips on lint failure but matrix tests proceed

Improvement Opportunities

Uh oh!

github-actions bot commented Mar 2, 2026

Claude Code Review

Summary

Findings

1. Bug — Duplicate in

2. Security — job executes fork PR code on a self-hosted runner

3. Logic gap — Root changes skip all tests

4. Minor — used before conditional initialization

Improvement opportunities

Uh oh!

github-actions bot commented Mar 2, 2026

Claude Code Review

Summary

Findings

1. Bug: Duplicate use m_helper_basic in m_bubbles.fpp [Medium]

2. CMakeLists.txt not in ALWAYS_RUN_ALL [Medium]

3. Dep-change detection misses use statement removals [Low]

4. --only-changes is applied after --only in __filter [Low]

5. rebuild-cache job gate uses != 'cancelled' but not != 'failure' [Low]

Improvement opportunities

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

1 participant

Bug — Duplicate `use m_helper_basic` in `src/simulation/m_bubbles.fpp`

Design gap — Root `CMakeLists.txt` not in `ALWAYS_RUN_ALL`

Design gap — New `.fpp` files added to the repo skip all coverage-filtered tests

2. Correctness: `--only-changes` logic diverges from `filter_tests_by_coverage` on `.f90` files

3. Correctness: `CMakeLists.txt` not in `ALWAYS_RUN_ALL` but test claims it should not trigger

4. Minor: `_compute_gcov_prefix_strip` off-by-one risk

5. Minor: `rebuild-cache` job skips on lint failure but matrix tests proceed

1. Bug: Duplicate `use m_helper_basic` in `m_bubbles.fpp` [Medium]

2. `CMakeLists.txt` not in `ALWAYS_RUN_ALL` [Medium]

3. Dep-change detection misses `use` statement removals [Low]

4. `--only-changes` is applied after `--only` in `__filter` [Low]

5. `rebuild-cache` job gate uses `!= 'cancelled'` but not `!= 'failure'` [Low]