Skip to content

Add gcov-based test pruning with coverage cache#1283

Closed
sbryngelson wants to merge 1 commit intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v2
Closed

Add gcov-based test pruning with coverage cache#1283
sbryngelson wants to merge 1 commit intoMFlowCode:masterfrom
sbryngelson:gcov-test-pruning-v2

Conversation

@sbryngelson
Copy link
Member

Summary

Adds coverage-based test selection that skips tests unaffected by a PR's changes, reducing CI test time by ~80% for typical PRs.

  • Coverage cache: Pre-built map of 555 test UUIDs → exercised .fpp files (11KB gzip JSON)
  • --only-changes flag: Intersects changed files against the cache to run only affected tests
  • Cache rebuild: Automatic on master push when cases.py or Fortran dependency graph changes; also via workflow_dispatch
  • Safety guards: Changes to GPU macros, CMakeLists.txt, toolchain files, or Fypp includes always trigger the full suite (ALWAYS_RUN_ALL)
  • Parallel cache build: Phase 1 runs 555 tests with isolated GCOV_PREFIX dirs (32 workers); Phase 2 collects coverage via batched gcov calls with ThreadPoolExecutor
  • 53 unit tests for the coverage module

CI integration

Trigger rebuild-cache runs? Cache destination
PR changing cases.py Yes Uploaded as artifact for test jobs
PR changing Fortran use/include Yes Uploaded as artifact for test jobs
Push to master changing cases.py Yes Committed to repo ([skip ci])
workflow_dispatch Yes Committed to repo ([skip ci])
Other PRs No Uses committed cache from master

Files changed

  • toolchain/mfc/test/coverage.py — Core coverage logic (~480 lines)
  • toolchain/mfc/test/test_coverage_unit.py — 53 unit tests
  • toolchain/mfc/test/test.py--only-changes and --build-coverage-cache integration
  • .github/workflows/test.ymlrebuild-cache job, artifact upload/download, dep-change detection
  • .github/workflows/phoenix/rebuild-cache.sh — SLURM script for cache rebuild
  • Other CI scripts — --only-changes flag added to test invocations

Test plan

  • CI lint gate passes
  • Cache rebuild triggers on cases.py changes (verified in prior PR)
  • Dep-change detection triggers on use/include additions (verified in prior PR)
  • Github matrix tests pass with coverage cache artifact
  • Unit tests pass (53/53)

Replaces #1273.

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: 1ed87c9
Files changed: 14

Changed files:

  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds file-level gcov coverage cache mapping 555 test UUIDs → exercised .fpp files (gzip JSON, committed to repo)
  • Adds --only-changes flag that intersects PR-changed files against the cache to prune unaffected tests
  • Adds --build-coverage-cache flag + 3-phase parallel cache builder (prepare → run → gcov collect)
  • New rebuild-cache CI job runs on Phoenix via SLURM; triggers on cases.py changes or Fortran dep-graph changes in PRs
  • 53 unit tests cover the key coverage logic functions

Findings

1. Master-push dep-change detection gap — test.yml

The dep-check step only runs on pull_request events:

if: github.event_name == 'pull_request'
id: dep-check

The rebuild-cache condition for push events only gates on cases_py:

(github.event_name == 'push' &&
 needs.file-changes.outputs.cases_py == 'true')

If a PR that adds a use m_foo or #:include statement is merged without touching cases.py, the PR run correctly rebuilds the cache (via the artifact path), but the updated cache is not committed to master (that only happens on push with cases_py == 'true'). Subsequent PRs will use the stale master cache until a workflow_dispatch or a cases.py-touching push. This is a silent correctness gap, not a crash — the fallback is running the full suite — but worth documenting or addressing.

2. _normalize_cache mutates dict during iteration — coverage.py:83

for key, value in cache.items():
    if key == "_meta":
        continue
    if isinstance(value, dict):
        cache[key] = sorted(value.keys())   # mutates value while iterating keys
return cache

Mutating values (not keys) during Python 3 dict iteration is defined behavior and safe, but changing cache[key] in-place while iterating cache.items() is a code smell. Consider building a new dict: return {k: (sorted(v.keys()) if isinstance(v, dict) and k != "_meta" else v) for k, v in cache.items()}.

3. Conservative "no-sim-coverage" include may be broader than intended — coverage.py:~312

if changed_sim and not any(f.startswith("src/simulation/") for f in test_file_set):
    to_run.append(case)
    continue

The rationale ("cache build likely failed for this test") is sound — every MFC test runs the full pre/sim/post pipeline, so a test with zero simulation coverage indicates a failed gcov run. However, the unit test test_incomplete_coverage_included_conservatively confirms this behavior is intentional. Worth noting: if a significant fraction of tests fail during cache build (e.g., first-time cache on a flaky cluster), this heuristic could substantially reduce pruning effectiveness. A counter in the output showing how many tests were included via this path would help operators detect a degraded cache.

4. src/simulation/include/ not in ALWAYS_RUN_ALL_PREFIXEScoverage.py:~54

The macro files in src/common/include/ are all explicitly listed in ALWAYS_RUN_ALL. If src/simulation/include/ contains .fpp files that are #:include-d by simulation modules (e.g., simulation-specific inline helpers), changes to those files would be handled by the coverage intersection (which is correct if Fypp line markers map them correctly) — but if gcov attributes their lines to the including file rather than the included file, the cache would silently miss them. Given --line-marker-format=gfortran5 is set in CMake, Fypp should emit correct line markers for included files, so this is likely fine. Just worth verifying empirically with one of the simulation include files.

5. Dep-change grep pattern false-positive risk — test.yml

grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])'

The pattern use[\s,]+\w would also match comment lines like + ! use this approach or variable names like + use_flag = .true. if they appear at the start of a diff line and the regex engine interprets use_flag as starting with use. In practice this is very unlikely (diff lines with +use_flag at column 1 would be unusual Fortran style), but a tighter pattern like use\s+\w (requiring whitespace after use) would be safer.


Minor / improvement opportunities

  • datetime import is unused in coverage.py — imported but no datetime calls appear in the diff.
  • gcno_files variable in build_coverage_cache is used only for diagnostics (count printing). The early-failure path from find_gcno_files raises a clear error if no .gcno files exist, which is the real value. The variable itself could be replaced with just calling find_gcno_files for the side-effect and using len(...) inline.
  • Phase 1 progress reporting counts futures completion order (non-deterministic with as_completed). The i counter reflects completion order, not submission order — fine for progress, but the "(i+1)/total" display might be confusing if users expect it to track wall-clock ordering.

Correctness verdict

The core coverage logic (filter_tests_by_coverage, _parse_gcov_json_output, get_changed_files, should_run_all_tests) is correct and well-tested. The GCOV_PREFIX_STRIP calculation (len(parts) - 1 to exclude the root /) correctly produces a strip count that maps .gcda files to test_gcda/build/.... The CMake changes (set(CMAKE_Fortran_FLAGS_RELEASE "-O1 -DNDEBUG") for gcov, disabling LTO and -march=native for gcov builds) are appropriate. The conservative behavior on cache miss / no-sim-coverage / git failure all correctly fall back to running more tests, not fewer.

The master-push dep-change gap (#1 above) is the only finding that could cause a committed stale cache to persist silently across merges. All other findings are low-severity.

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: e696936
Files changed: 15

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds a gcov-based test pruning system: pre-built coverage cache maps 555 test UUIDs → exercised .fpp files; --only-changes uses this to skip unaffected tests (~80% CI speedup claimed).
  • Cache rebuild is automated on cases.py changes or Fortran dependency-graph changes (new use/include lines); full suite is forced for GPU macro files, CMake, and key toolchain files.
  • Safety fallback: cache missing, stale, or git failure → full test suite runs.
  • 53 unit tests exercise the core filter/parse logic offline.
  • CMakeLists.txt gcov changes disable -march=native and LTO; Fypp gains --line-marker-format=gfortran5 for accurate source mapping.

Findings

Bug — Duplicate use m_helper_basic in src/simulation/m_bubbles.fpp

src/simulation/m_bubbles.fpp line 16–17 (diff)

    use m_helper_basic         !< Functions to compare floating point numbers

    use m_helper_basic         ← newly added duplicate

The PR adds a second use m_helper_basic statement immediately after the existing one. Most compilers accept duplicate use statements but it is unnecessary, confusing, and likely a rebase artifact. Remove the duplicate.


Design gap — Root CMakeLists.txt not in ALWAYS_RUN_ALL

toolchain/mfc/test/coverage.py lines ~506–524

ALWAYS_RUN_ALL_PREFIXES covers toolchain/cmake/ but the root CMakeLists.txt is not in ALWAYS_RUN_ALL and does not match the prefix. The unit test test_cmakelists_does_not_trigger_all explicitly encodes this as expected behavior.

A PR that changes only CMakeLists.txt (e.g., new compiler flag, new link option, new source glob) will have its tests pruned via coverage. Build-system changes can alter codegen in ways that CPU coverage cannot detect. Recommend adding CMakeLists.txt to ALWAYS_RUN_ALL.


Design gap — New .fpp files added to the repo skip all coverage-filtered tests

toolchain/mfc/test/coverage.py filter_tests_by_coverage (~line 1066)

When a PR adds a brand-new .fpp source file, no entry in the cache covers it yet (the cache predates the file). All existing tests have test_file_set & changed_fpp == ∅, so the conservative-for-UUID-not-in-cache path is never reached and every test is skipped. The unit test test_new_fpp_file_no_coverage_skips documents this as intentional.

This is safe only when the dep-change detector fires (a new use m_new_module line in another file triggers a cache rebuild that includes the new file). If the new file is wired in solely through CMakeLists.txt (no new use statement in existing files), the dep-change detector will not trigger and the coverage cache will silently skip all tests for a file that has never been covered. Worth a comment in the code or a guard.


Minor — Dep-change detection is addition-only

.github/workflows/test.yml line ~165

grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['""])

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: a77b967d0bfe59ac13331e5577c7b54b7636f2ff
Files changed: 15

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds gcov-based file-level coverage cache (555 tests → covered .fpp file map) to prune test runs on PRs to only affected tests
  • Adds --only-changes and --build-coverage-cache CLI flags to the test command
  • Adds a rebuild-cache GitHub Actions job that rebuilds on cases.py changes or Fortran dependency graph changes
  • Conservative safety guards: GPU macro files, Fypp includes, and build system files always trigger the full suite
  • 53 unit tests for the coverage module with broad edge case coverage
  • Minor CMakeLists.txt improvements: skip -march=native and LTO for gcov builds

Findings

1. Security: fork PR can run untrusted code on the self-hosted Phoenix runner

File: .github/workflows/test.yml, around line ~200 (rebuild-cache job)

The rebuild-cache job checks out github.event.pull_request.head.sha and runs it on the gt (Phoenix) self-hosted runner with contents: write permission. For PRs from forks, this executes arbitrary contributor code on the organization's HPC cluster. This is a significant security concern and is the canonical GitHub Actions supply-chain risk for pull_request + self-hosted runners.

If forked PRs can trigger this path (i.e., the workflow runs in a pull_request context and the self-hosted runner accepts fork PRs), attackers could exfiltrate secrets or abuse cluster resources.

Mitigation options:

  • Add github.event.pull_request.head.repo.full_name == github.repository to the job if: condition to restrict cache rebuilds to non-fork PRs only.
  • Or use pull_request_target with explicit head-SHA pinning (but this has its own risks).
  • Or restrict rebuild-cache to push and workflow_dispatch events only, with PRs always falling back to the committed cache.

2. Correctness: --only-changes logic diverges from filter_tests_by_coverage on .f90 files

File: toolchain/mfc/test/test.py, coverage pruning block

In test.py, the early-exit path for "no .fpp changes" uses:

changed_fpp = {f for f in changed_files if f.endswith(".fpp")}
if not changed_fpp:
    ...skips all tests...

But filter_tests_by_coverage (in coverage.py) correctly includes .f90 files in its changed_fpp set. If a user modifies only src/common/m_precision_select.f90 (a real source file, not generated), test.py will incorrectly skip all tests while the underlying filter function would have run the right subset. The unit test test_f90_changes_trigger_matching_tests validates this works in filter_tests_by_coverage, but the early-exit guard in test.py will preempt it.

Fix: Use the same predicate in both places:

changed_fpp = {f for f in changed_files if f.endswith(".fpp") or f.endswith(".f90")}

3. Correctness: CMakeLists.txt not in ALWAYS_RUN_ALL but test claims it should not trigger

File: toolchain/mfc/test/test_coverage_unit.py, test_cmakelists_does_not_trigger_all

The unit test at line ~1487 explicitly asserts CMakeLists.txt does not trigger the full suite:

def test_cmakelists_does_not_trigger_all(self):
    assert should_run_all_tests({"CMakeLists.txt"}) is False

But CMakeLists.txt controls compilation flags (e.g., optimization levels, preprocessor defines like MFC_OpenACC). Changes to it can affect which code paths are compiled in and thus what tests are relevant — arguably it should trigger the full suite, or at minimum be placed in ALWAYS_RUN_ALL_PREFIXES. The comment in coverage.py notes that toolchain/cmake/ triggers all (prefix rule), but the top-level CMakeLists.txt is excluded. This may be intentional if CMake changes are assumed to be covered by CI build checks, but it is worth a deliberate decision comment.

4. Minor: _compute_gcov_prefix_strip off-by-one risk

File: toolchain/mfc/test/coverage.py, _compute_gcov_prefix_strip

return str(len(Path(real_root).parts) - 1)  # -1 excludes root '/'

The comment says "-1 excludes root /". On Linux, Path('/home/user/mfc').parts is ('/', 'home', 'user', 'mfc') (4 parts), so stripping 3 leaves home/user/mfc. The intent appears to be to strip the repo root prefix from compiled-in .gcda paths so the GCOV_PREFIX tree starts at build/. This is highly sensitive to the exact path structure and may break if MFC is installed at a path depth different from where the cache was built. Worth a brief integration test or a more robust approach (e.g., computing strip count from a known .gcno file's embedded path).

5. Minor: rebuild-cache job skips on lint failure but matrix tests proceed

File: .github/workflows/test.yml

The matrix test if: condition uses needs.rebuild-cache.result != 'cancelled', which allows the matrix to proceed even if rebuild-cache failed (result == 'failure'). This is correct by design (tests fall back to the committed cache), but if the rebuild failed mid-way and uploaded a partial artifact, the continue-on-error: true on the download step would silently use a corrupt cache. A stale-but-complete committed cache is preferred over a partial freshly-built one, and the current logic handles this correctly — the download only happens if: needs.rebuild-cache.result == 'success'. No bug here, but worth noting the flow is correct.


Improvement Opportunities

  • Explicit ALWAYS_RUN_ALL for .github/workflows/: Changes to CI workflow files themselves (.yml) are not in ALWAYS_RUN_ALL. A CI workflow change that adds a new test matrix entry or changes build flags would not trigger the full suite. Consider adding ".github/" as an ALWAYS_RUN_ALL_PREFIXES entry.
  • rebuild-cache.sh hardcodes gfortran CPU build: The cache is built with gfortran only. GPU-specific Fypp macro expansions differ by compiler and backend. The comment in ALWAYS_RUN_ALL (GPU macro files always trigger full suite) correctly compensates, but this dependency is implicit. A comment in rebuild-cache.sh explaining why GPU builds are excluded from cache generation would aid future maintainers.

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: 499a3471f9d92d2ab6bda5d132c11468e179ffbb
Files changed: 15

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz
  • toolchain/mfc/test/test_coverage_unit.py

Summary

  • Adds a gcov-based test pruning system: build MFC once with coverage instrumentation, record per-test file coverage, then skip unaffected tests on PRs.
  • Introduces --only-changes and --build-coverage-cache CLI flags for ./mfc.sh test.
  • Adds a rebuild-cache CI job on Phoenix triggered when cases.py or Fortran use/include dependencies change.
  • Safety guards (ALWAYS_RUN_ALL, ALWAYS_RUN_ALL_PREFIXES) force full suite for GPU macros, key toolchain files, and CMake helpers.
  • 53 unit tests for coverage logic with good edge-case coverage.

Findings

1. Bug — Duplicate in

File: src/simulation/m_bubbles.fpp, added at line +7 in the diff (two consecutive use m_helper_basic statements):

use m_helper_basic         !< Functions to compare floating point numbers
use m_helper_basic          ← duplicate added by this PR

This is a stray duplicate use statement. Most compilers accept it (with or without a warning) but it's incorrect and should be removed. gfortran emits a warning for this; nvfortran and Cray ftn may silently accept it, so it won't block CI, but it should be cleaned up.


2. Security — job executes fork PR code on a self-hosted runner

File: .github/workflows/test.yml, rebuild-cache job (~lines 179–233 in diff)

The job:

  1. Checks out the PR's head SHA (github.event.pull_request.head.sha) from a potentially external fork.
  2. Submits a SLURM job that builds and runs that code on the Phoenix HPC cluster.
  3. Holds permissions: contents: write to commit the resulting cache back to master.

For pull_request events from forks, GitHub withholds repository secrets, but the runner itself is still invoked and executes the checked-out code. A malicious PR could craft build scripts or Fortran source that executes arbitrary commands during the SLURM job. The contents: write permission then gives the resulting workflow job authority to push to master.

Mitigations to consider:

  • Gate the job on github.event.pull_request.head.repo.full_name == 'MFlowCode/MFC' (i.e., only for internal branches, not forks) — or restrict to pull_request_target with explicit fork allowlisting.
  • Alternatively, require a maintainer label before the cache-rebuild job runs.

If Phoenix runners are already restricted to repo members only (runner group ACLs), this risk is reduced but the workflow-level check is still good practice.


3. Logic gap — Root changes skip all tests

File: toolchain/mfc/test/coverage.py, ALWAYS_RUN_ALL and ALWAYS_RUN_ALL_PREFIXES

The unit test at test_coverage_unit.py explicitly confirms this:

def test_cmakelists_does_not_trigger_all(self):
    assert should_run_all_tests({"CMakeLists.txt"}) is False

Files under toolchain/cmake/ correctly trigger the full suite, but CMakeLists.txt at the repo root does not. A PR that only modifies CMakeLists.txt (e.g., changes a compile flag, adds an add_compile_options, or modifies linker flags) will result in zero tests running with --only-changes. Since this PR itself modifies CMakeLists.txt to add gcov options and disable -march=native, this scenario is realistic.

Consider adding "CMakeLists.txt" to ALWAYS_RUN_ALL, or adding "CMakeLists.txt" and similar build system roots to ALWAYS_RUN_ALL_PREFIXES (e.g., ("", "toolchain/cmake/") with a root-level glob).


4. Minor — used before conditional initialization

File: CMakeLists.txt, HANDLE_SOURCES macro usage (~line 400 in diff)

FYPP_GCOV_OPTS is only set inside the if (MFC_GCov) block within the GNU compiler branch. It is then passed unconditionally as ${FYPP_GCOV_OPTS} to every Fypp invocation. CMake silently expands unset variables as empty strings, so this works in practice, but an explicit:

set(FYPP_GCOV_OPTS )

before the compiler-detection block would make the intent clearer and prevent any surprise if the variable scope changes.


Improvement opportunities

  • The filter_tests_by_coverage conservative fallback for tests with no simulation coverage (lines 1112–1116 in coverage.py) is sound, but the heuristic ("no sim coverage recorded → include conservatively") would also catch tests that legitimately have no simulation phase (e.g., pre_process-only tests). Adding a comment clarifying the intent would help maintainability.
  • The phase-1 concurrency cap is hardcoded to 32 (min(n_jobs, 32)). A comment explaining why 32 specifically (MPI processes × ~500 MB each = OOM threshold on a given node) is already present — good. Worth also noting the assumed node memory in the comment for future reference.

- File-level gcov coverage cache maps 555 test UUIDs to exercised .fpp
  source files (gzip JSON, 11KB, committed to repo)
- --only-changes flag prunes tests by intersecting PR-changed files
  against coverage cache; --build-coverage-cache builds the cache
- New rebuild-cache CI job runs on Phoenix via SLURM when cases.py or
  Fortran dependency graph changes (on both PRs and master pushes)
- Dep-change detection greps PR/push diffs for added use/include
  statements that would invalidate the coverage cache
- Conservative fallbacks: missing cache runs all, missing sim coverage
  includes test, ALWAYS_RUN_ALL files trigger full suite
- Remove continue-on-error from github CI job (fixes auto-cancellation)
- TEMP: duplicate use in m_bubbles.fpp + remove CMakeLists.txt from
  ALWAYS_RUN_ALL to test the full cache rebuild pipeline in CI
- 53 unit tests cover core coverage logic

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sbryngelson sbryngelson force-pushed the gcov-test-pruning-v2 branch from 499a347 to 8d58b93 Compare March 2, 2026 00:52
@github-actions
Copy link

github-actions bot commented Mar 2, 2026

Claude Code Review

Head SHA: 8d58b93
Files changed: 15

Changed files
  • .github/file-filter.yml
  • .github/workflows/frontier/test.sh
  • .github/workflows/frontier_amd/test.sh
  • .github/workflows/phoenix/rebuild-cache.sh (new)
  • .github/workflows/phoenix/submit.sh
  • .github/workflows/phoenix/test.sh
  • .github/workflows/test.yml
  • .gitignore
  • CMakeLists.txt
  • src/simulation/m_bubbles.fpp
  • toolchain/mfc/cli/commands.py
  • toolchain/mfc/test/coverage.py (new, 659 lines)
  • toolchain/mfc/test/test.py
  • toolchain/mfc/test/test_coverage_cache.json.gz (binary)
  • toolchain/mfc/test/test_coverage_unit.py (new, 655 lines)

Summary

  • Adds gcov-based file-level coverage cache (555 test UUIDs → exercised .fpp files) to enable selective test pruning on PRs (~80% CI time reduction claimed)
  • New --only-changes flag intersects changed files against the cache; --build-coverage-cache populates it
  • rebuild-cache CI job on Phoenix triggers when cases.py or Fortran dependency graph changes; commits result to master on push
  • Safety fallbacks: ALWAYS_RUN_ALL set for GPU macro/toolchain files; stale-cache detection via cases.py SHA256 hash; falls back to full suite on cache miss
  • 53 unit tests added for the coverage module

Findings

1. Bug: Duplicate use m_helper_basic in m_bubbles.fpp [Medium]

File: src/simulation/m_bubbles.fpp, lines 17–19 (in diff)

    use m_helper_basic         !< Functions to compare floating point numbers
+   use m_helper_basic         ← duplicate added by this PR

This appears to be an artifact of coverage-instrumented test runs (possibly a careless edit). While most compilers tolerate duplicate use statements (Fortran standard permits it), this adds noise, may trigger warnings with strict compilers, and could affect Fypp line-number accounting. Remove the duplicate.


2. CMakeLists.txt not in ALWAYS_RUN_ALL [Medium]

File: toolchain/mfc/test/coverage.py, lines 520–533

The ALWAYS_RUN_ALL set covers GPU macros and toolchain Python files but not CMakeLists.txt. The unit test test_cmakelists_does_not_trigger_all explicitly asserts this.

This PR itself changes CMakeLists.txt (adds -O1 override, disables -march=native for gcov, disables LTO). If a future PR changes compile flags, optimization levels, or linking options in CMakeLists.txt, the coverage cache won't trigger a full suite run — yet those changes could affect runtime behavior independently of source file coverage.

Suggest adding CMakeLists.txt to ALWAYS_RUN_ALL (or at least to ALWAYS_RUN_ALL_PREFIXES as a directory entry like "toolchain/cmake/"), and updating the corresponding unit test.


3. Dep-change detection misses use statement removals [Low]

File: .github/workflows/test.yml, lines 171–176

The grep pattern '^\+\s*(use[\s,]+\w|#:include\s|include\s+...)' only detects added lines (prefixed with +). Removing a use or #:include also changes the Fortran dependency graph and could make the coverage cache stale (a previously-exercised file is no longer compiled into a target, or a new code path is exposed). This is a heuristic limitation, but worth documenting in a comment.


4. --only-changes is applied after --only in __filter [Low]

File: toolchain/mfc/test/test.py, lines 108 → 156

If a user runs ./mfc.sh test --only Bubbles --only-changes, the --only filter narrows the list first, then --only-changes narrows it further. This is silently composable. The interaction should be mentioned in the help text or validated (e.g., warn that --only-changes overrides --only for skip decisions).


5. rebuild-cache job gate uses != 'cancelled' but not != 'failure' [Low]

File: .github/workflows/test.yml, lines 243–244

needs.rebuild-cache.result != 'cancelled'

When rebuild-cache is skipped (its if: condition is false), result is 'skipped', which satisfies != 'cancelled', so downstream jobs run — this is the intended fast-path. ✓

When rebuild-cache fails (SLURM error, build failure), result is 'failure', which also satisfies != 'cancelled', so downstream jobs still run with the old/downloaded cache (which the continue-on-error: true on the download step and stale-hash check handles). This appears intentional but is worth a comment explaining that failure is treated as a graceful fallback to the committed cache.


Improvement opportunities

  • The GCOV_PREFIX_STRIP calculation at coverage.py:687 uses len(Path(real_root).parts) - 1. On Linux /home/user/mfc has 4 parts so strip=3, which is correct. On paths with symlinks this may diverge from the os.path.realpath resolution used above — already handled by calling realpath first. No issue, just note for future maintainers.
  • The build_coverage_cache function rebuilds all unique binary slugs before running tests, but each per-test _run_single_test_direct call also checks if the binary exists and warns if not. Consider surfacing a clear error (not just warning) if zero binaries are found for a test, since silent skips here would silently corrupt the cache.

@sbryngelson sbryngelson closed this Mar 2, 2026
@sbryngelson sbryngelson deleted the gcov-test-pruning-v2 branch March 2, 2026 00:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant