Add gcov-based test pruning with file-level coverage cache#1273
Add gcov-based test pruning with file-level coverage cache#1273sbryngelson wants to merge 21 commits intoMFlowCode:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR replaces the regex/JSON-based test dependency mapping with a gcov-driven, line-level coverage cache to support ./mfc.sh test --only-changes test pruning, and wires the feature into the CLI and CI workflows.
Changes:
- Add
toolchain/mfc/test/coverage.pyto build/load a gzipped per-test line coverage cache and intersect it withgit diff -U0line ranges. - Integrate
--only-changes,--changes-branch, and--build-coverage-cacheinto the test runner and CLI. - Update CMake Fypp invocation and CI workflows to support
.fppline mapping and PR-only pruning.
Reviewed changes
Copilot reviewed 12 out of 14 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
toolchain/mfc/test/coverage.py |
Implements coverage cache build/load, gcov JSON parsing, diff parsing, and test filtering. |
toolchain/mfc/test/test.py |
Adds --only-changes filtering flow and --build-coverage-cache entrypoint. |
toolchain/mfc/test/test_coverage_unit.py |
Adds offline unit tests for diff parsing, gcov parsing, and filtering logic. |
toolchain/mfc/cli/commands.py |
Exposes new CLI flags and examples for coverage-based pruning. |
CMakeLists.txt |
Adds Fypp line marker format to improve gcov↔.fpp line mapping. |
.github/workflows/test.yml |
Ensures full git history and enables PR-only --only-changes invocation. |
.github/workflows/phoenix/{submit.sh,test.sh} |
Forwards PR context into SLURM jobs and conditionally enables --only-changes. |
.github/workflows/frontier/{submit.sh,test.sh} |
Same PR-context forwarding and conditional --only-changes. |
.github/workflows/frontier_amd/{submit.sh,test.sh} |
Same PR-context forwarding and conditional --only-changes. |
.gitignore |
Ignores legacy raw .json cache while keeping .json.gz trackable. |
586b0c5 to
8ecbd21
Compare
…ve dead code - Report non-zero exit codes from post_process (and other targets) during cache build instead of silently swallowing them - Gate --line-marker-format=gfortran5 behind MFC_GCov so it only applies to gcov instrumented builds - Remove unused collect_coverage_from_gcda function Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GCC 12.3 with LTO on Granite Rapids generates AVX-512 FP16 instructions (vmovw) that the system assembler (binutils 2.35) cannot encode. Disabling LTO for gcov builds avoids this and is correct since gcov instrumentation at -O1 does not benefit from link-time optimization. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GCC 12.3 with -O3 -march=native on Granite Rapids emits AVX-512 FP16 instructions (vmovw) that binutils 2.35 cannot assemble. For gcov builds, force -O1 via CMAKE_Fortran_FLAGS_RELEASE and skip -march=native entirely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove _gcda_path_to_fpp (dead code, never called in production) - Remove 7 associated unit tests (TestGcdaPathToFpp) - Add set -e to rebuild-cache.sh for error propagation - Add github.repository guard to commit-cache job Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix help text to say file-level (not line-level) throughout commands.py - Allow tests to proceed when rebuild-cache fails (not just skipped) - Fix _run_single_test_direct docstring return type (3-tuple not 2-tuple) - Add cache integrity validation: reject zero-coverage cache builds - Remove bare except in _parse_gcov_json_output (let unexpected errors propagate) - Log warnings in _prepare_test instead of silently swallowing exceptions - Fail early with clear error when no MPI launcher found Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace bare except Exception with (SubprocessError, OSError) in collect_coverage_for_test to let unexpected errors propagate - Add fork PR guard to commit-cache job so it skips for external PRs Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add --only-changes flag to all CI test commands (Github, Phoenix, Frontier, Frontier AMD). When the coverage cache is available, only tests exercising changed .fpp files will run. Falls back to full suite when cache is missing or stale. Also fetch master ref in Github CI for merge-base diff, and add origin/master fallback in get_changed_files for shallow clones. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
0c4bbd4 to
b47f748
Compare
Claude Code ReviewHead SHA: b47f748 Files changed: 14
Summary
Findings1.
|
This is a temporary commit to verify --only-changes works end-to-end. Will be reverted before merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: da7aa27 Changed files
Summary:
Findings1. BLOCKER —
|
git fetch --depth=1 alone doesn't provide enough history for merge-base to find a common ancestor. Add --deepen=200 to expand the shallow clone so --only-changes can compute the correct diff. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: a398834
Summary
Findings1.
|
--only-changes must not run on master pushes: merge-base would diff master against itself, find no changes, and skip all tests. Gate behind GITHUB_EVENT_NAME == pull_request in all test scripts. Add retry loop (3 attempts) for git push in commit-cache to handle concurrent updates from simultaneous rebuild-cache completions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Removes one word from a comment to verify --only-changes runs ~38 bubble tests and skips the rest. Will be reverted before merge. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 15
Summary
Findings[CRITICAL]
|
Re-enable CMakeLists.txt in ALWAYS_RUN_ALL and revert the trivial comment change in m_bubbles.fpp. Both were temporary for CI testing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 240ef0a Files
Summary
Findings1.
|
…nd missing binary warning - case.fpp and toolchain/cmake/ are invisible to gcov (Fypp-inlined or non-Fortran) but affect all tests. coverage.py itself must trigger a full run to validate pruning logic changes. - Fix commit-cache retry: pushed=true was emitted even when all 3 push attempts failed. - Warn when a target binary is missing during cache build so cache quality issues are visible. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: 9d90188
Summary
Findings1.
|
Add Fortran dependency graph change detection (use/include grep) to trigger cache rebuilds. Temporarily disable CMakeLists.txt from ALWAYS_RUN_ALL and add a benign duplicate use statement to verify the dep-change pipeline works end-to-end. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files changed: 15
Summary:
FindingsCritical — Must fix before merge1. Debug artifact left in production Fortran source
This is a duplicate 2.
# "CMakeLists.txt", # TEMP: disabled to test dep-change cache rebuild triggerThis was deliberately commented out for testing. With Logic gap — Dep-change detection only covers additions, not removals
grep -qP '^\+\s*(use[\s,]+\w|#:include\s|include\s+['"'"'"])The regex only matches lines beginning with Minor — Commit message hardcoded to "cases.py changed"
git commit -m "Regenerate gcov coverage cache
Automatically rebuilt because cases.py changed."The Observation — Three test plan items still uncheckedThe PR description lists three items as not yet validated in CI:
These are the primary correctness guarantees of the cache invalidation mechanism. These should be verified or explained before merging. Overall: The design is solid, the conservative fallbacks are well-considered, and the unit test suite is thorough. The two critical issues (TEMP |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: Files
Summary
Findings🔴 Critical — Must fix before merge1. Duplicate use m_helper_basic !< Functions to compare floating point numbers
use m_helper_basic !< TEMP: test dep-change cache rebuild triggerThis is an explicit debugging artifact used to trigger the dep-change cache rebuild CI path. It must be removed before merge — it adds a redundant import to production Fortran code, and the comment makes clear it was never intended to stay. 2. ALWAYS_RUN_ALL = frozenset([
...
"toolchain/mfc/test/coverage.py",
# "CMakeLists.txt", # TEMP: disabled to test dep-change cache rebuild trigger
])This was commented out temporarily to test the dep-change trigger. With it absent, changing This also causes a direct unit test failure: Fix: Uncomment the 🟡 Moderate3. ${FYPP_GCOV_OPTS}This variable is only set inside 4. --body "Coverage cache auto-updated: a bot commit was pushed to this branch because `cases.py` changed."The cache is also rebuilt when Fortran 🟢 Minor / Observations5. Unquoted In 6. New In VerdictThe core design is solid and the fallback behavior is appropriately conservative. The two TEMP artifacts are blocking issues — remove the duplicate |
The self-hosted Phoenix runner retains build artifacts across jobs. A prior --gpu mp build leaves CMake flags (e.g. -foffload=amdgcn-amdhsa) that cause gfortran to fail when building with --gcov. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Claude Code ReviewHead SHA: d5b0b36
Summary
Findings🔴 Must Fix — Debug Artifact Left in Source
+ use m_helper_basic !< TEMP: test dep-change cache rebuild triggerThis 🔴 Must Fix — Unit Test Will Fail (
|
Summary
Add
--only-changesflag to./mfc.sh testthat uses a gcov-based file-level coverage cache to skip tests unaffected by a PR's source changes. CI automatically uses--only-changeson PRs and rebuilds the cache when needed.How it works
Cache build (
./mfc.sh build --gcov -j 8 && ./mfc.sh test --build-coverage-cache -j 64):.inpfiles for all 555 tests.fppsource files it exercisesTest filtering (
./mfc.sh test --only-changes -j 8):git diffidentifies changed.fppfiles vs merge-baseCI integration (
.github/workflows/test.yml):--only-changeson PRs (Github runners, Phoenix, Frontier, Frontier AMD)--only-changes)rebuild-cachejob runs on Phoenix when triggered (PR only, not forks)commit-cachejob auto-commits the updated cache back to the PR branchCache rebuild triggers — the cache is rebuilt when either:
cases.pychanges (new/modified test definitions)use,#:include, orincludestatements detected via diff grep)Safety guards
ALWAYS_RUN_ALL: GPU macro files,CMakeLists.txt,case.fpp,toolchain/cmake/,coverage.py, toolchain definitions → full suite--only-changesis gated onGITHUB_EVENT_NAME == pull_requestin all CI scripts; master pushes run full suite--deepen=200for reliablegit merge-base; falls back toorigin/masterif neededrebuild-cacheruns butcommit-cacheis skipped (can't push to fork)set -einrebuild-cache.shprevents committing a cache from a failed buildCMake changes for gcov builds (
--gcovflag)CMAKE_Fortran_FLAGS_RELEASEto-O1 -DNDEBUG(coverage is inaccurate at -O3)-march=native(AVX-512 FP16 on GNR emits instructions binutils 2.35 can't assemble)--line-marker-format=gfortran5Fypp flag behindMFC_GCovCI-verified results
.fppchanges.fppfile changed (m_bubbles.fpp)-% 20CMakeLists.txtchangedFiles changed
toolchain/mfc/test/coverage.pytoolchain/mfc/test/test_coverage_unit.pytoolchain/mfc/test/test_coverage_cache.json.gztoolchain/mfc/test/test.py--only-changesand--build-coverage-cachetoolchain/mfc/cli/commands.py.github/workflows/test.ymlrebuild-cache,commit-cachejobs;--only-changesin test jobs; dep-change detection.github/workflows/phoenix/rebuild-cache.shset -e).github/workflows/phoenix/test.sh--only-changeson PRs, 64-thread parallel cap.github/workflows/frontier/test.sh--only-changeson PRs.github/workflows/frontier_amd/test.sh--only-changeson PRs.github/workflows/phoenix/submit.shCMakeLists.txt.github/file-filter.ymlcases_pypath filter.gitignoreTest plan
./mfc.sh precheck -j 8passes./mfc.sh lint) — 77 total including other test suites--gcovrebuild-cacheskips whencases.pyunchanged and no dep changesrebuild-cacheis skipped--only-changescorrectly prunes tests (118/555 for single.fppchange)-% 20sampling after pruning (23/118)--only-changes)rebuild-cachetriggers oncases.pychangerebuild-cachetriggers on Fortran dependency change (use/include)commit-cachecommits updated cache back to PR branch🤖 Generated with Claude Code