IBM Feature Improvements and Speedup#1157
IBM Feature Improvements and Speedup#1157danieljvickers wants to merge 80 commits intoMFlowCode:masterfrom
Conversation
…mething about that...
…generating golden file
…nment, even in CPU tests on frontier
Take master's compiler_flag detection (for frontier_amd support) and the PR's dynamic mode selection (gpu → "g", cpu → "c"). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Hey @danieljvickers — I went through the code closely and found three bugs that should be fixed before merge. The rest of the PR looks good structurally (the MPI IB marker exchange removal is a valid simplification since each rank now independently computes markers over its full local domain including ghost cells). Bugs (must fix)Bug 1:
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #1157 +/- ##
==========================================
+ Coverage 44.04% 44.94% +0.90%
==========================================
Files 70 70
Lines 20499 20498 -1
Branches 1993 1943 -50
==========================================
+ Hits 9028 9213 +185
+ Misses 10330 10162 -168
+ Partials 1141 1123 -18 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Claude Code ReviewHead SHA: Summary
Findings🔴 Critical1. The print statements are correctly guarded by #endif
error stop "Ghost Point and Image Point on Different Processors" ! ← always presentThis is inside a 🟠 High2. Raw allocate(ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate(ma%trs_n(1:3, 1:ma%ntrs))Per project rules, every 3. if (p == 0 .and. k == 0) cycle ! line ~332
...
if (p == 0) then ! line ~353
fraction = real(nInOrOut)/18._wp
4. Removed MPI halo exchange ( The old code called 🟡 Medium5. $:GPU_PARALLEL_LOOP(private='[i,j]', copy='[temp_boundary_v,edge_occurrence]', collapse=2)The 6. After the first loop sets 7. if (present(zp_lower) .and. p /= 0) then
...
end ifIf 🔵 Minor / Nit
Review covers 26 changed files (+1662 / −1615 lines). IBM test suite is confirmed to pass on NVHPC GPU. |
Claude Code ReviewHead SHA: 1883b9b Files
Summary
Findings1.
|
Claude Code ReviewHead SHA: Files changed: 26 File list
Summary
Findings1. Forbidden — CRITICALFile:
2. Named from inner loop inside — HIGHFile: A named 3. raised to 1000 inflates IC patch arrays in all three executables — MEDIUMFile:
Unless 4. Unused variables and in — MEDIUMFile: These are leftovers from a commented-out fibonacci-sphere ray-direction generator. They are never assigned or read, producing compiler warnings on gfortran/ifx (and potential errors with 5. without name for — MINORFile: All other subroutines in the file use 6. leaves / uninitialised when and z args are passed — MINORFile: In practice the 3D call path is guarded by Positive observations
🤖 Generated with Claude Code |
Claude Code ReviewHead SHA: 53ebfb7 Changed files
Summary
Findings1. Global if (p == 0 .and. k == 0) cycle
...
if (p == 0) then
fraction = real(nInOrOut)/18._wp
else
fraction = real(nInOrOut)/26._wp
end if
2. Raw allocate(ma%trs_v(1:3, 1:3, 1:ma%ntrs))
allocate(ma%trs_n(1:3, 1:ma%ntrs))and allocate(stl_bounding_boxes(num_ibs, 1:3, 1:3))Per project rules, every 3. Precision violation: bare bbox_min = 1e12
bbox_max = -1e12These are default-real (likely single-precision) literals. Should be 4. Cylindrical grid conversion silently removed from STL levelset — , - if (grid_geometry == 3) then
- xyz_local = f_convert_cyl_to_cart(xyz_local)
- end ifThis block was removed with no comment or guard. Any STL IB used with 5. 3D airfoil bounding-box z-search uses chord length instead of span — , call get_bounding_indices(center(3) - ca_in, center(3) + ca_in, z_cc, ll, lr)
6. Unused variable after interpolation removal — , line ~1345 logical :: interpolate !< Logical variable to determine whether or not the model should be interpolatedAll uses of 7. Potential division by zero in In the dist = sqrt((point(1) - v1(1))**2 + (point(2) - v1(2))**2)
norm = norm/dist ! <-- div-by-zero if point == v1and the Minor / Suggestions
|
Claude Code ReviewHead SHA: 03e65ed Files changed: 36 Key files:
Summary
Findings1. Forbidden — must fix (CLAUDE.md rule)File:
2. Missing
|
Claude Code ReviewHead SHA: 2ace4c0 Summary
Findings🔴 Critical — Must Fix1. if (bounds_error) error stop "Ghost Point and Image Point on Different Processors. Exiting"
@:PROHIBIT(bounds_error, "Ghost Point and Image Point on Different Processors.")or 2.
Additionally, 🟠 High — Should Fix3. Four occurrences in bbox_min = 1e12 ! should be 1.e12_wp
bbox_max = -1e12 ! should be -1.e12_wpDefault-kind float literals violate precision discipline. Also in fraction = real(nInOrOut)/18._wp ! real(nInOrOut) → default real; should be real(nInOrOut, wp)
fraction = real(nInOrOut)/26._wp4. Cylinder bounding box uses full axial length instead of half-length — corner_distance = sqrt(radius**2 + maxval(length)**2)
Fix: 5.
6. Data race on The GPU parallel loop simultaneously reads 🟡 Medium7. Unguarded print *, " * Reading model: "//trim(patch_ib(patch_id)%model_filepath)Not guarded by 🔵 Low / Informational8. The new 9. Module-private subroutine lacks the 10. Frontier/Frontier-AMD The change from hardcoded Required Before Merge
|
User description
Description
Following the refactor of the levelset, there were several performance optimizations left to be made to the code. This PR introduces optimizations that will make multi-particle MIBM code viable. It also expands the upper bound of allowed number of immersed boundaries to 1000. Performance was measured on 1-4 ranks of ACC GPU compute using A100 GPUs.
This PR has extended optimization to STL IBs, which should significantly improve accuracy, performance, and code cleanliness. The primary optimizations are as follows:
Type of change
Testing
All changes pass the IBM section of the test suite on GPUs with the NVHPC compiler. Performance was measured with a case of 1000 particles with viscosity enabled. The particles are all resolved 3D spheres given random non-overlapping positions generated by the following case file:
These optimizations add nearly x1000 performance in the moving IBM propagation and generation code. Prior to these optimizations, this was the result of the benchmark case using the NVIDIA NSight profiler showing 45 seconds to run a single RK substep:
Following these optimizations, the same profile achieves almost 50 ms per RK substep:

For STLs, the optimizations were tested on a 822,000 vertex mesh of a Mach 0.4 corgi, given by this STL:
https://www.thingiverse.com/thing:4721563/files
The final simulation finished in a total of 25 minutes on a 200^3 grid for 4k time steps on a single A100 GPU. All of the code related to the STL model (file reading, preprocessing, IB marker generation, and levelset compute) took only 20 seconds of the run time. The result of that simulation can be viewed here:
https://www.youtube.com/watch?v=h44BNCKo0Hs
Checklist
See the developer guide for full coding standards.
GPU changes (expand if you modified
src/simulation/)CodeAnt-AI Description
GPU-accelerate STL immersed-boundary compute and support up to 1000 IBs
What Changed
Impact
✅ Faster IB marker generation✅ Lower CPU during IB setup and levelset evaluation✅ Support up to 1000 immersed boundaries💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.