Fix severe memory leak during HMC thermalization and skip sweeps #1

ThinkOffApp wants to merge 1 commit into main from
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1436606cea
```diff
 save_checkpoint("thermalization", i + 1, meas_start)

+# Free memory aggressively during thermalization sweeps
+clear_gpt_caches()
```
Stop invalidating stencil caches on every sweep
Calling clear_gpt_caches() inside the thermalization loop clears default_staple_cache and default_exp_cache before the next one_sweep, so GPT must rebuild its cached transport/stencil kernels on every sweep instead of reusing them (gpt/lib/gpt/qcd/gauge/stencil/staple.py:37-56, gpt/lib/gpt/core/foundation/lattice/matrix/exp.py:188-203). In long runs with large ntherm, this turns a one-time setup cost into per-sweep overhead and can drastically slow experiments. Cache clearing should be batched, e.g. done every N sweeps, rather than on every iteration.
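The batching the review suggests could look like the sketch below. This is not the PR's code: `clear_gpt_caches` and `one_sweep` are stand-in stubs for the script's helpers, and `CACHE_CLEAR_INTERVAL` is an assumed tuning knob. The point is only the structure: clear on a fixed stride so GPT can reuse its stencil caches between clears.

```python
import gc

# Hypothetical stand-ins for the script's helpers: the real
# clear_gpt_caches() empties GPT's staple/exp caches; one_sweep()
# performs one HMC update. Here we just count the clears.
cleared = []

def clear_gpt_caches():
    cleared.append(True)

def one_sweep():
    pass

ntherm = 100
CACHE_CLEAR_INTERVAL = 25  # assumed knob, not from the PR

for i in range(ntherm):
    one_sweep()
    # Clear only every CACHE_CLEAR_INTERVAL sweeps, so the cached
    # transport/stencil kernels survive the sweeps in between.
    if (i + 1) % CACHE_CLEAR_INTERVAL == 0:
        clear_gpt_caches()
        gc.collect()

print(len(cleared))  # → 4 clears over 100 sweeps instead of 100
```

With the interval at 25, a 100-sweep thermalization pays the cache-rebuild cost 4 times rather than every sweep, while still bounding peak memory between clears.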
This PR fixes a critical memory leak in the su2_2q_signal_scan.py script. The C++ lattice objects allocated via the action staple within the multihit/skip sweeps were not being garbage collected during the long thermalization loops, leading to OOM crashes. The fix calls clear_gpt_caches() and gc.collect() inside the thermalization and skip-sweep loops to keep the memory profile flat.
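Why gc.collect() helps here can be shown in isolation. Python's reference counting cannot free objects caught in reference cycles; those wait for the cyclic garbage collector, which may not run soon enough inside a tight loop holding large C++-backed lattices. The `Lattice` class below is a hypothetical stand-in, not GPT's actual type; it only demonstrates that an explicit gc.collect() reclaims cyclic garbage immediately.

```python
import gc

class Lattice:
    """Stand-in for a large C++-backed lattice object."""
    pass

def make_cycle():
    # Two objects referencing each other: reference counting alone
    # will never free them once make_cycle() returns.
    a, b = Lattice(), Lattice()
    a.ref, b.ref = b, a

for _ in range(10):
    make_cycle()

# Explicit collection reclaims the cycles right away instead of
# waiting for the automatic collector's allocation thresholds.
unreachable = gc.collect()
print(unreachable)
```

In the script, the same explicit call inside the thermalization and skip-sweep loops prevents cyclic garbage (and the device memory it pins) from accumulating across thousands of sweeps.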