linux/efa: Fix config probe race with file-based completion #369
Open
mjonuschat wants to merge 1 commit into amzn:master from
The config probing system launches compile_conftest.sh processes in the background via nohup in runbg.sh and tracks them by polling /proc/$pid. The nohup wrapper exits before the actual compile and config.h write completes, causing the PID to disappear from /proc prematurely. On machines with high core counts this results in config.h missing a significant number of defines. Replace PID polling with file-based completion signaling. Each background probe now touches a done-file on completion, and the waiter polls for that file instead. Config probes continue to run in parallel. Rename wait_for_pid to wait_for_completion and pids to pending throughout to reflect the new mechanism.
Issue #, if available: #364
Description of changes:
Fix config probe race condition by replacing PID polling with file-based completion signaling
The EFA driver's cmake-based config probing launches `compile_conftest.sh` processes in the background via `nohup` in `runbg.sh` and tracks them by polling `/proc/$pid`. The `nohup` wrapper exits before the actual compile and `config.h` write completes, causing the PID to disappear from `/proc` prematurely. On machines with high core counts, this results in `config.h` missing a significant number of defines, leading to build failures or a misconfigured driver module.
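The failure mode can be reproduced in isolation with the sketch below. It is not the code from `runbg.sh`: `launch_wrapper_racy`, `WRAPPER_PID`, `HAVE_EXAMPLE` and the one-second delay are hypothetical, standing in for any launcher whose tracked process exits before the work it started has finished. `wait_for_pid` here only mirrors the `/proc/$pid` polling described above; the real function may differ in detail.

```sh
#!/bin/sh
# Hypothetical reproduction of the race, not the runbg.sh code: the
# tracked "wrapper" process exits while the job it started keeps running.

# launch_wrapper_racy OUT (hypothetical): start a wrapper that backgrounds
# the real work (appending a define to OUT after a delay) and exits at
# once. Only the wrapper's PID is recorded, in WRAPPER_PID.
launch_wrapper_racy() {
    local out="$1"
    (
        # The real work runs in its own background process and outlives
        # the wrapper.
        ( sleep 1; echo '#define HAVE_EXAMPLE 1' >> "$out" ) &
    ) >/dev/null 2>&1 &
    WRAPPER_PID=$!
}

# wait_for_pid PID: poll /proc/$pid until the process is gone, i.e. the
# PID-polling pattern this PR replaces (sketched, not copied).
wait_for_pid() {
    local pid="$1"
    while [ -d "/proc/$pid" ]; do
        sleep 0.05
    done
}
```

Calling `launch_wrapper_racy config.h` followed by `wait_for_pid "$WRAPPER_PID"` returns almost immediately, while `config.h` is still missing; the define only appears about a second later, after the waiter has already moved on.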
Replace PID polling with file-based completion signaling. Each background probe now touches a done-file on completion, and the waiter polls for that file instead. Config probes continue to run in parallel. Rename `wait_for_pid` to `wait_for_completion` and `pids` to `pending` throughout to reflect the new mechanism.
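A minimal sketch of the file-based approach, under stated assumptions: `run_probe` and the `<name>.done` naming are hypothetical, and only the names `wait_for_completion` and `pending` come from this PR, whose actual implementation may differ. The property that matters is that the done-file is created by the same process chain that writes `config.h`, after the write, so it cannot appear early even if some wrapper process exits first. This assumes the wrapped command does its work in the foreground.

```sh
#!/bin/sh
# Sketch of file-based completion signaling (hypothetical helpers).

# run_probe NAME DONE_DIR CMD [ARG...]: run CMD in the background and touch
# DONE_DIR/NAME.done once CMD has exited, whether it succeeded or failed
# (a failed compile is a normal "feature absent" result for a probe).
run_probe() {
    local name="$1" done_dir="$2"
    shift 2
    ( "$@"; touch "$done_dir/$name.done" ) >/dev/null 2>&1 &
}

# wait_for_completion DONE_DIR NAME...: block until every named probe has
# created its done-file. Names are assumed to contain no whitespace.
wait_for_completion() {
    local done_dir="$1" pending still_pending name
    shift
    pending="$*"
    while [ -n "$pending" ]; do
        still_pending=""
        for name in $pending; do
            # Keep only the probes whose done-file has not appeared yet.
            [ -e "$done_dir/$name.done" ] || still_pending="$still_pending $name"
        done
        pending="$still_pending"
        if [ -n "$pending" ]; then
            sleep 0.05
        fi
    done
}
```

A caller launches every probe with `run_probe`, then calls `wait_for_completion` with the same names. Touching the done-file even when a probe fails keeps a failed compile from stalling the waiter, and because each probe only signals its own completion, the probes still run in parallel.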
Testing
Built against kernel 6.16 on a 72-CPU machine. A known-good baseline (57 defines) was established using a synchronous build. The fix was then verified over 10 consecutive builds, all producing identical sorted `config.h` output (10/10 PASS). PID polling consistently produced only 30-33 of the 57 defines under the same conditions.
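The comparison used in testing can be sketched as follows. The helper names `count_defines`, `same_defines` and `check_runs` are hypothetical, and file paths are placeholders supplied by the caller. Sorting first matters because parallel probes may append defines to `config.h` in a different order on every run.

```sh
#!/bin/sh
# Sketch of the config.h verification (hypothetical helpers).

# count_defines FILE: print the number of lines that start with "#define".
count_defines() {
    grep -c '^#define' "$1"
}

# same_defines BASELINE CANDIDATE: succeed if both files hold the same
# lines once sorted.
same_defines() {
    [ "$(sort "$1")" = "$(sort "$2")" ]
}

# check_runs BASELINE CANDIDATE...: report each mismatching candidate with
# its define count, then print a PASS tally.
check_runs() {
    local baseline="$1" f pass=0 total=0
    shift
    for f in "$@"; do
        total=$((total + 1))
        if same_defines "$baseline" "$f"; then
            pass=$((pass + 1))
        else
            echo "FAIL: $f has $(count_defines "$f")/$(count_defines "$baseline") defines"
        fi
    done
    echo "$pass/$total PASS"
}
```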
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.