[Bug](pipeline) do not set shared hash table signaled when task terminated#62056
[Bug](pipeline) do not set shared hash table signaled when task terminated#62056BiteTheDDDDt wants to merge 4 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
There was a problem hiding this comment.
Pull request overview
This PR tightens the broadcast/shared hash join build-sink’s early-termination behavior so non-builder tasks won’t treat an unbuilt (monostate) hash table as ready, avoiding a crash during shared-hash-table copying.
Changes:
- In
HashJoinBuildSinkLocalState::close(), set_signaled=trueonly when the builder task was not terminated early. - Clarify the non-builder
sink()guard semantics around_signaledvs_terminatedto prevent unsafe access to the hash table variant.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
run buildall |
|
/review |
There was a problem hiding this comment.
Findings
- Missing regression coverage for the shared-hash-table early-termination race. This patch changes the handshake between
PipelineTask::execute()wake-up handling and the non-builderHashJoinBuildSinkOperatorX::sink()EOF guard, but I could not find any new test that exercisesenable_share_hash_table_for_broadcast_join=truewith task 0 terminated early. The existing regression test (regression-test/suites/query_p0/join/test_terminate.groovy) only covers a shuffle join, and the current BE unit tests forhashjoin_build_sinkonly exercisetask_idx == 0, so the exact race fixed here is still uncovered.
Critical Checkpoints
- Goal of the task: The patch aims to stop non-builder broadcast-join sink tasks from observing
_signaled=truewhen task 0 was woken up early before building the shared hash table. The code change appears to achieve that, but there is no new test proving the target interleaving. - Small, clear, focused modification: Yes. The code change is narrow and limited to the shared-hash-table signal/EOF path.
- Concurrency: Applicable. I checked the thread interaction between
PipelineTask::execute(),terminate(),close(), and the non-buildersink()path. I did not find a new deadlock or obvious data race in the modified logic, and keeping_signaledas an atomic is important here. - Special lifecycle management: Applicable. The new comment about
terminate()running from thePipelineTaskexecute defer beforeclose()is consistent with the current lifecycle. - Configuration items: Not applicable. No new config was added.
- Incompatible changes: Not applicable. No API, protocol, or storage compatibility change is introduced.
- Parallel code paths: Applicable. The change is intentionally limited to the shared-hash-table broadcast-join path; I did not see another equivalent path that also needed the same code change.
- Special conditional checks: Applicable. The new
!_signaledguard is justified, and the updated comments explain why_terminatedis not the right runtime guard insidesink(). - Test coverage: Insufficient. A targeted BE unit test or regression test is still needed for the shared-hash-table early-terminate race.
- Observability: Not applicable for this small fix; existing error context is acceptable.
- Transaction / persistence: Not applicable.
- Data writes / modifications: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: No obvious regression; the added branch is negligible.
- Other issues: I did not find an additional correctness bug in the patched logic beyond the missing coverage above.
Overall opinion: the fix direction looks correct, but it should be accompanied by a regression/unit test for the exact broadcast shared-hash-table race that motivated the patch.
| // At this point, termination is reflected solely through the value of _signaled: a | ||
| // terminated builder never sets _signaled to true. Checking !_signaled is therefore | ||
| // sufficient and serves as the real guard against racing with an uninitialized hash table. | ||
| if (!_signaled) { |
There was a problem hiding this comment.
Please add coverage for this interleaving. The bug only reproduces when enable_share_hash_table_for_broadcast_join=true, task 0 is woken up early, and another build task reaches this !_signaled guard before trying to reuse the shared hash table. I could not find a new test for that path: regression-test/suites/query_p0/join/test_terminate.groovy exercises a shuffle join, and the existing hashjoin_build_sink BE tests only run task_idx == 0. Since this is a pipeline/kernel race fix, AGENTS.md requires a focused regression or unit test here so the crash does not silently come back in a later refactor.
shared hash table + broadcast join有问题的时序:
#61768 在这个pr中,我们调整了先set所有task为wake up early,再set所有task ready的顺序,所以引入了这个问题
This pull request improves the safety and clarity of the hash join build sink operator's handling of early termination and hash table signaling. The changes ensure that non-builder tasks do not attempt to access an uninitialized hash table, preventing potential crashes due to type mismatches.
Hash join build signaling and termination handling:
HashJoinBuildSinkLocalState::close, the builder task now sets_signaled = trueonly if it was not terminated early, ensuring that non-builder tasks do not attempt to access an uninitialized hash table. Additional comments clarify the logic and the relationship between_terminatedand_signaled.HashJoinBuildSinkOperatorX::sink, updated the logic and comments to clarify that non-builder tasks will return EOF if the builder has not finished or was terminated early, relying on the improved_signaledguard to prevent unsafe access to the hash table.