[Bug](pipeline) do not set shared hash table signaled when task terminated by BiteTheDDDDt · Pull Request #62056 · apache/doris

BiteTheDDDDt · 2026-04-02T12:29:38Z

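Problematic interleaving with shared hash table + broadcast join:

thread A: triggers make all runnable
thread A: make all runnable reaches the point where it wakes up task0
thread B: task0 terminates and closes -> signals the other tasks
thread C: task1 is woken up and runs sink normally -> finds no hash table -> type mismatch -> error
thread A: make all runnable continues running

In #61768 we changed the order so that all tasks are first set to wake up early and only then set to ready, which introduced this problem.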

This pull request improves the safety and clarity of the hash join build sink operator's handling of early termination and hash table signaling. The changes ensure that non-builder tasks do not attempt to access an uninitialized hash table, preventing potential crashes due to type mismatches.

Hash join build signaling and termination handling:

In HashJoinBuildSinkLocalState::close, the builder task now sets _signaled = true only if it was not terminated early, ensuring that non-builder tasks do not attempt to access an uninitialized hash table. Additional comments clarify the logic and the relationship between _terminated and _signaled.
In HashJoinBuildSinkOperatorX::sink, updated the logic and comments to clarify that non-builder tasks will return EOF if the builder has not finished or was terminated early, relying on the improved _signaled guard to prevent unsafe access to the hash table.
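
The handshake described above can be modelled in a few lines. This is a minimal sketch, not the Doris implementation: `SharedBuildState`, `builder_sink`, `builder_close`, `non_builder_sink` and the `signal_when_terminated` switch are hypothetical names, the hash table is reduced to an `std::optional<int>`, and the pre-fix behaviour (signaling even after early termination) is an assumption inferred from the description.

```cpp
#include <atomic>
#include <cassert>
#include <optional>

// Hypothetical stand-in for the state shared by all build-sink tasks.
struct SharedBuildState {
    std::atomic<bool> signaled{false};  // plays the role of _signaled
    std::optional<int> hash_table;      // empty until the builder has built it
};

enum class SinkResult { kEof, kReusedTable, kUnbuiltTable };

// Builder (task 0) sink: builds the table. It must run before the signal.
inline void builder_sink(SharedBuildState& shared) {
    assert(!shared.signaled.load());
    shared.hash_table = 42;  // placeholder for a real hash table
}

// Builder close. Passing signal_when_terminated = true models the assumed
// pre-fix behaviour; false models the behaviour described in this PR.
inline void builder_close(SharedBuildState& shared, bool terminated,
                          bool signal_when_terminated) {
    if (!terminated || signal_when_terminated) {
        shared.signaled.store(true);
    }
}

// Non-builder sink: return EOF unless the builder has signaled.
inline SinkResult non_builder_sink(const SharedBuildState& shared) {
    if (!shared.signaled.load()) {
        return SinkResult::kEof;
    }
    if (!shared.hash_table.has_value()) {
        return SinkResult::kUnbuiltTable;  // the failure this PR is meant to prevent
    }
    return SinkResult::kReusedTable;
}
```

In this model, a terminated builder that skips `builder_sink` leaves the table empty. With `signal_when_terminated = false` the non-builder returns `kEof`; with `true` it reaches the unbuilt table.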

hello-stephen · 2026-04-02T12:29:48Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

BiteTheDDDDt · 2026-04-02T12:29:53Z

run buildall

Copilot

Pull request overview

This PR tightens the broadcast/shared hash join build-sink’s early-termination behavior so non-builder tasks won’t treat an unbuilt (monostate) hash table as ready, avoiding a crash during shared-hash-table copying.

Changes:

In HashJoinBuildSinkLocalState::close(), set _signaled=true only when the builder task was not terminated early.
Clarify the non-builder sink() guard semantics around _signaled vs _terminated to prevent unsafe access to the hash table variant.
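
To illustrate why an unbuilt (monostate) table shows up as a type mismatch, here is a small sketch using `std::variant`. The aliases `Int32Table`, `StringTable`, `HashTableVariant` and the helper `try_share` are hypothetical; the real variant in Doris has different alternatives, and its copy logic is not shown in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <variant>

// Hypothetical, simplified hash table variant. std::monostate means "not built yet".
using Int32Table = std::unordered_map<std::int32_t, std::size_t>;
using StringTable = std::unordered_map<std::string, std::size_t>;
using HashTableVariant = std::variant<std::monostate, Int32Table, StringTable>;

// A non-builder task has already initialised `local` to the alternative it
// expects. Sharing only succeeds when the builder's variant holds the same one.
inline bool try_share(const HashTableVariant& shared, HashTableVariant& local) {
    assert(!std::holds_alternative<std::monostate>(local));
    if (shared.index() != local.index()) {
        return false;  // e.g. shared is still monostate: type mismatch
    }
    local = shared;  // copy the built table
    return true;
}
```

If the builder never built its table, `shared` still holds `std::monostate`, so the index comparison fails. Guarding on the signal before reaching this point avoids that comparison altogether.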


be/src/exec/operator/hashjoin_build_sink.cpp


BiteTheDDDDt · 2026-04-02T12:44:01Z

run buildall

BiteTheDDDDt · 2026-04-02T12:44:09Z

/review

github-actions

Findings

Missing regression coverage for the shared-hash-table early-termination race. This patch changes the handshake between PipelineTask::execute() wake-up handling and the non-builder HashJoinBuildSinkOperatorX::sink() EOF guard, but I could not find any new test that exercises enable_share_hash_table_for_broadcast_join=true with task 0 terminated early. The existing regression test (regression-test/suites/query_p0/join/test_terminate.groovy) only covers a shuffle join, and the current BE unit tests for hashjoin_build_sink only exercise task_idx == 0, so the exact race fixed here is still uncovered.

Critical Checkpoints

Goal of the task: The patch aims to stop non-builder broadcast-join sink tasks from observing _signaled=true when task 0 was woken up early before building the shared hash table. The code change appears to achieve that, but there is no new test proving the target interleaving.
Small, clear, focused modification: Yes. The code change is narrow and limited to the shared-hash-table signal/EOF path.
Concurrency: Applicable. I checked the thread interaction between PipelineTask::execute(), terminate(), close(), and the non-builder sink() path. I did not find a new deadlock or obvious data race in the modified logic, and keeping _signaled as an atomic is important here.
Special lifecycle management: Applicable. The new comment about terminate() running from the PipelineTask execute defer before close() is consistent with the current lifecycle.
Configuration items: Not applicable. No new config was added.
Incompatible changes: Not applicable. No API, protocol, or storage compatibility change is introduced.
Parallel code paths: Applicable. The change is intentionally limited to the shared-hash-table broadcast-join path; I did not see another equivalent path that also needed the same code change.
Special conditional checks: Applicable. The new !_signaled guard is justified, and the updated comments explain why _terminated is not the right runtime guard inside sink().
Test coverage: Insufficient. A targeted BE unit test or regression test is still needed for the shared-hash-table early-terminate race.
Observability: Not applicable for this small fix; existing error context is acceptable.
Transaction / persistence: Not applicable.
Data writes / modifications: Not applicable.
FE-BE variable passing: Not applicable.
Performance: No obvious regression; the added branch is negligible.
Other issues: I did not find an additional correctness bug in the patched logic beyond the missing coverage above.
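
As a generic illustration of the concurrency checkpoint above (why the flag should stay atomic), the sketch below publishes plain data through an atomic flag. `Published` and `publish_and_read` are hypothetical names, and the explicit release/acquire ordering is an assumption made for the example; the memory ordering Doris uses for _signaled is not stated in this PR.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared state: plain data guarded by an atomic flag.
struct Published {
    std::vector<int> table;             // written before the flag is set
    std::atomic<bool> signaled{false};  // publication flag
};

// Returns the table size observed by the consumer thread.
inline int publish_and_read() {
    Published p;
    int seen = -1;
    std::thread consumer([&] {
        // Acquire load: once true is observed, the earlier table write is visible.
        while (!p.signaled.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = static_cast<int>(p.table.size());
    });
    std::thread producer([&] {
        p.table = {1, 2, 3};
        // Release store: publishes the table write together with the flag.
        p.signaled.store(true, std::memory_order_release);
    });
    producer.join();
    consumer.join();
    assert(seen >= 0);
    return seen;
}
```

With a non-atomic flag the consumer's read would be a data race, and nothing would order the table write before the flag becomes visible.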

Overall opinion: the fix direction looks correct, but it should be accompanied by a regression/unit test for the exact broadcast shared-hash-table race that motivated the patch.

github-actions · 2026-04-02T12:50:16Z

be/src/exec/operator/hashjoin_build_sink.cpp

+        // At this point, termination is reflected solely through the value of _signaled: a
+        // terminated builder never sets _signaled to true. Checking !_signaled is therefore
+        // sufficient and serves as the real guard against racing with an uninitialized hash table.
+        if (!_signaled) {


Please add coverage for this interleaving. The bug only reproduces when enable_share_hash_table_for_broadcast_join=true, task 0 is woken up early, and another build task reaches this !_signaled guard before trying to reuse the shared hash table. I could not find a new test for that path: regression-test/suites/query_p0/join/test_terminate.groovy exercises a shuffle join, and the existing hashjoin_build_sink BE tests only run task_idx == 0. Since this is a pipeline/kernel race fix, AGENTS.md requires a focused regression or unit test here so the crash does not silently come back in a later refactor.

Commits:

79a32e1 do not set shared hash table signaled when task terminated
119a501 Update be/src/exec/operator/hashjoin_build_sink.cpp
4417319 Update be/src/exec/operator/hashjoin_build_sink.cpp
1e5afa2 Update be/src/exec/operator/hashjoin_build_sink.cpp

The three update commits carry: Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

BiteTheDDDDt wants to merge 4 commits into apache:master from BiteTheDDDDt:fix_0402_3
