[fix](exec) Fix data race on _queue_deps and _parents in ExchangeSinkBuffer#62040
Draft
Mryange wants to merge 1 commit intoapache:masterfrom
Draft
[fix](exec) Fix data race on _queue_deps and _parents in ExchangeSinkBuffer#62040Mryange wants to merge 1 commit intoapache:masterfrom
Mryange wants to merge 1 commit intoapache:masterfrom
Conversation
Contributor
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
Contributor
Author
|
run buildall |
TPC-H: Total hot run time: 29318 ms |
TPC-DS: Total hot run time: 178990 ms |
Contributor
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
Contributor
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Problem Summary:
ExchangeSinkBufferhas two member fields_queue_depsand_parentsthat are protected by mutex_m. However, multiple code paths access these fields without holding_m, leading to data races.Specifically,
add_block(),_send_rpc(), and_set_receiver_eof()iterate_queue_depswhile only holding the per-RpcInstancemutex (instance_data.mutex), and_turn_off_channel()iterates_parentsalso without holding_m. Concurrently,set_dependency()acquires_mand pushes to both_queue_depsand_parents. Since the two lock domains are disjoint, if a new sink registers viaset_dependency()while another thread is iterating_queue_deps, thestd::vectormay be reallocated underneath the iterator, causing use-after-free or corrupted iteration. This is undefined behavior.Beyond the data race on the vector itself, there is also a missed-wakeup problem: the check-then-act pattern on
_total_queue_size(anstd::atomic<int>) versus the_queue_depsblock/unblock iteration is not atomic. Two threads holding different instance mutexes can interleave such that one thread blocks all dependencies right after another thread has already checked and decided to unblock, resulting in a permanently blocked pipeline (deadlock).Approach
This PR introduces Clang's
-Wthread-safetystatic analysis onexchange_sink_buffer.cppto mechanically detect all unprotected accesses, then fixes all reported violations.1. Thread Safety Annotation Infrastructure
Added
be/src/common/thread_safety_annotations.hwhich provides:TSA_GUARDED_BY,TSA_REQUIRES,TSA_ACQUIRE,TSA_RELEASE, etc. — standard Clang thread safety annotation macrosAnnotatedMutex— astd::mutexwrapper annotated withTSA_CAPABILITY("mutex")AnnotatedLockGuard<MutexType>— an RAII lock guard annotated withTSA_SCOPED_CAPABILITYThese are no-ops when compiling with non-Clang compilers.
2. Annotations Applied to ExchangeSinkBuffer
In
exchange_sink_buffer.h:std::mutex _m→AnnotatedMutex _m_queue_depsmarked withTSA_GUARDED_BY(_m)_parentsmarked withTSA_GUARDED_BY(_m)set_dependency()usesAnnotatedLockGuardinstead ofstd::lock_guardIn
be/src/exec/CMakeLists.txt:set_source_files_properties(... PROPERTIES COMPILE_FLAGS "-Wthread-safety")forexchange_sink_buffer.cpp, so the analysis is enabled per-file without affecting the rest of the build.3. Compilation Errors Detected
With
-Wthread-safetyenabled, Clang reported 8 errors at 4 locations:add_block()_queue_depsdep->block()— only holdsinstance_data.mutex, not_m_send_rpc()_queue_depsdep->set_ready()— only holdsinstance_data.mutex, not_m_set_receiver_eof()_queue_depsdep->set_ready()— holds no lock at all_turn_off_channel()_parentsparent->on_channel_finished()— holdsinstance_data.mutex, not_m4. Fixes Applied
All four sites now acquire
_m(viaAnnotatedLockGuard l(_m)) before iterating the guarded fields:add_block(): Wrapped the_total_queue_size > _queue_capacitycheck and_queue_depsiteration inAnnotatedLockGuard l(_m)._send_rpc(): Wrapped the_total_queue_size <= _queue_capacitycheck and_queue_depsiteration inAnnotatedLockGuard l(_m)._set_receiver_eof(): Wrapped the_total_queue_size <= _queue_capacitycheck and_queue_depsiteration inAnnotatedLockGuard l(_m)._turn_off_channel(): Wrapped the_parentsiteration inAnnotatedLockGuard l(_m).Lock ordering is always
instance_data.mutex→_m(the inner lock), so there is no deadlock risk.After applying the fixes, the file compiles cleanly with zero thread-safety violations.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)