[fix](executor) prevent BE crash when split process throws unexpectedly #62044
eldenmoon wants to merge 2 commits into apache:master
Pull request overview
Prevents backend worker-thread termination (and potential BE crash) when split->process() throws unexpectedly in the time-sharing scan executor by converting thrown exceptions into split failure statuses.
Changes:
- Wrap the `PrioritizedSplitRunner::process()` invocation in a try/catch to prevent exceptions from escaping the dispatch thread.
- Map `doris::Exception` (including special-casing `MEM_ALLOC_FAILED`) and other exceptions to appropriate `Status` errors returned via `Result`.
- Keep `enable_thread_catch_bad_alloc` scoped around `split->process()` to preserve existing memory-exception behavior.
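For readers unfamiliar with the pattern, the changes listed above can be sketched in isolation. This is a minimal illustration only: `ToyException`, `ErrorCode`, `StatusKind`, `Status`, `Result` and `run_guarded` are hypothetical stand-ins written for this example, not the Doris types or the code in this PR.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical stand-ins for illustration; NOT the real Doris types.
enum class ErrorCode { MEM_ALLOC_FAILED, BAD_CAST };
enum class StatusKind { MemoryLimitExceeded, InternalError };

struct Status {
    StatusKind kind;
    std::string msg;
};

// Toy exception carrying an error code, loosely modelled on doris::Exception.
class ToyException : public std::runtime_error {
public:
    ToyException(ErrorCode code, const std::string& msg)
            : std::runtime_error(msg), _code(code) {}
    ErrorCode code() const { return _code; }

private:
    ErrorCode _code;
};

// Minimal Result<T>: holds either a value or an error Status.
template <typename T>
using Result = std::variant<T, Status>;

// Runs `fn` and converts anything it throws into an error Status,
// so that no exception escapes into the calling worker thread.
template <typename T>
Result<T> run_guarded(const std::function<T()>& fn) {
    assert(fn && "fn must be callable");
    try {
        return fn();
    } catch (const ToyException& e) {
        // Special case: allocation failures become a memory-limit status.
        if (e.code() == ErrorCode::MEM_ALLOC_FAILED) {
            return Status{StatusKind::MemoryLimitExceeded, e.what()};
        }
        return Status{StatusKind::InternalError, e.what()};
    } catch (const std::exception& e) {
        return Status{StatusKind::InternalError, e.what()};
    } catch (...) {
        return Status{StatusKind::InternalError, "unknown exception"};
    }
}
```

A caller would invoke `run_guarded<int>(...)` with the throwing work and inspect the variant, treating a held `Status` as a failed split rather than letting the thread terminate.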
be/src/exec/scan/task_executor/time_sharing/time_sharing_task_executor.cpp
    auto blocked_future_result = [&]() -> Result<SharedListenableFuture<Void>> {
        try {
            doris::enable_thread_catch_bad_alloc++;
            Defer defer {[&]() { doris::enable_thread_catch_bad_alloc--; }};
            return split->process();
        } catch (const doris::Exception& e) {
            if (e.code() == doris::ErrorCode::MEM_ALLOC_FAILED) {
                return unexpected(Status::MemoryLimitExceeded(
                        "PreCatch error code:{}, {}, __FILE__:{}, __LINE__:{}, "
                        "__FUNCTION__:{}",
                        e.code(), e.to_string(), __FILE__, __LINE__, __PRETTY_FUNCTION__));
            }
            return unexpected(e.to_status());
        } catch (const std::exception& e) {
This try/catch block duplicates the exception-to-Status mapping logic already implemented in common/exception.h (including the enable_thread_catch_bad_alloc guard and the "PreCatch" message). To avoid future divergence, consider factoring the conversion into a shared helper (e.g., a function that converts an Exception/std::exception to Status) and reuse it here for the Result<...> error path.
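One common way to factor such a conversion into a shared helper is to rethrow the in-flight exception inside a single function and catch it by type there. The sketch below shows that general C++ technique under stated assumptions: `Status`, `status_from_current_exception` and `guarded` are hypothetical names for this example, and `std::bad_alloc` merely stands in for the memory-failure case. It is not the helper in `common/exception.h`.

```cpp
#include <cassert>
#include <exception>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical minimal status type for illustration; NOT doris::Status.
struct Status {
    bool ok;
    std::string msg;
};

// Shared conversion helper. Must only be called from inside a catch block:
// it rethrows the exception currently being handled and maps it by type.
inline Status status_from_current_exception() {
    assert(std::current_exception() != nullptr);
    try {
        throw;  // rethrow the in-flight exception
    } catch (const std::bad_alloc& e) {
        return Status{false, std::string("memory limit exceeded: ") + e.what()};
    } catch (const std::exception& e) {
        return Status{false, std::string("internal error: ") + e.what()};
    } catch (...) {
        return Status{false, "internal error: unknown exception"};
    }
}

// Call sites shrink to a single catch-all that delegates to the helper.
template <typename Fn>
Status guarded(Fn&& fn) {
    try {
        fn();
        return Status{true, ""};
    } catch (...) {
        return status_from_current_exception();
    }
}
```

With this shape, every call site shares one mapping, so a change to the message format or to the special cases happens in one place.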
Catch exceptions around split->process() in TimeSharingTaskExecutor and convert them to split failure status. This avoids worker thread termination and BE crash for cases such as:
```
terminate called after throwing an instance of 'doris::Exception'
  what(): [E-7412] assert cast err:[E-7412] Bad cast from ...
doris::vectorized::ScannerSplitRunner::process_for(std::chrono::duration<long, std::ratio<1l, 1000000000l> >) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/scanner_scheduler.cpp:420
10# doris::vectorized::PrioritizedSplitRunner::process() at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/executor/time_sharing/prioritized_split_runner.cpp:104
11# doris::vectorized::TimeSharingTaskExecutor::_dispatch_thread() at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/executor/time_sharing/time_sharing_task_executor.cpp:568
12#
```
MEM_ALLOC_FAILED remains mapped to MemoryLimitExceeded.
…executor.cpp Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
/review
run buildall
TPC-H: Total hot run time: 29125 ms
TPC-DS: Total hot run time: 179999 ms
skip check_coverage