[Bug](scan) Preserve IN_LIST runtime filter predicates when key range is a scope range by BiteTheDDDDt · Pull Request #62027 · apache/doris

BiteTheDDDDt · 2026-04-02T02:07:28Z

This pull request addresses a bug in the OLAP scan operator where IN_LIST predicates could be incorrectly erased when both MINMAX and IN runtime filters targeted the same key column, and the number of IN values exceeded the maximum allowed for pushdown. The changes ensure that IN_LIST predicates are preserved in such cases, preventing incorrect query results. Additionally, a regression test is added to verify the fix.

Bug fix in predicate handling:

Modified the logic in _build_key_ranges_and_filters() within olap_scan_operator.cpp to ensure that IN_LIST predicates are not erased when the key range is a scope range (e.g., >= X AND <= Y) and the IN filter's value count exceeds max_pushdown_conditions_per_column. This preserves filtering semantics that are not captured by the scope range. [1] [2] [3]
Enhanced the profiling output in _process_conjuncts() to accurately reflect the set of predicates that will reach the storage layer after key range and filter construction. This helps with debugging and verification of predicate pushdown.

Testing and regression coverage:

Added a new regression test test_rf_in_list_not_erased_by_scope_range.groovy to verify that IN_LIST predicates are not incorrectly erased when both MINMAX and IN filters are present and the IN list is too large to be absorbed into the key range.
Added the corresponding expected output file for the new regression test.

…s a scope range ### What problem does this PR solve? Issue Number: close #xxx Related PR: #xxx Problem Summary: Commit d7153c0 ("[fix](predicate) Avoid recomputing predicates (apache#60484)") introduced a regression where IN_LIST predicates from IN_FILTER runtime filters are incorrectly erased during key range construction in _build_key_ranges_and_filters(). When both MINMAX and IN runtime filters target the same key column, and the IN filter value count exceeds max_pushdown_conditions_per_column, the IN values are not absorbed into the ColumnValueRange (which remains a scope range from MINMAX). However, the erase logic treated the scope range as "exact" and removed all could_be_erased() predicates including the IN_LIST predicate. This caused the IN_LIST to never reach the storage layer for row-level filtering, resulting in significant performance degradation. The fix preserves IN_LIST predicates when the ColumnValueRange is a scope range (not a fixed value range), since scope ranges do not capture IN_LIST filtering semantics. Additionally, the PushDownPredicates profile string is now re-printed after _build_key_ranges_and_filters() to accurately reflect which predicates actually reach the storage layer. ### Release note Fixed a performance regression where IN_LIST runtime filter predicates were incorrectly erased when a MINMAX runtime filter created a scope range on the same key column, causing the IN filter to have no effect on row filtering. ### Check List (For Author) - Test: Regression test - Regression test: test_rf_in_list_not_erased_by_scope_range - Behavior changed: No - Does this need documentation: No

hello-stephen · 2026-04-02T02:07:34Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

Copilot

Pull request overview

Fixes a BE predicate-pushdown bug in the OLAP scan operator where IN_LIST runtime-filter predicates could be incorrectly erased when a key column’s ColumnValueRange is a scope range (e.g. from MINMAX RF), leading to incorrect filtering semantics. Adds a regression test intended to validate the scenario.

Changes:

Preserve IN_LIST column predicates when the key range is derived from a non-fixed (scope) ColumnValueRange in _build_key_ranges_and_filters().
Update profiling output to reflect the final predicate set after key range construction potentially erases predicates.
Add a new regression test suite + expected output for the reported runtime-filter interaction.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`be/src/exec/operator/olap_scan_operator.cpp`	Adjusts key-range/predicate erasure logic to retain `IN_LIST` predicates for scope ranges; re-prints profile predicate info after erasure.
`regression-test/suites/correctness_p0/test_rf_in_list_not_erased_by_scope_range.groovy`	New regression test case targeting the reported runtime-filter + pushdown-limit scenario.
`regression-test/data/correctness_p0/test_rf_in_list_not_erased_by_scope_range.out`	Expected output for the new regression test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-02T02:12:54Z

regression-test/suites/correctness_p0/test_rf_in_list_not_erased_by_scope_range.groovy

+
+    // The join should only return 6 rows (matching k1 in {2,4,6,8,10,12}).
+    // If the IN_LIST predicate is incorrectly erased, the MINMAX scope [2,12]
+    // would let through rows with k1 in {3,5,7,9,11} as well, producing wrong results.
+    // We verify correctness by checking the result.
+    order_qt_join """
+        SELECT p.k1, p.v1, b.k1, b.v1
+        FROM rf_scope_probe p
+        INNER JOIN rf_scope_build b ON p.k1 = b.k1


This regression currently uses an INNER JOIN, so the result set is determined by the join condition (p.k1=b.k1) even if the runtime filter predicates are missing/erased; that means the test likely won’t fail under the bug described. To make it sensitive to an over-permissive runtime filter (e.g., MINMAX scope without the IN_LIST), consider using a LEFT SEMI JOIN / EXISTS pattern and enabling enable_left_semi_direct_return_opt (as other runtime-filter correctness tests do), so false positives like k1=3,5,7,9,11 would surface in the output when IN_LIST is erased.

Suggested change

// The join should only return 6 rows (matching k1 in {2,4,6,8,10,12}).

// If the IN_LIST predicate is incorrectly erased, the MINMAX scope [2,12]

// would let through rows with k1 in {3,5,7,9,11} as well, producing wrong results.

// We verify correctness by checking the result.

order_qt_join """

SELECT p.k1, p.v1, b.k1, b.v1

FROM rf_scope_probe p

INNER JOIN rf_scope_build b ON p.k1 = b.k1

sql "set enable_left_semi_direct_return_opt = true;"

// The join should only return 6 rows (matching k1 in {2,4,6,8,10,12}).

// If the IN_LIST predicate is incorrectly erased, the MINMAX scope [2,12]

// would let through rows with k1 in {3,5,7,9,11} as well, producing wrong results.

// We verify correctness by checking the result.

order_qt_join """

SELECT p.k1, p.v1

FROM rf_scope_probe p

WHERE EXISTS (

SELECT 1

FROM rf_scope_build b

WHERE p.k1 = b.k1

)

Copilot · 2026-04-02T02:12:55Z

regression-test/suites/correctness_p0/test_rf_in_list_not_erased_by_scope_range.groovy

+    // Use both IN and MIN_MAX runtime filter types so both are generated on the join key.
+    sql "set runtime_filter_type = 'IN_OR_BLOOM_FILTER,MIN_MAX';"
+    sql "set runtime_filter_wait_time_ms = 10000;"
+    sql "set runtime_filter_wait_infinitely = true;"


runtime_filter_wait_infinitely = true makes the effective wait time fall back to the query execution timeout (see RuntimeFilterConsumer), so the preceding runtime_filter_wait_time_ms = 10000 no longer bounds the wait and the test could stall much longer than intended if a runtime filter is not produced. Consider setting runtime_filter_wait_infinitely=false (or removing it) if the goal is to cap waits at 10s, or otherwise update the comments to reflect the longer wait behavior.

Suggested change

sql "set runtime_filter_wait_infinitely = true;"

// Cap runtime filter wait at 10s for this test instead of falling back to the query timeout.

sql "set runtime_filter_wait_infinitely = false;"

Copilot · 2026-04-02T02:12:55Z

be/src/exec/operator/olap_scan_operator.cpp

+        fmt::memory_buffer buf;
+        for (const auto& [slot_id, predicates] : _slot_id_to_predicates) {
+            if (predicates.empty()) {
+                continue;
+            }
+            fmt::format_to(buf, "Slot ID: {}: [", slot_id);
+            for (const auto& predicate : predicates) {
+                fmt::format_to(buf, "{{{}}}, ", predicate->debug_string());
+            }
+            fmt::format_to(buf, "] ");
+        }
+        custom_profile()->add_info_string("PushDownPredicates", fmt::to_string(buf));


This block duplicates the formatting logic already implemented in ScanLocalState (see predicates_to_string in scan_operator.cpp). Consider factoring that helper into a shared header/utility so predicate formatting stays consistent across scan operators and avoids future drift when the format needs to change.

Suggested change

fmt::memory_buffer buf;

for (const auto& [slot_id, predicates] : _slot_id_to_predicates) {

if (predicates.empty()) {

continue;

}

fmt::format_to(buf, "Slot ID: {}: [", slot_id);

for (const auto& predicate : predicates) {

fmt::format_to(buf, "{{{}}}, ", predicate->debug_string());

}

fmt::format_to(buf, "] ");

}

custom_profile()->add_info_string("PushDownPredicates", fmt::to_string(buf));

custom_profile()->add_info_string(

"PushDownPredicates", predicates_to_string(_slot_id_to_predicates));

BiteTheDDDDt · 2026-04-02T03:12:04Z

run buildall

BiteTheDDDDt · 2026-04-02T03:58:03Z

run buildall

BiteTheDDDDt · 2026-04-02T04:16:34Z

run buildall

BiteTheDDDDt · 2026-04-02T04:59:33Z

/review

BiteTheDDDDt · 2026-04-02T04:59:37Z

run buildall

github-actions · 2026-04-02T05:10:57Z

No issues found in review. I checked the scan-key/predicate rewrite, the runtime-filter end-to-end path, and the added regression coverage; the change looks correct and appropriately scoped.

HappenLee · 2026-04-02T09:52:04Z

be/src/exec/operator/olap_scan_operator.cpp

+                    // Comparison predicates (EQ/LT/LE/GT/GE): subsumed by both
+                    // fixed value and scope ranges.
+                    if (PredicateTypeTraits::is_comparison(pt)) {
+                        return true;


line 1007 check should after the PredicateTypeTraits::is_comparison(pt)

HappenLee · 2026-04-02T10:17:45Z

be/src/exec/operator/olap_scan_operator.cpp

+                        (pt == PredicateType::NOT_IN_LIST && opposite)) {
+                        return is_fixed_value_range;
+                    }
+                    // Everything else (BF, BITMAP, NOT_IN_LIST, IS_NULL, etc.): keep.


is null should be prune

hello-stephen · 2026-04-02T10:24:36Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (37/37) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.61% (27297/37085)
Line Coverage	57.25% (294080/513698)
Region Coverage	54.53% (245233/449739)
Branch Coverage	56.19% (106218/189044)

BiteTheDDDDt · 2026-04-02T12:01:27Z

run buildall

hello-stephen · 2026-04-02T13:20:38Z

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.94% (20053/37882)
Line Coverage	36.53% (188306/515511)
Region Coverage	32.83% (146363/445854)
Branch Coverage	33.95% (64042/188625)

Copilot AI review requested due to automatic review settings April 2, 2026 02:07

BiteTheDDDDt changed the title ~~[fix](be) Preserve IN_LIST runtime filter predicates when key range is a scope range~~ [Bug](scan) Preserve IN_LIST runtime filter predicates when key range is a scope range Apr 2, 2026

Copilot started reviewing on behalf of BiteTheDDDDt April 2, 2026 02:08 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

adjust log

f926d04

fix compile

0fbfcb0

update case

7aa139f

update

bf66a95

BiteTheDDDDt added dev/4.1.x dev/4.0.x labels Apr 2, 2026

HappenLee reviewed Apr 2, 2026

View reviewed changes

update

72ed503

	sql "set runtime_filter_wait_infinitely = true;"
	// Cap runtime filter wait at 10s for this test instead of falling back to the query timeout.
	sql "set runtime_filter_wait_infinitely = false;"

Conversation

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

hello-stephen commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

github-actions bot commented Apr 2, 2026

Uh oh!

HappenLee Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

HappenLee Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

hello-stephen commented Apr 2, 2026

BE Regression && UT Coverage Report

Uh oh!

BiteTheDDDDt commented Apr 2, 2026

Uh oh!

hello-stephen commented Apr 2, 2026

BE UT Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants