[Bug](function) fix wrong result of window funnel v2 + deduplication mode#62043
[Bug](function) fix wrong result of window funnel v2 + deduplication mode#62043BiteTheDDDDt wants to merge 2 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
/review |
There was a problem hiding this comment.
Pull request overview
Fixes window_funnel_v2 DEDUPLICATION mode so that “duplicates” originating from the same physical row (multi-match rows) no longer incorrectly break the funnel chain, aligning behavior with V1.
Changes:
- Update
WindowFunnelStateV2deduplication logic to skip chain termination when the duplicate is from the same row as an existing chain event (via_is_same_row_as_chain). - Add new BE unit tests covering (1) same-row multi-match non-breaking behavior and (2) true-duplicate different-row breaking behavior.
- Add/extend regression tests and expected outputs for the fixed deduplication scenarios.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| be/src/exprs/aggregate/aggregate_function_window_funnel_v2.h | Adjusts DEDUPLICATION handling to distinguish same-row multi-match events from true duplicates. |
| be/test/exprs/aggregate/vec_window_funnel_v2_test.cpp | Adds unit tests for the deduplication fix (same-row multi-match + true duplicate). |
| regression-test/suites/query_p0/aggregate/window_funnel_v2.groovy | Adds regression SQL cases targeting the deduplication bug scenario and a true-duplicate scenario. |
| regression-test/data/query_p0/aggregate/window_funnel_v2.out | Updates expected outputs for the newly added regression queries. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Test DEDUPLICATION mode: same-row multi-event matching should NOT break the chain. | ||
| // This is the exact scenario from the bug report where V1 returns 3 but V2 returned 2. | ||
| // | ||
| // 3 conditions, window=1 day (86400s), DEDUPLICATION mode | ||
| // Row 0 (t=10:00): event1=T, event2=T, event3=F → matches E0+E1 on same row | ||
| // Row 1 (t=11:00): event1=T, event2=T, event3=F → matches E0+E1 on same row |
There was a problem hiding this comment.
The file opens namespace doris { near the top (line 41), but it is no longer closed at the end of the file after adding these tests. This will break compilation due to an unclosed namespace. Add the corresponding } // namespace doris at the end of the file (or otherwise ensure namespaces are correctly balanced).
| // Row 0 (t=10:00): xwhat=1 → matches E0 (xwhat=1) and E1 (xwhat!=2) | ||
| // Row 1 (t=11:00): xwhat=1 → matches E0 (xwhat=1) and E1 (xwhat!=2) | ||
| // Row 2 (t=12:00): xwhat=3 → matches E0 (xwhat=1 false, but see below), E1 (xwhat!=2) and E2 (xwhat=3) | ||
| // Using conditions: xwhat=1, xwhat!=2, xwhat=3 | ||
| // Row 0: E0=T(1=1), E1=T(1!=2), E2=F(1!=3) → same-row E0+E1 | ||
| // Row 1: E0=T(1=1), E1=T(1!=2), E2=F(1!=3) → same-row E0+E1 | ||
| // Row 2: E0=F(3!=1), E1=T(3!=2), E2=T(3=3) → same-row E1+E2 |
There was a problem hiding this comment.
The inline explanation for Row 2 is internally inconsistent/misleading: it says the row "matches E0" even though xwhat=3 does not satisfy the first condition (xwhat = 1). Consider rewriting this comment to avoid suggesting E0 matches on Row 2, so the test case description stays accurate for future readers.
|
run buildall |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
This pull request fixes a bug in the
window_funnel_v2aggregate function's DEDUPLICATION mode, where chains were incorrectly broken when a single row matched multiple events (multi-match rows). The update ensures that only true duplicates from different rows break the chain, aligning the behavior with the previous version (V1). The changes also include comprehensive regression tests to verify the correct handling of both multi-match rows and true duplicates.Bug Fixes in Deduplication Logic:
WindowFunnelStateV2(aggregate_function_window_funnel_v2.h) to skip breaking the chain when a "duplicate" event is actually from the same row as an event already in the chain, preventing premature chain termination on multi-match rows. [1] [2]_is_same_row_as_chainto check if an event is from the same row as any event in the current chain, used to distinguish true duplicates from multi-match rows.Testing Improvements:
vec_window_funnel_v2_test.cpp:testDeduplicationSameRowMultiEventverifies that multi-match rows do not break the chain in DEDUPLICATION mode.testDeduplicationTrueDuplicateStillBreaksensures that a true duplicate on a different row still breaks the chain as expected.Regression Test Suite Updates:
window_funnel_v2.groovyand updated expected outputs inwindow_funnel_v2.outto cover the fixed scenarios for DEDUPLICATION mode, ensuring both multi-match and true duplicate behaviors are validated. [1] [2]