Skip to content

[Bug](function) fix wrong result of window funnel v2 + deduplication mode#62043

Open
BiteTheDDDDt wants to merge 2 commits intoapache:masterfrom
BiteTheDDDDt:fix_0402_2
Open

[Bug](function) fix wrong result of window funnel v2 + deduplication mode#62043
BiteTheDDDDt wants to merge 2 commits intoapache:masterfrom
BiteTheDDDDt:fix_0402_2

Conversation

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor

This pull request fixes a bug in the window_funnel_v2 aggregate function's DEDUPLICATION mode, where chains were incorrectly broken when a single row matched multiple events (multi-match rows). The update ensures that only true duplicates from different rows break the chain, aligning the behavior with the previous version (V1). The changes also include comprehensive regression tests to verify the correct handling of both multi-match rows and true duplicates.

Bug Fixes in Deduplication Logic:

  • Updated the deduplication logic in WindowFunnelStateV2 (aggregate_function_window_funnel_v2.h) to skip breaking the chain when a "duplicate" event is actually from the same row as an event already in the chain, preventing premature chain termination on multi-match rows. [1] [2]
  • Added a new helper method _is_same_row_as_chain to check if an event is from the same row as any event in the current chain, used to distinguish true duplicates from multi-match rows.

Testing Improvements:

  • Added two new unit tests in vec_window_funnel_v2_test.cpp:
    • testDeduplicationSameRowMultiEvent verifies that multi-match rows do not break the chain in DEDUPLICATION mode.
    • testDeduplicationTrueDuplicateStillBreaks ensures that a true duplicate on a different row still breaks the chain as expected.

Regression Test Suite Updates:

  • Added regression tests in window_funnel_v2.groovy and updated expected outputs in window_funnel_v2.out to cover the fixed scenarios for DEDUPLICATION mode, ensuring both multi-match and true duplicate behaviors are validated. [1] [2]

Copilot AI review requested due to automatic review settings April 2, 2026 07:26
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes window_funnel_v2 DEDUPLICATION mode so that “duplicates” originating from the same physical row (multi-match rows) no longer incorrectly break the funnel chain, aligning behavior with V1.

Changes:

  • Update WindowFunnelStateV2 deduplication logic to skip chain termination when the duplicate is from the same row as an existing chain event (via _is_same_row_as_chain).
  • Add new BE unit tests covering (1) same-row multi-match non-breaking behavior and (2) true-duplicate different-row breaking behavior.
  • Add/extend regression tests and expected outputs for the fixed deduplication scenarios.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
be/src/exprs/aggregate/aggregate_function_window_funnel_v2.h Adjusts DEDUPLICATION handling to distinguish same-row multi-match events from true duplicates.
be/test/exprs/aggregate/vec_window_funnel_v2_test.cpp Adds unit tests for the deduplication fix (same-row multi-match + true duplicate).
regression-test/suites/query_p0/aggregate/window_funnel_v2.groovy Adds regression SQL cases targeting the deduplication bug scenario and a true-duplicate scenario.
regression-test/data/query_p0/aggregate/window_funnel_v2.out Updates expected outputs for the newly added regression queries.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1167 to +1172
// Test DEDUPLICATION mode: same-row multi-event matching should NOT break the chain.
// This is the exact scenario from the bug report where V1 returns 3 but V2 returned 2.
//
// 3 conditions, window=1 day (86400s), DEDUPLICATION mode
// Row 0 (t=10:00): event1=T, event2=T, event3=F → matches E0+E1 on same row
// Row 1 (t=11:00): event1=T, event2=T, event3=F → matches E0+E1 on same row
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file opens namespace doris { near the top (line 41), but it is no longer closed at the end of the file after adding these tests. This will break compilation due to an unclosed namespace. Add the corresponding } // namespace doris at the end of the file (or otherwise ensure namespaces are correctly balanced).

Copilot uses AI. Check for mistakes.
Comment on lines +507 to +513
// Row 0 (t=10:00): xwhat=1 → matches E0 (xwhat=1) and E1 (xwhat!=2)
// Row 1 (t=11:00): xwhat=1 → matches E0 (xwhat=1) and E1 (xwhat!=2)
// Row 2 (t=12:00): xwhat=3 → matches E0 (xwhat=1 false, but see below), E1 (xwhat!=2) and E2 (xwhat=3)
// Using conditions: xwhat=1, xwhat!=2, xwhat=3
// Row 0: E0=T(1=1), E1=T(1!=2), E2=F(1!=3) → same-row E0+E1
// Row 1: E0=T(1=1), E1=T(1!=2), E2=F(1!=3) → same-row E0+E1
// Row 2: E0=F(3!=1), E1=T(3!=2), E2=T(3=3) → same-row E1+E2
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline explanation for Row 2 is internally inconsistent/misleading: it says the row "matches E0" even though xwhat=3 does not satisfy the first condition (xwhat = 1). Consider rewriting this comment to avoid suggesting E0 matches on Row 2, so the test case description stays accurate for future readers.

Copilot uses AI. Check for mistakes.
@BiteTheDDDDt
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 86.67% (13/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.64% (27315/37093)
Line Coverage 57.24% (294074/513746)
Region Coverage 54.51% (245163/449778)
Branch Coverage 56.20% (106249/189070)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants