[Enhancement](mv-rewrite) Replace naive set-containment residual#61345
[Enhancement](mv-rewrite) Replace naive set-containment residual#61345foxtail463 wants to merge 2 commits intoapache:masterfrom
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 26490 ms |
TPC-DS: Total hot run time: 168937 ms |
FE UT Coverage ReportIncrement line coverage |
|
/review |
There was a problem hiding this comment.
One correctness issue found.
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/Predicates.java:impliesByDnf()still reduces implication to exact predicate-set containment inside each DNF branch. That means the new logic does not handle stronger-but-non-identical predicates, e.g. query residualid > 15 OR (score = 1 AND id = 5)against view residualid > 10 OR (score = 1 AND id = 5). The first query branch should imply the first view branch, butsourceBranch.containsAll(targetBranch)returns false becauseid > 15is not textually equal toid > 10, so compensation is rejected. This is a functional regression against the PR goal of supporting logically implicative residuals beyond naive set containment.
Critical checkpoints:
- Goal / correctness: Partially achieved. The refactor adds DNF handling and unit tests, but the new implication check is still too syntactic to prove many valid stronger residual predicates.
- Small / focused change: Yes. The change is localized to MV predicate compensation and related tests.
- Concurrency: Not applicable; FE expression rewrite code path, no new locks or thread interactions introduced here.
- Lifecycle / static initialization: No special lifecycle or static-init concerns found.
- Config changes: None.
- Compatibility: No FE-BE protocol or storage compatibility impact found.
- Parallel code paths: I did not find another MV compensation path that also needed the same change.
- Special conditional checks: The new
MAX_DNF_BRANCHESguard is reasonable and fails conservatively. - Test coverage: Improved, but incomplete. The added tests cover exact-branch DNF implication and overflow, but they miss stronger/non-identical branch predicates, which is the failing case above.
- Observability: No additional observability seems necessary for this optimizer-only change.
- Transaction / persistence: Not applicable.
- Data writes / modifications: Not applicable.
- FE-BE variable passing: Not applicable.
- Performance: The overflow guard addresses exponential blow-up, but the current syntactic implication check leaves the feature underpowered rather than unsafe.
- Other issues: None beyond the implication gap above.
Overall opinion: the direction is good, but the current implication rule is not sufficient for the PR’s stated behavior, so I cannot call this fully correct yet.
| if (targetBranches.isEmpty()) { | ||
| return false; | ||
| } | ||
| for (Set<Expression> sourceBranch : sourceBranches) { |
There was a problem hiding this comment.
impliesByDnf() still treats implication as exact set containment inside each DNF branch. That works for A OR (B AND C) vs A OR B, but it still rejects stronger non-identical predicates. For example, if the query residual is id > 15 OR (score = 1 AND id = 5) and the view residual is id > 10 OR (score = 1 AND id = 5), the first query branch does imply the first view branch, but sourceBranch.containsAll(targetBranch) returns false because id > 15 is not textually equal to id > 10. In that case validateCompensationByViewResidual() still bails out, so this doesn't yet deliver the PR goal of handling logically implicative residuals beyond naive set containment. Please add a branch-level implication check for comparable predicates (at least the range predicates you already normalize elsewhere) and cover it with a unit test.
compensation with DNF implication The old residual compensation simply checks whether the query residual set contains all view residual predicates. This breaks on real-world MV rewrites where query and view residuals are structurally different but logically implicative -- e.g. query residual A OR (B AND C) vs view residual A OR B The old code sees two different expression trees and bails out, even though the query side is strictly stronger. This patch introduces DNF-based implication checking (impliesByDnf) to replace the set-containment approach, so compensation succeeds whenever the query candidates logically imply the view residual regardless of structural differences. A hard cap (MAX_DNF_BRANCHES=1024) guards against exponential expansion; when the proof is too expensive the compensation fails conservatively rather than hanging the optimizer. The three separate compensate calls in AbstractMaterializedViewRule are collapsed into a single Predicates.compensatePredicates entry point that encapsulates candidate collection and residual finalization.
b732f01 to
e861646
Compare
|
run buildall |
| .collect(Collectors.toCollection(LinkedHashSet::new)); | ||
| } | ||
|
|
||
| private static boolean validateCompensationByViewResidual( |
There was a problem hiding this comment.
add some comment and example for this method
compensation with DNF implication
The old residual compensation simply checks whether the query residual set contains all view residual predicates. This breaks on real-world MV rewrites where query and view residuals are structurally different but logically implicative -- e.g. query residual
A OR (B AND C)
vs view residual
A OR B
The old code sees two different expression trees and bails out, even though the query side is strictly stronger.
This patch introduces DNF-based implication checking (impliesByDnf) to replace the set-containment approach, so compensation succeeds whenever the query candidates logically imply the view residual regardless of structural differences. A hard cap (MAX_DNF_BRANCHES=1024) guards against exponential expansion; when the proof is too expensive the compensation fails conservatively rather than hanging the optimizer.
The three separate compensate calls in AbstractMaterializedViewRule are collapsed into a single Predicates.compensatePredicates entry point that encapsulates candidate collection and residual finalization.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)