Skip to content

[Enhancement](mv-rewrite) Replace naive set-containment residual#61345

Open
foxtail463 wants to merge 2 commits intoapache:masterfrom
foxtail463:predicates_compassion_enhancement
Open

[Enhancement](mv-rewrite) Replace naive set-containment residual#61345
foxtail463 wants to merge 2 commits intoapache:masterfrom
foxtail463:predicates_compassion_enhancement

Conversation

@foxtail463
Copy link
Copy Markdown
Contributor

@foxtail463 foxtail463 commented Mar 15, 2026

compensation with DNF implication

The old residual compensation simply checks whether the query residual set contains all view residual predicates. This breaks on real-world MV rewrites where query and view residuals are structurally different but logically implicative -- e.g. query residual
A OR (B AND C)
vs view residual
A OR B
The old code sees two different expression trees and bails out, even though the query side is strictly stronger.

This patch introduces DNF-based implication checking (impliesByDnf) to replace the set-containment approach, so compensation succeeds whenever the query candidates logically imply the view residual regardless of structural differences. A hard cap (MAX_DNF_BRANCHES=1024) guards against exponential expansion; when the proof is too expensive the compensation fails conservatively rather than hanging the optimizer.

The three separate compensate calls in AbstractMaterializedViewRule are collapsed into a single Predicates.compensatePredicates entry point that encapsulates candidate collection and residual finalization.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Copy link
Copy Markdown
Member

run buildall

@doris-robot
Copy link
Copy Markdown

TPC-H: Total hot run time: 26490 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b732f0171ba49e0931e1a273568a4243a873fafb, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17610	4409	4419	4409
q2	q3	10636	778	514	514
q4	4683	373	245	245
q5	7567	1209	1019	1019
q6	181	175	146	146
q7	770	854	673	673
q8	9297	1432	1300	1300
q9	4899	4709	4628	4628
q10	6248	1922	1658	1658
q11	459	254	235	235
q12	705	592	465	465
q13	18046	2711	1929	1929
q14	227	229	212	212
q15	q16	722	747	674	674
q17	726	856	418	418
q18	5951	5472	5342	5342
q19	1274	969	588	588
q20	542	507	376	376
q21	4655	1815	1368	1368
q22	338	291	385	291
Total cold run time: 95536 ms
Total hot run time: 26490 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4867	4665	4615	4615
q2	q3	3840	4348	3842	3842
q4	885	1194	765	765
q5	4094	4379	4398	4379
q6	202	180	148	148
q7	1776	1658	1549	1549
q8	2499	2727	2587	2587
q9	7751	7491	7350	7350
q10	3778	4009	3579	3579
q11	507	437	429	429
q12	510	617	451	451
q13	2455	2854	2075	2075
q14	286	292	305	292
q15	q16	890	783	725	725
q17	1182	1418	1376	1376
q18	7209	7021	6708	6708
q19	883	904	892	892
q20	2113	2206	2008	2008
q21	3942	3475	3465	3465
q22	490	455	392	392
Total cold run time: 50159 ms
Total hot run time: 47627 ms

@doris-robot
Copy link
Copy Markdown

TPC-DS: Total hot run time: 168937 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b732f0171ba49e0931e1a273568a4243a873fafb, data reload: false

query5	4323	630	497	497
query6	348	240	218	218
query7	4215	461	281	281
query8	352	249	236	236
query9	8699	2732	2760	2732
query10	545	382	334	334
query11	6935	5116	4853	4853
query12	186	135	125	125
query13	1292	470	344	344
query14	5766	3772	3479	3479
query14_1	2905	2798	2747	2747
query15	204	191	175	175
query16	958	451	448	448
query17	860	702	606	606
query18	2432	440	338	338
query19	209	201	181	181
query20	137	129	128	128
query21	210	134	108	108
query22	13242	13993	14784	13993
query23	16957	16337	15986	15986
query23_1	15916	15708	15745	15708
query24	7225	1613	1241	1241
query24_1	1252	1239	1230	1230
query25	583	498	441	441
query26	1237	272	153	153
query27	2782	496	296	296
query28	4536	1860	1858	1858
query29	890	582	515	515
query30	298	226	190	190
query31	1012	978	867	867
query32	84	77	73	73
query33	529	354	299	299
query34	930	886	519	519
query35	629	689	595	595
query36	1086	1165	1014	1014
query37	143	106	92	92
query38	2885	2922	2842	2842
query39	855	845	805	805
query39_1	806	800	797	797
query40	236	160	141	141
query41	69	64	63	63
query42	270	253	260	253
query43	239	259	216	216
query44	
query45	204	189	187	187
query46	882	999	607	607
query47	2144	2170	2105	2105
query48	318	326	240	240
query49	650	473	389	389
query50	712	283	212	212
query51	4084	4064	4107	4064
query52	264	262	262	262
query53	286	333	286	286
query54	297	280	267	267
query55	89	84	81	81
query56	310	308	311	308
query57	1928	1753	1703	1703
query58	279	274	274	274
query59	2792	2934	2773	2773
query60	350	343	328	328
query61	156	150	174	150
query62	640	589	533	533
query63	302	283	278	278
query64	5115	1271	982	982
query65	
query66	1461	460	354	354
query67	24318	24283	24158	24158
query68	
query69	403	310	295	295
query70	989	1002	947	947
query71	346	353	296	296
query72	2785	2721	2420	2420
query73	544	551	311	311
query74	9595	9539	9385	9385
query75	2873	2794	2484	2484
query76	2297	1029	673	673
query77	361	376	326	326
query78	11041	11238	10599	10599
query79	1123	780	580	580
query80	717	626	548	548
query81	492	260	226	226
query82	1328	160	118	118
query83	372	261	247	247
query84	298	117	95	95
query85	849	503	451	451
query86	386	304	295	295
query87	3177	3104	3010	3010
query88	3519	2666	2650	2650
query89	433	368	343	343
query90	1971	182	173	173
query91	170	163	134	134
query92	81	79	68	68
query93	928	834	499	499
query94	449	329	290	290
query95	604	397	322	322
query96	655	532	237	237
query97	2493	2602	2402	2402
query98	233	219	242	219
query99	1017	1001	901	901
Total cold run time: 249242 ms
Total hot run time: 168937 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 85.31% (122/143) 🎉
Increment coverage report
Complete coverage report

@seawinde
Copy link
Copy Markdown
Member

seawinde commented Apr 3, 2026

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One correctness issue found.

  1. fe/fe-core/src/main/java/org/apache/doris/nereids/rules/exploration/mv/Predicates.java: impliesByDnf() still reduces implication to exact predicate-set containment inside each DNF branch. That means the new logic does not handle stronger-but-non-identical predicates, e.g. query residual id > 15 OR (score = 1 AND id = 5) against view residual id > 10 OR (score = 1 AND id = 5). The first query branch should imply the first view branch, but sourceBranch.containsAll(targetBranch) returns false because id > 15 is not textually equal to id > 10, so compensation is rejected. This is a functional regression against the PR goal of supporting logically implicative residuals beyond naive set containment.

Critical checkpoints:

  • Goal / correctness: Partially achieved. The refactor adds DNF handling and unit tests, but the new implication check is still too syntactic to prove many valid stronger residual predicates.
  • Small / focused change: Yes. The change is localized to MV predicate compensation and related tests.
  • Concurrency: Not applicable; FE expression rewrite code path, no new locks or thread interactions introduced here.
  • Lifecycle / static initialization: No special lifecycle or static-init concerns found.
  • Config changes: None.
  • Compatibility: No FE-BE protocol or storage compatibility impact found.
  • Parallel code paths: I did not find another MV compensation path that also needed the same change.
  • Special conditional checks: The new MAX_DNF_BRANCHES guard is reasonable and fails conservatively.
  • Test coverage: Improved, but incomplete. The added tests cover exact-branch DNF implication and overflow, but they miss stronger/non-identical branch predicates, which is the failing case above.
  • Observability: No additional observability seems necessary for this optimizer-only change.
  • Transaction / persistence: Not applicable.
  • Data writes / modifications: Not applicable.
  • FE-BE variable passing: Not applicable.
  • Performance: The overflow guard addresses exponential blow-up, but the current syntactic implication check leaves the feature underpowered rather than unsafe.
  • Other issues: None beyond the implication gap above.

Overall opinion: the direction is good, but the current implication rule is not sufficient for the PR’s stated behavior, so I cannot call this fully correct yet.

if (targetBranches.isEmpty()) {
return false;
}
for (Set<Expression> sourceBranch : sourceBranches) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

impliesByDnf() still treats implication as exact set containment inside each DNF branch. That works for A OR (B AND C) vs A OR B, but it still rejects stronger non-identical predicates. For example, if the query residual is id > 15 OR (score = 1 AND id = 5) and the view residual is id > 10 OR (score = 1 AND id = 5), the first query branch does imply the first view branch, but sourceBranch.containsAll(targetBranch) returns false because id > 15 is not textually equal to id > 10. In that case validateCompensationByViewResidual() still bails out, so this doesn't yet deliver the PR goal of handling logically implicative residuals beyond naive set containment. Please add a branch-level implication check for comparable predicates (at least the range predicates you already normalize elsewhere) and cover it with a unit test.

yangtao555 added 2 commits April 5, 2026 09:29
compensation with DNF implication

The old residual compensation simply checks whether the query residual
set contains all view residual predicates. This breaks on real-world MV
rewrites where query and view residuals are structurally different but
logically implicative -- e.g. query residual
  A OR (B AND C)
vs view residual
  A OR B
The old code sees two different expression trees and bails out, even
though the query side is strictly stronger.

This patch introduces DNF-based implication checking (impliesByDnf) to
replace the set-containment approach, so compensation succeeds whenever
the query candidates logically imply the view residual regardless of
structural differences. A hard cap (MAX_DNF_BRANCHES=1024) guards
against exponential expansion; when the proof is too expensive the
compensation fails conservatively rather than hanging the optimizer.

The three separate compensate calls in AbstractMaterializedViewRule are
collapsed into a single Predicates.compensatePredicates entry point
that encapsulates candidate collection and residual finalization.
@foxtail463 foxtail463 force-pushed the predicates_compassion_enhancement branch from b732f01 to e861646 Compare April 6, 2026 03:24
@seawinde
Copy link
Copy Markdown
Member

seawinde commented Apr 7, 2026

run buildall

.collect(Collectors.toCollection(LinkedHashSet::new));
}

private static boolean validateCompensationByViewResidual(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comment and example for this method

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants