[env](compiler) Reduce compile time of aggregate_function_reader_replace.cpp #62047

Open
Mryange wants to merge 2 commits into apache:master from Mryange:env-compiler-replace-agg
Conversation

@Mryange
Contributor

@Mryange Mryange commented Apr 2, 2026

What problem does this PR solve?

aggregate_function_reader_replace.cpp took 86.91s to compile because it included aggregate_function_reader_first_last.h, which pulled in heavy template machinery: a 28-case column-type switch × 4 bool combinations × 4 factory functions = ~448 make_shared<ReaderFunctionData<...>> instantiations in a single TU.
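To make the multiplication concrete, here is a scaled-down sketch of the same pattern. It is not the Doris code: `ToyReaderData`, `ToyType`, and `create` are hypothetical names, and the toy uses 3 switch cases and a single factory function instead of 28 and 4. Each `case` inside each `create<A, B>` forces its own `make_shared` instantiation, so the count is the product of the dimensions.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <memory>
#include <set>
#include <typeindex>
#include <typeinfo>

// Figures quoted in the PR description: 28 cases x 4 bool combinations x 4 factories.
constexpr std::size_t kSwitchCases = 28;
constexpr std::size_t kBoolCombinations = 4;
constexpr std::size_t kFactoryFunctions = 4;
static_assert(kSwitchCases * kBoolCombinations * kFactoryFunctions == 448);

// Hypothetical toy types, for illustration only.
struct ToyDataBase {
    virtual ~ToyDataBase() = default;
};

template <typename T, bool A, bool B>
struct ToyReaderData : ToyDataBase {};

enum class ToyType { Int32, Int64, Double };

// Every case below is a separate make_shared<ToyReaderData<...>> instantiation,
// and the whole switch is stamped out again for each <A, B> pair.
template <bool A, bool B>
std::shared_ptr<ToyDataBase> create(ToyType type) {
    switch (type) {
    case ToyType::Int32:
        return std::make_shared<ToyReaderData<int, A, B>>();
    case ToyType::Int64:
        return std::make_shared<ToyReaderData<long long, A, B>>();
    case ToyType::Double:
        return std::make_shared<ToyReaderData<double, A, B>>();
    }
    return nullptr;
}

// Counts the distinct concrete types produced by the toy factory.
std::size_t count_distinct_instantiations() {
    std::set<std::type_index> seen;
    for (ToyType t : {ToyType::Int32, ToyType::Int64, ToyType::Double}) {
        for (const auto& obj : {create<false, false>(t), create<false, true>(t),
                                create<true, false>(t), create<true, true>(t)}) {
            assert(obj != nullptr);
            const ToyDataBase& ref = *obj;
            seen.insert(std::type_index(typeid(ref)));
        }
    }
    return seen.size();
}
```

In the toy, 3 cases × 4 bool combinations yield 12 distinct types from one factory function; scaling the same product to the numbers in the description gives the 448 figure.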

How is it solved?

  1. Decouple from aggregate_function_reader_first_last.h: The reader/load path now has a self-contained, lightweight implementation directly in the .cpp file. The shared header is no longer included.

  2. Introduce PointerStore<ArgIsNullable> / CopyStore<ArgIsNullable>: Two small storage classes with a uniform interface (is_null, set_value<SkipNull>, insert_into, reset). Reader path uses zero-copy PointerStore; load path uses CopyStore with Field deep-copy.

  3. Use 3 bool template params <IsFirst, SkipNull, ArgIsNullable>: IsCopy is derived as !IsFirst. ArgIsNullable is resolved at registration time (nullable vs non-nullable factory map), eliminating runtime branches. result_is_nullable stays as a runtime bool (called once per group).

  4. All column operations use IColumn virtual dispatch: No column-type-specific template instantiations. Performance impact is negligible — Field operations and the add() virtual call itself dominate.

  5. Clean up aggregate_function_reader_first_last.h: Removed ReaderFunctionFirstData, ReaderFunctionLastData, ReaderFunctionFirstNonNullData, ReaderFunctionLastNonNullData, ReaderFunctionData, create_function_single_value, the 28-case switch, and the CREATE_READER_FUNCTION_WITH_NAME_AND_DATA macro — all now unused. Also removed unnecessary includes (column_array.h, column_map.h, etc.).
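The store design in points 2 and 3 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ToyColumn` is a stand-in for a real column, `ToyReaderAgg` and `aggregate_all` are hypothetical names, the `bool` return value of `set_value` is an assumption made for the sketch, and the virtual dispatch through `IColumn` from point 4 is not modelled. Only the interface names (`is_null`, `set_value<SkipNull>`, `insert_into`, `reset`) and the three bool parameters come from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <type_traits>
#include <vector>

// Toy column for illustration only; an empty optional models a NULL row.
struct ToyColumn {
    std::vector<std::optional<long>> rows;
};

// Zero-copy store: remembers which row holds the value instead of copying it.
template <bool ArgIsNullable>
class PointerStore {
public:
    bool is_null() const {
        if (_col == nullptr) return true;
        if constexpr (ArgIsNullable) {
            return !_col->rows[_row].has_value();
        } else {
            return false;
        }
    }
    // Returns true if the row was accepted (assumed return value).
    template <bool SkipNull>
    bool set_value(const ToyColumn& col, std::size_t row) {
        if constexpr (SkipNull && ArgIsNullable) {
            if (!col.rows[row].has_value()) return false;
        }
        _col = &col;
        _row = row;
        return true;
    }
    void insert_into(ToyColumn& to) const {
        if (is_null()) {
            to.rows.push_back(std::nullopt);
        } else {
            to.rows.push_back(_col->rows[_row]);
        }
    }
    void reset() { _col = nullptr; _row = 0; }

private:
    const ToyColumn* _col = nullptr;
    std::size_t _row = 0;
};

// Copying store: owns its value, so the source column may go away afterwards.
template <bool ArgIsNullable>
class CopyStore {
public:
    bool is_null() const { return !_value.has_value(); }
    template <bool SkipNull>
    bool set_value(const ToyColumn& col, std::size_t row) {
        if constexpr (SkipNull && ArgIsNullable) {
            if (!col.rows[row].has_value()) return false;
        }
        _value = col.rows[row];
        return true;
    }
    void insert_into(ToyColumn& to) const { to.rows.push_back(_value); }
    void reset() { _value.reset(); }

private:
    std::optional<long> _value;
};

// IsCopy is derived as !IsFirst: "first" uses PointerStore, "last" uses CopyStore.
template <bool IsFirst, bool SkipNull, bool ArgIsNullable>
class ToyReaderAgg {
    using Store = std::conditional_t<IsFirst, PointerStore<ArgIsNullable>,
                                     CopyStore<ArgIsNullable>>;

public:
    void add(const ToyColumn& col, std::size_t row) {
        if constexpr (IsFirst) {
            if (_has_value) return; // keep the earliest accepted row
        }
        if (_store.template set_value<SkipNull>(col, row)) _has_value = true;
    }
    void insert_result_into(ToyColumn& to) const { _store.insert_into(to); }
    void reset() { _store.reset(); _has_value = false; }

private:
    Store _store;
    bool _has_value = false;
};

// Helper: feeds every row to the aggregate and returns the single result.
template <bool IsFirst, bool SkipNull, bool ArgIsNullable>
std::optional<long> aggregate_all(const ToyColumn& col) {
    ToyReaderAgg<IsFirst, SkipNull, ArgIsNullable> agg;
    for (std::size_t i = 0; i < col.rows.size(); ++i) agg.add(col, i);
    ToyColumn out;
    agg.insert_result_into(out);
    assert(out.rows.size() == 1);
    return out.rows[0];
}
```

Because the two stores expose the same interface, the aggregate body is written once and the three bools select the behaviour at compile time: 8 instantiations in this sketch, independent of the column type.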

Compile time

| Stage | Time |
| --- | --- |
| Before | 86.91s |
| After | 17.0s (↓80%) |

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented Apr 2, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 29394 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 99cd41e65d2f8a932e4ee33db6c5808a5f1a8c36, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17609	3712	3731	3712
q2	q3	10672	869	600	600
q4	4678	474	367	367
q5	7449	1338	1146	1146
q6	187	167	142	142
q7	914	932	776	776
q8	9308	1412	1290	1290
q9	5557	5339	5353	5339
q10	6238	2021	1794	1794
q11	474	278	276	276
q12	846	688	518	518
q13	18017	2763	2170	2170
q14	287	287	271	271
q15	q16	863	862	796	796
q17	1077	1174	773	773
q18	6415	5702	5616	5616
q19	1167	1253	959	959
q20	589	547	412	412
q21	5006	2460	2085	2085
q22	487	402	352	352
Total cold run time: 97840 ms
Total hot run time: 29394 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4608	4465	4612	4465
q2	q3	4633	4792	4206	4206
q4	2045	2106	1361	1361
q5	4885	4952	5191	4952
q6	196	168	136	136
q7	1993	1768	1577	1577
q8	3337	3063	3153	3063
q9	8388	8228	8536	8228
q10	4462	4520	4212	4212
q11	620	422	411	411
q12	651	708	480	480
q13	2651	3177	2325	2325
q14	290	302	259	259
q15	q16	772	762	696	696
q17	1295	1284	1211	1211
q18	8036	6952	6992	6952
q19	1135	1124	1124	1124
q20	2215	2201	1978	1978
q21	6275	5407	4790	4790
q22	531	505	420	420
Total cold run time: 59018 ms
Total hot run time: 52846 ms

@doris-robot

TPC-DS: Total hot run time: 179511 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 99cd41e65d2f8a932e4ee33db6c5808a5f1a8c36, data reload: false

query5	4343	655	503	503
query6	343	256	206	206
query7	4233	565	336	336
query8	326	243	222	222
query9	8750	3881	3878	3878
query10	489	379	341	341
query11	6670	5469	5140	5140
query12	189	134	128	128
query13	1284	639	449	449
query14	5679	5212	4806	4806
query14_1	4140	4138	4133	4133
query15	208	195	178	178
query16	972	477	422	422
query17	936	736	614	614
query18	2509	476	357	357
query19	237	221	186	186
query20	142	133	131	131
query21	230	140	126	126
query22	13627	13592	13314	13314
query23	17860	17476	16602	16602
query23_1	16756	16975	16817	16817
query24	7688	1841	1443	1443
query24_1	1415	1451	1411	1411
query25	619	521	442	442
query26	1426	338	199	199
query27	2844	650	376	376
query28	4545	1903	1892	1892
query29	922	658	532	532
query30	295	229	197	197
query31	1100	1057	946	946
query32	81	72	72	72
query33	552	337	289	289
query34	1178	1157	669	669
query35	724	766	675	675
query36	1232	1213	1060	1060
query37	156	103	82	82
query38	3095	3029	3008	3008
query39	904	902	850	850
query39_1	830	832	835	832
query40	239	155	157	155
query41	63	59	58	58
query42	275	271	279	271
query43	308	324	278	278
query44	
query45	203	197	184	184
query46	1119	1291	801	801
query47	2340	2322	2241	2241
query48	393	402	295	295
query49	643	529	425	425
query50	707	280	220	220
query51	4378	4264	4243	4243
query52	275	287	269	269
query53	330	347	282	282
query54	331	285	264	264
query55	102	95	87	87
query56	315	336	318	318
query57	1701	1713	1684	1684
query58	301	272	282	272
query59	2917	2989	2728	2728
query60	341	355	328	328
query61	163	166	156	156
query62	703	616	557	557
query63	329	278	303	278
query64	5237	1448	1232	1232
query65	
query66	1533	497	403	403
query67	24284	24561	24282	24282
query68	
query69	455	353	320	320
query70	1030	999	998	998
query71	364	337	323	323
query72	3163	2920	2655	2655
query73	820	761	435	435
query74	9827	9733	9528	9528
query75	3541	3380	2996	2996
query76	2304	1110	742	742
query77	385	437	349	349
query78	11285	11336	10762	10762
query79	1529	1098	833	833
query80	841	773	659	659
query81	464	289	240	240
query82	1374	158	125	125
query83	370	292	260	260
query84	304	147	117	117
query85	883	519	452	452
query86	391	361	351	351
query87	3306	3201	3123	3123
query88	3576	2690	2695	2690
query89	480	410	379	379
query90	1996	183	174	174
query91	177	172	141	141
query92	77	73	69	69
query93	891	880	511	511
query94	539	304	318	304
query95	680	371	331	331
query96	1005	764	346	346
query97	2689	2686	2558	2558
query98	240	230	230	230
query99	1058	1067	971	971
Total cold run time: 258628 ms
Total hot run time: 179511 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.93% (20046/37870) |
| Line Coverage | 36.52% (188111/515158) |
| Region Coverage | 32.79% (146111/445534) |
| Branch Coverage | 33.92% (63921/188430) |

@Mryange
Contributor Author

Mryange commented Apr 2, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 29365 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 014d8abd054beceaa53e6e056b4ac459f8f780de, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17668	3713	3721	3713
q2	q3	10682	848	617	617
q4	4674	459	374	374
q5	7461	1335	1143	1143
q6	189	169	137	137
q7	949	931	761	761
q8	9297	1451	1273	1273
q9	5465	5364	5299	5299
q10	6258	2028	1760	1760
q11	478	276	279	276
q12	834	715	521	521
q13	18038	2789	2157	2157
q14	280	292	259	259
q15	q16	852	853	792	792
q17	973	1046	849	849
q18	6356	5617	5514	5514
q19	1161	1266	1093	1093
q20	583	533	406	406
q21	4437	2480	2081	2081
q22	488	403	340	340
Total cold run time: 97123 ms
Total hot run time: 29365 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4610	4495	4623	4495
q2	q3	4586	4747	4131	4131
q4	2042	2057	1336	1336
q5	4895	5071	5255	5071
q6	206	176	139	139
q7	1980	1780	1625	1625
q8	3435	3256	3039	3039
q9	8296	8292	8209	8209
q10	4456	4459	4235	4235
q11	587	436	372	372
q12	662	709	483	483
q13	2658	3151	2455	2455
q14	300	299	278	278
q15	q16	768	764	669	669
q17	1295	1234	1198	1198
q18	8048	7144	6993	6993
q19	1090	1143	1119	1119
q20	2210	2187	1939	1939
q21	5937	5302	4806	4806
q22	545	502	414	414
Total cold run time: 58606 ms
Total hot run time: 53006 ms

@doris-robot

TPC-DS: Total hot run time: 180110 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 014d8abd054beceaa53e6e056b4ac459f8f780de, data reload: false

query5	4327	628	498	498
query6	332	229	211	211
query7	4319	582	329	329
query8	338	243	224	224
query9	8739	3864	3898	3864
query10	482	390	333	333
query11	6661	5470	5121	5121
query12	186	128	125	125
query13	1256	589	426	426
query14	5588	5132	4725	4725
query14_1	4069	4066	4077	4066
query15	208	199	181	181
query16	976	446	421	421
query17	947	798	658	658
query18	2491	502	380	380
query19	251	236	194	194
query20	140	132	139	132
query21	224	146	121	121
query22	13682	14903	14742	14742
query23	18052	17154	16863	16863
query23_1	16942	16644	16697	16644
query24	7419	1743	1371	1371
query24_1	1360	1346	1365	1346
query25	622	517	465	465
query26	1254	325	181	181
query27	2783	633	399	399
query28	4479	1919	1933	1919
query29	1009	685	578	578
query30	308	234	197	197
query31	1082	1048	946	946
query32	89	76	75	75
query33	549	368	317	317
query34	1193	1154	681	681
query35	736	774	661	661
query36	1230	1235	1088	1088
query37	156	101	85	85
query38	3120	3044	2957	2957
query39	910	890	860	860
query39_1	838	846	848	846
query40	233	159	138	138
query41	62	59	60	59
query42	271	271	268	268
query43	328	315	272	272
query44	
query45	203	192	184	184
query46	1117	1216	789	789
query47	2343	2335	2242	2242
query48	404	392	291	291
query49	633	527	441	441
query50	682	277	215	215
query51	4316	4276	4195	4195
query52	280	279	278	278
query53	320	342	277	277
query54	319	317	261	261
query55	99	92	91	91
query56	320	321	326	321
query57	1757	1755	1522	1522
query58	302	270	277	270
query59	2917	2972	2709	2709
query60	353	335	333	333
query61	156	147	152	147
query62	690	625	570	570
query63	310	269	268	268
query64	5380	1451	1101	1101
query65	
query66	1474	471	370	370
query67	24281	24268	24259	24259
query68	
query69	457	343	303	303
query70	981	993	962	962
query71	360	321	310	310
query72	2860	2724	2414	2414
query73	793	787	471	471
query74	9819	9771	9558	9558
query75	3568	3362	2989	2989
query76	2304	1123	743	743
query77	398	395	352	352
query78	11296	11332	10765	10765
query79	1494	1098	808	808
query80	826	779	662	662
query81	466	287	234	234
query82	1370	157	120	120
query83	373	292	258	258
query84	309	147	115	115
query85	878	504	457	457
query86	380	376	311	311
query87	3281	3212	3100	3100
query88	3535	2720	2676	2676
query89	484	408	376	376
query90	1974	179	172	172
query91	186	164	149	149
query92	78	72	73	72
query93	892	882	503	503
query94	547	343	299	299
query95	648	370	431	370
query96	1001	789	328	328
query97	2666	2679	2580	2580
query98	235	224	227	224
query99	1068	1064	979	979
Total cold run time: 257597 ms
Total hot run time: 180110 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 52.98% (20068/37881) |
| Line Coverage | 36.56% (188420/515369) |
| Region Coverage | 32.87% (146513/445740) |
| Branch Coverage | 33.98% (64067/188538) |

3 participants