
[fix](memory) track IO layer read buffers via MemTrackerLimiter using PODArray #62032

Open
sollhui wants to merge 1 commit into apache:master from sollhui:fix_io_memory_tracker


@sollhui
Contributor

@sollhui sollhui commented Apr 2, 2026

Problem

Several IO-layer read buffers are allocated with std::make_unique<char[]> or
new char[], which bypass Doris's memory tracking system (MemTrackerLimiter).
The query memory tracker cannot see these allocations, so memory usage is
under-reported and concurrent S3/HDFS scans can run into unexpected OOMs.

Affected locations:

  • PrefetchBuffer::_buf in buffered_reader — the main S3/HDFS prefetch buffer
  • HttpFileReader::_read_buffer — the 1 MB HTTP read buffer
  • HdfsFileSystem::download_impl — the 1 MB copy buffer for remote→local download

Solution

Replace raw unique_ptr<char[]> allocations with PODArray<char>, which uses
Doris's Allocator<..., check_and_tracking_memory=true> internally. Every
alloc/realloc/free call goes through consume_memory / release_memory
on the thread-local MemTrackerLimiter, so these buffers are now properly
accounted for.
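
The accounting idea described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: ToyTracker, thread_tracker() and TrackedBuffer are hypothetical names invented for this example and are not the Doris MemTrackerLimiter, Allocator, or PODArray classes. Unlike a real PODArray, this toy resize() discards the old contents, and it assumes a buffer is freed on the thread that allocated it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for a per-thread memory tracker (not the Doris class).
struct ToyTracker {
    std::int64_t consumed = 0;
    void consume(std::int64_t bytes) { consumed += bytes; }
    void release(std::int64_t bytes) {
        assert(consumed >= bytes);  // releasing more than was consumed is a bug
        consumed -= bytes;
    }
};

// One tracker instance per thread, mirroring the thread-local tracker idea.
inline ToyTracker& thread_tracker() {
    thread_local ToyTracker tracker;
    return tracker;
}

// Hypothetical byte buffer that reports every allocation and free it makes.
class TrackedBuffer {
public:
    TrackedBuffer() = default;  // allocates nothing, reports nothing
    TrackedBuffer(const TrackedBuffer&) = delete;
    TrackedBuffer& operator=(const TrackedBuffer&) = delete;
    ~TrackedBuffer() { reset(); }

    // Simplification: drops the old contents and allocates exactly `bytes`.
    void resize(std::size_t bytes) {
        reset();
        if (bytes == 0) return;
        _data = static_cast<char*>(std::malloc(bytes));
        if (_data == nullptr) throw std::bad_alloc();
        _size = bytes;
        thread_tracker().consume(static_cast<std::int64_t>(bytes));
    }

    // Frees the allocation, if any, and reports the release to the tracker.
    void reset() {
        if (_data == nullptr) return;
        thread_tracker().release(static_cast<std::int64_t>(_size));
        std::free(_data);
        _data = nullptr;
        _size = 0;
    }

    char* data() { return _data; }
    std::size_t size() const { return _size; }
    bool empty() const { return _size == 0; }

private:
    char* _data = nullptr;
    std::size_t _size = 0;
};
```

In this model a buffer obtained with new char[] would never touch thread_tracker(), which is the kind of blind spot the PR describes for the untracked IO buffers.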

Key behavioral notes:

  • PODArray's default constructor allocates no memory, so the lazy-allocation
    optimization in PrefetchBuffer (introduced to reduce peak memory during TVF
    scans over many small S3 files) is fully preserved: the buffer is only
    allocated on the first actual prefetch call.
  • PODArray supports move semantics, so the existing move constructor of
    PrefetchBuffer works unchanged.
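
The two notes above (no allocation until first use, and unchanged move behavior) can be sketched as follows. This is an illustration only: LazyPrefetchBuffer is a hypothetical class written for this example, std::vector<char> stands in for PODArray<char>, and a std::string stands in for the remote data source. None of this is the actual PrefetchBuffer code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical prefetch buffer; std::vector<char> stands in for PODArray<char>.
class LazyPrefetchBuffer {
public:
    explicit LazyPrefetchBuffer(std::size_t capacity) : _capacity(capacity) {}

    // The container member is movable, so the defaulted moves are enough.
    LazyPrefetchBuffer(LazyPrefetchBuffer&&) = default;
    LazyPrefetchBuffer& operator=(LazyPrefetchBuffer&&) = default;

    // Copies up to _capacity bytes of `src`, allocating the buffer on first use.
    std::size_t prefetch(const std::string& src) {
        if (_buf.empty()) {
            _buf.resize(_capacity);  // lazy-allocation guard
        }
        _len = std::min(src.size(), _buf.size());
        std::copy_n(src.begin(), _len, _buf.begin());
        return _len;
    }

    bool allocated() const { return !_buf.empty(); }

    // Returns the bytes stored by the most recent prefetch() call.
    std::string contents() const {
        return std::string(_buf.begin(),
                           _buf.begin() + static_cast<std::ptrdiff_t>(_len));
    }

private:
    std::size_t _capacity;
    std::vector<char> _buf;  // default-constructed: nothing allocated yet
    std::size_t _len = 0;
};
```

Constructing many such objects costs only a few words each until prefetch() is first called, which is the property the lazy-allocation optimization relies on.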

Changes

File                  Change
buffered_reader.h     _buf: unique_ptr<char[]> → PODArray<char>; remove eager new char[] from constructor
buffered_reader.cpp   Add lazy-alloc guard (_buf.empty() / _buf.resize); .get() → .data() for PrefetchBuffer
http_file_reader.h    _read_buffer: unique_ptr<char[]> → PODArray<char>
http_file_reader.cpp  make_unique<char[]> → resize; .get() → .data()
hdfs_file_system.cpp  Local copy buffer: new char[] → PODArray<char>
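
As a rough picture of what the call-site changes look like, the sketch below shows a chunked copy loop after such a migration, with the old form kept in comments. It is an assumption-laden illustration: copy_stream is a hypothetical helper, std::vector<char> stands in for PODArray<char>, and standard C++ streams stand in for the remote and local files. It does not reproduce HdfsFileSystem::download_impl.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Copies `in` to `out` through a fixed-size chunk buffer and returns the
// number of bytes copied. The stream types are illustrative only.
std::size_t copy_stream(std::istream& in, std::ostream& out,
                        std::size_t chunk_size = 1 << 20) {
    assert(chunk_size > 0);
    // Before: std::unique_ptr<char[]> buf(new char[chunk_size]);
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;
    while (in) {
        // Before: in.read(buf.get(), chunk_size);
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        const std::streamsize got = in.gcount();  // may be a short final chunk
        if (got <= 0) break;
        out.write(buf.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

The container also carries its own size, so the read call no longer needs a separately tracked length constant.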

@Thearas
Contributor

Thearas commented Apr 2, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui sollhui force-pushed the fix_io_memory_tracker branch from 994e1b2 to d65f836 Compare April 2, 2026 03:35
@sollhui
Contributor Author

sollhui commented Apr 2, 2026

run buildall

@sollhui sollhui force-pushed the fix_io_memory_tracker branch from d65f836 to 4e9989b Compare April 2, 2026 04:57
@sollhui
Contributor Author

sollhui commented Apr 2, 2026

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category           Coverage
Function Coverage  52.89% (20030/37869)
Line Coverage      36.48% (187977/515226)
Region Coverage    32.71% (145758/445613)
Branch Coverage    33.90% (63883/188458)

@doris-robot

TPC-H: Total hot run time: 29179 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4e9989bfc0b65b2c8869e01e78d1cd271ee88910, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17680	3729	3660	3660
q2	q3	10668	871	617	617
q4	4685	468	364	364
q5	7609	1346	1148	1148
q6	248	170	143	143
q7	937	951	787	787
q8	10196	1402	1342	1342
q9	6262	5342	5269	5269
q10	6339	2027	1765	1765
q11	484	277	282	277
q12	852	690	515	515
q13	18077	2785	2169	2169
q14	289	289	261	261
q15	q16	907	865	804	804
q17	1058	1101	772	772
q18	6467	5613	5528	5528
q19	1615	1278	1081	1081
q20	619	545	420	420
q21	4639	2329	1948	1948
q22	426	375	309	309
Total cold run time: 100057 ms
Total hot run time: 29179 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4031	3901	3881	3881
q2	q3	4573	4721	4123	4123
q4	2000	2069	1352	1352
q5	4921	4931	5171	4931
q6	182	159	133	133
q7	1981	1740	1624	1624
q8	3246	2987	2981	2981
q9	8010	8010	8054	8010
q10	4429	4422	4244	4244
q11	570	402	387	387
q12	658	700	488	488
q13	2419	2855	2158	2158
q14	278	289	256	256
q15	q16	746	781	715	715
q17	1224	1190	1156	1156
q18	7801	6961	6996	6961
q19	1048	1040	1073	1040
q20	2198	2195	1924	1924
q21	5967	5358	5078	5078
q22	524	482	405	405
Total cold run time: 56806 ms
Total hot run time: 51847 ms

@doris-robot

TPC-DS: Total hot run time: 178890 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4e9989bfc0b65b2c8869e01e78d1cd271ee88910, data reload: false

query5	4441	652	493	493
query6	338	242	214	214
query7	4412	548	356	356
query8	341	244	233	233
query9	8777	3943	3980	3943
query10	506	424	347	347
query11	6671	5490	5123	5123
query12	191	135	127	127
query13	1391	651	454	454
query14	6102	5178	4805	4805
query14_1	4170	4129	4123	4123
query15	214	208	185	185
query16	1016	471	444	444
query17	1160	777	652	652
query18	2754	510	403	403
query19	263	230	197	197
query20	144	140	131	131
query21	229	146	126	126
query22	13535	13618	13400	13400
query23	17767	16942	16412	16412
query23_1	16504	16579	16562	16562
query24	7532	1747	1377	1377
query24_1	1379	1364	1356	1356
query25	630	544	481	481
query26	1257	335	188	188
query27	2673	620	392	392
query28	4497	1917	1885	1885
query29	1042	708	583	583
query30	303	254	203	203
query31	1127	1057	944	944
query32	92	77	73	73
query33	546	365	317	317
query34	1205	1190	679	679
query35	757	792	669	669
query36	1254	1215	1054	1054
query37	157	142	91	91
query38	3074	3024	2967	2967
query39	928	892	853	853
query39_1	834	817	852	817
query40	250	164	141	141
query41	66	66	62	62
query42	279	282	277	277
query43	311	318	282	282
query44	
query45	208	195	188	188
query46	1144	1218	841	841
query47	2318	2371	2219	2219
query48	403	439	292	292
query49	627	545	423	423
query50	715	278	212	212
query51	4276	4288	4174	4174
query52	280	287	275	275
query53	322	347	273	273
query54	326	286	293	286
query55	101	94	88	88
query56	329	339	322	322
query57	1677	1606	1606	1606
query58	309	275	274	274
query59	2896	2971	2758	2758
query60	341	336	334	334
query61	156	159	158	158
query62	672	617	562	562
query63	320	279	296	279
query64	5299	1414	1078	1078
query65	
query66	1415	480	422	422
query67	24436	24302	24176	24176
query68	
query69	453	343	293	293
query70	1021	991	1001	991
query71	354	324	308	308
query72	2989	2787	2449	2449
query73	787	815	445	445
query74	9817	9729	9629	9629
query75	3531	3372	3049	3049
query76	2339	1111	781	781
query77	406	426	344	344
query78	11377	11362	10758	10758
query79	1523	1048	865	865
query80	808	784	666	666
query81	448	280	236	236
query82	1354	165	121	121
query83	363	292	261	261
query84	256	149	118	118
query85	879	506	454	454
query86	391	336	322	322
query87	3301	3199	3076	3076
query88	3562	2679	2720	2679
query89	509	414	401	401
query90	1966	185	171	171
query91	190	174	146	146
query92	84	73	71	71
query93	884	869	527	527
query94	524	346	310	310
query95	654	377	434	377
query96	1007	753	348	348
query97	2683	2657	2579	2579
query98	241	229	224	224
query99	1065	1078	930	930
Total cold run time: 258720 ms
Total hot run time: 178890 ms

@sollhui sollhui force-pushed the fix_io_memory_tracker branch from 4e9989b to 7747afe Compare April 2, 2026 07:00
@sollhui
Contributor Author

sollhui commented Apr 2, 2026

run buildall

@doris-robot

TPC-H: Total hot run time: 29396 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7747afe1033793a3eda965737a3a4f52c03688b6, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17597	3735	3848	3735
q2	q3	10680	876	601	601
q4	4678	464	365	365
q5	7447	1327	1141	1141
q6	194	168	136	136
q7	915	956	741	741
q8	9293	1373	1299	1299
q9	5441	5381	5410	5381
q10	6247	2052	1772	1772
q11	468	278	288	278
q12	836	691	511	511
q13	18042	2782	2153	2153
q14	284	285	269	269
q15	q16	864	859	801	801
q17	1080	1059	826	826
q18	6428	5590	5558	5558
q19	1172	1235	1099	1099
q20	608	541	407	407
q21	4862	2544	1999	1999
q22	463	398	324	324
Total cold run time: 97599 ms
Total hot run time: 29396 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4613	4435	4686	4435
q2	q3	4612	4755	4195	4195
q4	1985	2068	1323	1323
q5	4865	5007	5176	5007
q6	202	173	138	138
q7	2034	1772	1623	1623
q8	3289	3096	3057	3057
q9	8581	8321	8346	8321
q10	4449	4472	4199	4199
q11	564	403	383	383
q12	648	710	484	484
q13	2684	3065	2429	2429
q14	287	311	275	275
q15	q16	810	784	683	683
q17	1298	1272	1235	1235
q18	8102	6939	6949	6939
q19	1143	1186	1129	1129
q20	2217	2242	1942	1942
q21	5995	5569	4768	4768
q22	528	474	424	424
Total cold run time: 58906 ms
Total hot run time: 52989 ms

@doris-robot

TPC-DS: Total hot run time: 179812 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7747afe1033793a3eda965737a3a4f52c03688b6, data reload: false

query5	4344	630	529	529
query6	345	229	213	213
query7	4337	599	328	328
query8	335	240	223	223
query9	8728	3847	3848	3847
query10	476	395	347	347
query11	6681	5464	5104	5104
query12	194	132	131	131
query13	1338	631	455	455
query14	5654	5152	4732	4732
query14_1	4123	4107	4069	4069
query15	211	206	181	181
query16	979	459	445	445
query17	969	768	663	663
query18	2463	503	380	380
query19	251	237	196	196
query20	142	133	132	132
query21	224	146	121	121
query22	13702	14785	14633	14633
query23	18106	17055	16708	16708
query23_1	16826	16820	16710	16710
query24	7465	1749	1364	1364
query24_1	1353	1355	1396	1355
query25	586	494	428	428
query26	1243	312	177	177
query27	2716	629	368	368
query28	4432	1897	1871	1871
query29	1016	666	574	574
query30	300	235	196	196
query31	1093	1029	938	938
query32	92	72	72	72
query33	534	337	282	282
query34	1179	1154	658	658
query35	748	774	654	654
query36	1254	1230	1087	1087
query37	157	97	80	80
query38	3075	3026	2958	2958
query39	921	895	850	850
query39_1	845	814	844	814
query40	236	153	137	137
query41	62	60	57	57
query42	316	272	278	272
query43	316	312	274	274
query44	
query45	202	196	187	187
query46	1149	1256	782	782
query47	2307	2336	2258	2258
query48	393	414	300	300
query49	639	530	417	417
query50	724	285	219	219
query51	4365	4252	4231	4231
query52	279	280	283	280
query53	322	336	265	265
query54	318	279	280	279
query55	101	96	90	90
query56	324	331	312	312
query57	1722	1745	1716	1716
query58	315	272	268	268
query59	2888	2972	2737	2737
query60	351	339	322	322
query61	161	156	156	156
query62	675	619	558	558
query63	315	276	264	264
query64	5465	1478	1105	1105
query65	
query66	1480	464	384	384
query67	24201	24174	24107	24107
query68	
query69	445	333	298	298
query70	1036	1004	1005	1004
query71	365	325	308	308
query72	3010	2760	2475	2475
query73	815	773	445	445
query74	9800	9692	9514	9514
query75	3570	3353	3002	3002
query76	2299	1109	785	785
query77	389	413	327	327
query78	11360	11366	10731	10731
query79	1550	1034	793	793
query80	832	755	659	659
query81	474	278	239	239
query82	1390	156	126	126
query83	387	294	256	256
query84	312	147	115	115
query85	898	499	472	472
query86	402	323	319	319
query87	3313	3214	3035	3035
query88	3570	2709	2701	2701
query89	477	410	378	378
query90	2009	184	172	172
query91	181	169	145	145
query92	78	72	72	72
query93	901	890	518	518
query94	550	328	283	283
query95	645	449	336	336
query96	969	780	324	324
query97	2679	2686	2685	2685
query98	244	221	235	221
query99	1055	1076	955	955
Total cold run time: 258317 ms
Total hot run time: 179812 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category           Coverage
Function Coverage  52.92% (20043/37873)
Line Coverage      36.51% (188149/515280)
Region Coverage    32.77% (146056/445648)
Branch Coverage    33.93% (63960/188494)

@sollhui sollhui force-pushed the fix_io_memory_tracker branch from 7747afe to 35bb6d5 Compare April 2, 2026 08:52
@sollhui
Contributor Author

sollhui commented Apr 2, 2026

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category           Coverage
Function Coverage  52.92% (20043/37873)
Line Coverage      36.52% (188169/515281)
Region Coverage    32.81% (146196/445648)
Branch Coverage    33.94% (63968/188494)

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 2, 2026
@github-actions
Contributor

github-actions bot commented Apr 2, 2026

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Apr 2, 2026

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 29162 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 35bb6d5b63b3b1f21b5d1a5cf6c71523caadba5e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17596	3659	3636	3636
q2	q3	10681	847	618	618
q4	4672	458	365	365
q5	7453	1352	1136	1136
q6	181	161	135	135
q7	909	958	782	782
q8	9313	1433	1267	1267
q9	5549	5348	5282	5282
q10	6307	2046	1748	1748
q11	473	278	277	277
q12	840	683	521	521
q13	18015	2766	2156	2156
q14	283	286	259	259
q15	q16	885	874	801	801
q17	996	1015	795	795
q18	6453	5664	5584	5584
q19	1299	1219	1040	1040
q20	608	540	422	422
q21	4941	2438	2012	2012
q22	445	386	326	326
Total cold run time: 97899 ms
Total hot run time: 29162 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4610	4349	4349	4349
q2	q3	4609	4720	4122	4122
q4	2000	2051	1388	1388
q5	4877	4996	5316	4996
q6	196	164	130	130
q7	2060	1810	1646	1646
q8	3280	3045	3130	3045
q9	8245	8538	8248	8248
q10	4478	4466	4239	4239
q11	637	404	376	376
q12	653	775	592	592
q13	2654	3053	2467	2467
q14	301	306	295	295
q15	q16	738	761	683	683
q17	1246	1270	1184	1184
q18	7892	6882	7021	6882
q19	1133	1103	1076	1076
q20	2225	2194	1941	1941
q21	5897	5377	4750	4750
q22	531	486	396	396
Total cold run time: 58262 ms
Total hot run time: 52805 ms

@doris-robot

TPC-DS: Total hot run time: 180691 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 35bb6d5b63b3b1f21b5d1a5cf6c71523caadba5e, data reload: false

query5	4334	655	516	516
query6	333	232	207	207
query7	4229	615	328	328
query8	338	248	222	222
query9	8738	3872	3913	3872
query10	476	399	355	355
query11	6668	5508	5138	5138
query12	187	133	134	133
query13	1318	616	441	441
query14	5648	5200	4756	4756
query14_1	4163	4135	4121	4121
query15	209	212	189	189
query16	1044	455	451	451
query17	1156	786	661	661
query18	2570	513	392	392
query19	250	232	210	210
query20	142	137	133	133
query21	224	146	124	124
query22	13958	14742	14898	14742
query23	18261	17320	16778	16778
query23_1	17005	16839	16860	16839
query24	7755	1737	1372	1372
query24_1	1353	1342	1364	1342
query25	587	497	428	428
query26	1258	319	175	175
query27	2699	659	363	363
query28	4441	1871	1886	1871
query29	983	656	560	560
query30	298	238	195	195
query31	1084	1057	944	944
query32	82	74	68	68
query33	542	353	290	290
query34	1184	1148	654	654
query35	747	789	690	690
query36	1283	1255	1087	1087
query37	150	100	79	79
query38	3077	3059	3004	3004
query39	904	908	849	849
query39_1	848	847	830	830
query40	234	157	137	137
query41	61	58	60	58
query42	270	271	270	270
query43	318	319	274	274
query44	
query45	208	194	186	186
query46	1122	1284	805	805
query47	2355	2324	2226	2226
query48	421	393	298	298
query49	642	522	439	439
query50	711	274	209	209
query51	4341	4294	4235	4235
query52	281	283	271	271
query53	322	343	269	269
query54	327	277	281	277
query55	100	96	89	89
query56	324	314	325	314
query57	1718	1788	1642	1642
query58	299	277	273	273
query59	2911	2979	2733	2733
query60	337	341	321	321
query61	157	155	152	152
query62	666	629	608	608
query63	313	274	266	266
query64	5332	1445	1111	1111
query65	
query66	1438	471	368	368
query67	24435	24382	24229	24229
query68	
query69	448	344	323	323
query70	1036	970	994	970
query71	363	325	315	315
query72	2940	2749	2495	2495
query73	816	747	420	420
query74	9833	9717	9546	9546
query75	3548	3348	3018	3018
query76	2323	1109	775	775
query77	414	413	345	345
query78	11289	11340	10739	10739
query79	1476	1141	802	802
query80	989	786	723	723
query81	501	279	244	244
query82	1346	156	127	127
query83	402	292	268	268
query84	265	154	124	124
query85	1022	589	541	541
query86	431	341	317	317
query87	3233	3182	3104	3104
query88	3529	2716	2699	2699
query89	490	405	375	375
query90	1871	171	177	171
query91	178	177	146	146
query92	81	73	72	72
query93	895	905	509	509
query94	616	327	302	302
query95	657	447	332	332
query96	1079	778	325	325
query97	2704	2687	2569	2569
query98	261	226	223	223
query99	1095	1079	945	945
Total cold run time: 259622 ms
Total hot run time: 180691 ms


Labels

approved Indicates a PR has been approved by one committer. reviewed


5 participants