@kaka11chen
Contributor

@kaka11chen kaka11chen commented Oct 21, 2025

What problem does this PR solve?

Problem Summary:

Release note

Optimize Complex Type Column Reading with Column Pruning

Description

This PR implements column pruning for complex types (Struct, Array, Map) to optimize read performance. Previously, Doris would read entire complex type fields before processing, which was simple to implement but inefficient when only specific sub-columns were needed.

Key changes:

  • FE (Frontend): Added column access path calculation and type pruning

    • Collects and analyzes access paths for complex type fields
    • Performs type pruning based on access paths
    • Implements projection pushdown for complex types
  • BE (Backend): Added selective column reading

    • Uses columnAccessPath array from FE to identify required sub-columns
    • Implements selective reading to skip unnecessary sub-columns
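The sketch below is a rough illustration of how access paths collected by the FE could drive selective reading on the BE. It assumes an access path is a list of names rooted at the column (for example [s, a] for s.a). AccessPathFilter and shouldRead are hypothetical names, not the actual Doris API, and the real BE reader is written in C++.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: decides whether a sub-column reader is needed, given the
 * access paths the planner collected for one complex-type column.
 */
final class AccessPathFilter {
    private final Set<List<String>> accessPaths;

    AccessPathFilter(Set<List<String>> accessPaths) {
        this.accessPaths = Set.copyOf(accessPaths);
    }

    /**
     * A sub-column is read if it is an ancestor of an accessed path (assumption: e.g. the
     * parent struct's null map is still needed) or if it lies inside an accessed path.
     * Everything else, such as s.b when only s.a is accessed, can be skipped.
     */
    boolean shouldRead(List<String> subColumnPath) {
        for (List<String> accessed : accessPaths) {
            if (isPrefix(subColumnPath, accessed) || isPrefix(accessed, subColumnPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean isPrefix(List<String> prefix, List<String> path) {
        return prefix.size() <= path.size() && path.subList(0, prefix.size()).equals(prefix);
    }
}
```

With the single access path [s, a], the filter keeps s and s.a (and anything nested under s.a) and skips s.b.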

Why

Performance Improvement: When a struct contains hundreds or thousands of sub-columns but the query accesses only a few of them, this optimization can significantly reduce I/O and improve query performance. For example, for a column s STRUCT<a: INT, b: INT>, when only s.a is referenced, reading s.b can be skipped entirely.
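For the s.a example, the FE-side type pruning can be pictured with a minimal sketch. It uses a hypothetical mini type model (TypeNode, Scalar, Struct and TypePruner are illustrative names, not Doris's catalog types). Paths are relative to the column, so s.a becomes [a], and an empty path means the whole value is needed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical mini type model for illustration only (not Doris's catalog types).
interface TypeNode {}

record Scalar(String name) implements TypeNode {
    @Override
    public String toString() {
        return name;
    }
}

record Struct(Map<String, TypeNode> fields) implements TypeNode {
    @Override
    public String toString() {
        StringJoiner joiner = new StringJoiner(",", "struct<", ">");
        fields.forEach((name, type) -> joiner.add(name + ":" + type));
        return joiner.toString();
    }
}

final class TypePruner {
    /** Keeps only the struct fields reached by some path; paths are relative to the column. */
    static TypeNode prune(TypeNode type, List<List<String>> paths) {
        // No path information, or an empty path (the whole value is used): keep the full type.
        if (paths.isEmpty() || paths.stream().anyMatch(List::isEmpty)) {
            return type;
        }
        if (!(type instanceof Struct)) {
            return type;
        }
        Struct struct = (Struct) type;
        // Group the remaining path suffixes by their first field name.
        Map<String, List<List<String>>> byField = new LinkedHashMap<>();
        for (List<String> path : paths) {
            byField.computeIfAbsent(path.get(0), k -> new ArrayList<>()).add(path.subList(1, path.size()));
        }
        // Walk the declared fields so the pruned struct keeps the original field order.
        Map<String, TypeNode> kept = new LinkedHashMap<>();
        for (Map.Entry<String, TypeNode> field : struct.fields().entrySet()) {
            List<List<String>> suffixes = byField.get(field.getKey());
            if (suffixes != null) {
                kept.put(field.getKey(), prune(field.getValue(), suffixes));
            }
        }
        return new Struct(kept);
    }
}
```

Pruning struct<a:int,b:int> with the paths [[a]] yields struct<a:int>, so a reader driven by the pruned type never needs to touch b.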

Technical Benefits: Reduces unnecessary data scanning and decoding overhead for complex types, in line with Doris's ongoing performance optimization goals.

TODO & Future Optimizations

  • Lazy Materialization for Complex Type Sub-columns: Defer materialization of unused sub-columns
  • Predicate Pushdown for Complex Type Sub-columns: Push predicates to storage layer for better filtering
  • Parquet RL/DL Optimization: Read only repetition levels and definition levels without data in appropriate scenarios
  • Array Size Optimization: Read only offset and null values for array_size() operations
  • Null Check Optimization: Read only offset and null values for != null checks
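To illustrate the array_size() item: if an array column stores per-row end offsets plus a null map, each row's size follows from adjacent offsets without reading any element data. The layout below (exclusive end offsets with an implicit 0 before row 0, and a separate boolean null map) is an assumption for illustration, and ArraySizeFromOffsets is a hypothetical name.

```java
/**
 * Hypothetical illustration of computing array_size() from offsets alone.
 * Assumed layout: row i spans [endOffsets[i - 1], endOffsets[i]) with an implicit 0
 * before row 0, and nullMap[i] marks a NULL array.
 */
final class ArraySizeFromOffsets {
    static Integer[] arraySizes(long[] endOffsets, boolean[] nullMap) {
        if (endOffsets.length != nullMap.length) {
            throw new IllegalArgumentException("offsets and null map must have the same length");
        }
        Integer[] sizes = new Integer[endOffsets.length];
        long start = 0;
        for (int i = 0; i < endOffsets.length; i++) {
            // NULL arrays yield NULL; otherwise the size is the distance between adjacent offsets.
            sizes[i] = nullMap[i] ? null : Math.toIntExact(endOffsets[i] - start);
            start = endOffsets[i];
        }
        return sizes;
    }
}
```

For end offsets {2, 2, 5} with the middle row NULL, the sizes are 2, NULL and 3; the element values themselves never need to be decoded.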

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from 1c99dc6 to d47ffd5 on October 21, 2025 13:27
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from 5642997 to 3fc502e on October 21, 2025 13:45
@kaka11chen
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.77% (1647/2039)
Line Coverage 67.04% (29059/43346)
Region Coverage 67.31% (14371/21352)
Branch Coverage 57.66% (7638/13246)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 52.00% (156/300) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 88.67% (266/300) 🎉
Increment coverage report
Complete coverage report

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch 2 times, most recently from 3627661 to 3647221 on October 22, 2025 02:58
@kaka11chen
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.77% (1647/2039)
Line Coverage 67.03% (29054/43346)
Region Coverage 67.32% (14374/21352)
Branch Coverage 57.69% (7641/13246)

@doris-robot

ClickBench: Total hot run time: 29.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3647221f1bd5d8e52cb32f746bfb6833b2a6494f, data reload: false

Query	Run 1 (cold, s)	Run 2 (s)	Run 3 (s)
query1	0.05	0.05	0.05
query2	0.13	0.07	0.07
query3	0.31	0.07	0.07
query4	1.60	0.09	0.09
query5	0.27	0.26	0.25
query6	1.17	0.67	0.65
query7	0.03	0.02	0.03
query8	0.07	0.06	0.06
query9	0.66	0.54	0.52
query10	0.59	0.59	0.59
query11	0.27	0.14	0.14
query12	0.27	0.15	0.14
query13	0.66	0.62	0.63
query14	1.07	1.06	1.04
query15	0.96	0.90	0.88
query16	0.39	0.40	0.39
query17	1.07	1.05	1.07
query18	0.24	0.22	0.24
query19	1.99	1.89	1.81
query20	0.01	0.01	0.02
query21	15.41	0.29	0.24
query22	5.00	0.09	0.10
query23	15.38	0.38	0.23
query24	2.92	0.48	0.30
query25	0.10	0.09	0.09
query26	0.19	0.18	0.17
query27	0.09	0.09	0.08
query28	3.66	1.26	1.05
query29	12.62	4.06	3.34
query30	0.34	0.12	0.10
query31	2.84	0.64	0.44
query32	3.24	0.63	0.55
query33	3.16	3.10	3.13
query34	17.03	5.50	4.77
query35	4.89	4.87	4.89
query36	0.67	0.53	0.52
query37	0.22	0.09	0.09
query38	0.19	0.06	0.06
query39	0.06	0.05	0.05
query40	0.21	0.18	0.19
query41	0.11	0.07	0.06
query42	0.06	0.04	0.04
query43	0.06	0.06	0.06
Total cold run time: 100.26 s
Total hot run time: 29.15 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 52.00% (156/300) 🎉
Increment coverage report
Complete coverage report

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch 3 times, most recently from 34a95f7 to 087f4e0 on October 23, 2025 13:22
@kaka11chen
Contributor Author

run buildall

@doris-robot
Copy link

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.62% (1647/2043)
Line Coverage 66.94% (29038/43376)
Region Coverage 67.26% (14372/21368)
Branch Coverage 57.62% (7637/13254)

@doris-robot

ClickBench: Total hot run time: 28.24 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 087f4e08af2665a1082f04139fe559371526a6c1, data reload: false

Query	Run 1 (cold, s)	Run 2 (s)	Run 3 (s)
query1	0.06	0.05	0.05
query2	0.10	0.06	0.05
query3	0.25	0.09	0.08
query4	1.61	0.12	0.11
query5	0.28	0.28	0.25
query6	1.18	0.67	0.66
query7	0.04	0.03	0.02
query8	0.06	0.05	0.04
query9	0.63	0.54	0.52
query10	0.59	0.59	0.58
query11	0.17	0.11	0.12
query12	0.16	0.12	0.13
query13	0.63	0.61	0.61
query14	1.04	1.02	1.02
query15	0.86	0.86	0.85
query16	0.39	0.41	0.40
query17	1.05	1.06	1.05
query18	0.22	0.21	0.20
query19	1.87	1.84	1.81
query20	0.01	0.01	0.02
query21	15.46	0.20	0.13
query22	5.04	0.07	0.05
query23	15.68	0.27	0.10
query24	1.63	1.13	0.88
query25	0.08	0.08	0.09
query26	0.15	0.14	0.13
query27	0.07	0.07	0.06
query28	5.20	1.17	0.94
query29	12.61	4.03	3.29
query30	0.29	0.15	0.14
query31	2.83	0.58	0.39
query32	3.24	0.56	0.48
query33	3.12	3.12	3.04
query34	15.67	5.16	4.53
query35	4.54	4.59	4.62
query36	0.67	0.51	0.50
query37	0.11	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.04	0.03
query40	0.18	0.16	0.14
query41	0.09	0.03	0.04
query42	0.04	0.04	0.03
query43	0.04	0.04	0.03
Total cold run time: 98.05 s
Total hot run time: 28.24 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.60% (674/1012) 🎉
Increment coverage report
Complete coverage report

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch 2 times, most recently from 0d12c7d to 33c5e80 on October 24, 2025 04:41
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from 33c5e80 to f059d14 on October 24, 2025 05:09
@kaka11chen
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.64% (1649/2045)
Line Coverage 67.00% (29104/43437)
Region Coverage 67.32% (14419/21420)
Branch Coverage 57.73% (7674/13294)

@doris-robot

ClickBench: Total hot run time: 27.8 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f059d1417b9ec9216ace8201487208836abebf05, data reload: false

Query	Run 1 (cold, s)	Run 2 (s)	Run 3 (s)
query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.08	0.09
query4	1.60	0.11	0.11
query5	0.28	0.26	0.26
query6	1.17	0.65	0.64
query7	0.04	0.03	0.03
query8	0.05	0.05	0.05
query9	0.64	0.53	0.53
query10	0.58	0.57	0.58
query11	0.17	0.13	0.12
query12	0.15	0.12	0.12
query13	0.61	0.61	0.60
query14	1.00	1.01	1.02
query15	0.84	0.83	0.86
query16	0.39	0.38	0.38
query17	1.01	1.03	1.02
query18	0.21	0.20	0.20
query19	1.88	1.79	1.77
query20	0.02	0.01	0.01
query21	15.46	0.21	0.12
query22	5.04	0.07	0.05
query23	15.67	0.27	0.10
query24	3.28	0.64	0.94
query25	0.09	0.07	0.06
query26	0.14	0.13	0.13
query27	0.06	0.07	0.05
query28	5.52	1.14	0.93
query29	12.56	3.92	3.26
query30	0.28	0.14	0.11
query31	2.83	0.60	0.38
query32	3.22	0.54	0.48
query33	3.01	3.05	3.00
query34	15.85	5.12	4.59
query35	4.55	4.62	4.60
query36	0.66	0.50	0.49
query37	0.10	0.07	0.07
query38	0.07	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.14	0.14
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 99.85 s
Total hot run time: 27.8 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 75.89% (768/1012) 🎉
Increment coverage report
Complete coverage report

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from f059d14 to bb96ea9 on October 25, 2025 05:53
@doris-robot

TPC-H: Total hot run time: 34014 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1f09aae1fd91ac6bd5025170d36b4a7fd5f15469, data reload: false

------ Round 1 ----------------------------------
Query	Run 1 (cold, ms)	Run 2 (ms)	Run 3 (ms)	Best hot (ms)
q1	16509	5089	5010	5010
q2	2016	319	205	205
q3	9806	1277	718	718
q4	9896	931	369	369
q5	7457	2365	2302	2302
q6	185	166	135	135
q7	928	769	608	608
q8	9138	1336	1043	1043
q9	6950	5112	5102	5102
q10	6810	2220	1817	1817
q11	498	305	290	290
q12	338	361	230	230
q13	17577	3638	3048	3048
q14	225	229	216	216
q15	569	513	499	499
q16	1034	999	934	934
q17	586	865	366	366
q18	7668	7018	7287	7018
q19	994	936	583	583
q20	360	349	224	224
q21	3709	3144	2311	2311
q22	1076	1044	986	986
Total cold run time: 104329 ms
Total hot run time: 34014 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Run 1 (cold, ms)	Run 2 (ms)	Run 3 (ms)	Best hot (ms)
q1	5098	5086	5246	5086
q2	251	327	232	232
q3	2175	2642	2291	2291
q4	1329	1753	1329	1329
q5	4171	4097	4551	4097
q6	228	174	135	135
q7	2006	1935	1810	1810
q8	2680	2646	2602	2602
q9	7400	7249	7388	7249
q10	3029	3266	2857	2857
q11	599	524	520	520
q12	673	764	626	626
q13	3435	3915	3284	3284
q14	302	317	301	301
q15	564	493	496	493
q16	1059	1105	1101	1101
q17	1188	1700	1405	1405
q18	7692	7967	7511	7511
q19	827	760	917	760
q20	1987	2063	1985	1985
q21	4899	4283	4299	4283
q22	1078	1032	1010	1010
Total cold run time: 52670 ms
Total hot run time: 50967 ms
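The totals in these reports appear to follow a simple rule, inferred by checking the numbers rather than taken from the tpch-tools scripts: the cold total sums Run 1 of every query, and the hot total sums the faster of the later runs, which is the last column. A small sketch of that arithmetic; BenchTotals is a hypothetical helper.

```java
/**
 * Hypothetical helper reproducing how the report totals appear to be computed
 * (inferred from the numbers). Each row holds one query's runs: row[0] is the
 * cold run, row[1..] are the hot runs.
 */
final class BenchTotals {
    /** Sum of every query's first (cold) run. */
    static long coldTotal(long[][] runs) {
        long total = 0;
        for (long[] row : runs) {
            total += row[0];
        }
        return total;
    }

    /** Sum over queries of the fastest hot run (the last column of the report). */
    static long hotTotal(long[][] runs) {
        long total = 0;
        for (long[] row : runs) {
            long best = Long.MAX_VALUE;
            for (int i = 1; i < row.length; i++) {
                best = Math.min(best, row[i]);
            }
            total += best;
        }
        return total;
    }
}
```

Fed the 22 Round 1 rows (runs 1 to 3), coldTotal returns 104329 and hotTotal returns 34014, matching the report; the Round 2 totals (52670 ms and 50967 ms) follow the same rule.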

@doris-robot

TPC-DS: Total hot run time: 187859 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1f09aae1fd91ac6bd5025170d36b4a7fd5f15469, data reload: false

Query	Run 1 (cold, ms)	Run 2 (ms)	Run 3 (ms)	Best hot (ms)
query1	1042	400	398	398
query2	6609	1658	1707	1658
query3	6761	228	220	220
query4	26496	23924	23246	23246
query5	4353	599	473	473
query6	336	238	215	215
query7	4652	486	301	301
query8	301	259	256	256
query9	8695	2612	2660	2612
query10	494	336	291	291
query11	15463	14999	14792	14792
query12	175	120	113	113
query13	1713	593	458	458
query14	10788	9166	9190	9166
query15	198	187	173	173
query16	7316	679	528	528
query17	1259	780	649	649
query18	2008	430	335	335
query19	215	210	184	184
query20	133	123	124	123
query21	219	136	112	112
query22	3979	4079	3910	3910
query23	33857	33034	32976	32976
query24	8224	2429	2416	2416
query25	603	507	441	441
query26	1238	269	160	160
query27	2771	493	359	359
query28	4224	2246	2225	2225
query29	818	608	513	513
query30	302	223	194	194
query31	910	799	725	725
query32	83	74	69	69
query33	584	371	328	328
query34	792	846	525	525
query35	794	852	725	725
query36	947	1002	876	876
query37	118	108	90	90
query38	3521	3539	3465	3465
query39	1482	1425	1389	1389
query40	223	128	117	117
query41	62	61	58	58
query42	128	109	112	109
query43	493	499	475	475
query44	1261	774	773	773
query45	188	171	171	171
query46	896	990	654	654
query47	1757	1815	1741	1741
query48	391	425	320	320
query49	765	528	436	436
query50	641	675	405	405
query51	4016	3899	3880	3880
query52	116	115	103	103
query53	245	274	204	204
query54	320	286	288	286
query55	89	89	87	87
query56	313	329	315	315
query57	1178	1192	1114	1114
query58	283	280	269	269
query59	2531	2631	2504	2504
query60	346	340	325	325
query61	162	165	158	158
query62	797	730	674	674
query63	237	195	198	195
query64	4318	1144	958	958
query65	4048	3942	3942	3942
query66	1176	452	352	352
query67	15211	15316	15023	15023
query68	8171	907	641	641
query69	504	339	301	301
query70	1313	1292	1236	1236
query71	440	356	317	317
query72	6098	4982	4787	4787
query73	661	578	365	365
query74	8816	9139	8745	8745
query75	3296	3250	2833	2833
query76	3374	1155	739	739
query77	514	398	319	319
query78	9469	9623	9034	9034
query79	2560	816	619	619
query80	700	572	515	515
query81	512	263	222	222
query82	390	154	128	128
query83	259	268	249	249
query84	254	109	92	92
query85	925	487	439	439
query86	380	336	299	299
query87	3722	3745	3626	3626
query88	3804	2270	2282	2270
query89	409	342	298	298
query90	1890	213	229	213
query91	183	167	139	139
query92	81	68	67	67
query93	2145	999	671	671
query94	759	441	345	345
query95	404	327	311	311
query96	500	570	284	284
query97	2921	2965	2907	2907
query98	248	222	222	222
query99	1416	1399	1289	1289
Total cold run time: 273782 ms
Total hot run time: 187859 ms

@doris-robot

ClickBench: Total hot run time: 28.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1f09aae1fd91ac6bd5025170d36b4a7fd5f15469, data reload: false

Query	Run 1 (cold, s)	Run 2 (s)	Run 3 (s)
query1	0.06	0.05	0.04
query2	0.12	0.07	0.07
query3	0.30	0.08	0.07
query4	1.60	0.09	0.09
query5	0.27	0.25	0.24
query6	1.18	0.65	0.65
query7	0.04	0.03	0.02
query8	0.07	0.06	0.06
query9	0.65	0.57	0.53
query10	0.59	0.58	0.58
query11	0.25	0.13	0.14
query12	0.26	0.13	0.13
query13	0.65	0.62	0.62
query14	1.03	1.02	1.03
query15	0.93	0.85	0.85
query16	0.40	0.39	0.38
query17	1.02	1.07	1.03
query18	0.23	0.21	0.23
query19	1.95	1.88	1.81
query20	0.02	0.02	0.02
query21	15.45	0.27	0.24
query22	5.00	0.10	0.10
query23	15.38	0.38	0.22
query24	2.95	0.44	0.31
query25	0.10	0.09	0.09
query26	0.19	0.18	0.17
query27	0.09	0.09	0.09
query28	3.71	1.26	1.06
query29	12.62	3.95	3.28
query30	0.32	0.12	0.10
query31	2.82	0.63	0.47
query32	3.24	0.59	0.51
query33	3.11	3.08	3.11
query34	16.41	5.18	4.47
query35	4.55	4.57	4.55
query36	0.65	0.52	0.50
query37	0.22	0.09	0.09
query38	0.19	0.06	0.06
query39	0.06	0.05	0.05
query40	0.21	0.17	0.16
query41	0.11	0.06	0.06
query42	0.07	0.04	0.04
query43	0.06	0.05	0.06
Total cold run time: 99.13 s
Total hot run time: 28.3 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 64.51% (1105/1713) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.95% (18328/34614)
Line Coverage 38.35% (167362/436384)
Region Coverage 33.24% (130152/391519)
Branch Coverage 34.08% (55943/164135)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.54% (1294/1713) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.62% (24372/34029)
Line Coverage 58.13% (254069/437105)
Region Coverage 53.38% (211931/397033)
Branch Coverage 54.81% (90555/165214)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 70.56% (973/1379) 🎉
Increment coverage report
Complete coverage report

@kaka11chen kaka11chen changed the title Nested column prune external table no late mat [feature](reader) Improve performance by nested column prune reading Nov 10, 2025
@kaka11chen kaka11chen changed the title [feature](reader) Improve performance by nested column prune reading [feature](reader) Optimize Complex Type Column Reading with Column Pruning Nov 10, 2025
@morningman morningman requested a review from Copilot November 17, 2025 08:10
Copilot AI left a comment

Pull Request Overview

This PR implements column pruning for complex types (Struct, Array, Map) to optimize read performance by selectively reading only the required sub-columns instead of reading entire complex type fields.

Key changes:

  • Added FE logic to calculate and track access paths for complex type fields
  • Implemented BE selective reading using columnAccessPath information from FE
  • Added session variable enable_prune_nested_column to control the feature
  • Added thrift structures (TColumnAccessPath, TDataAccessPath, TMetaAccessPath) to pass pruning information between FE and BE

Reviewed Changes

Copilot reviewed 126 out of 153 changed files in this pull request and generated no comments.

Summary per file:

  • Descriptors.thrift: Added access path structures for column pruning
  • descriptors.proto: Added protobuf definitions for access paths
  • SessionVariable.java: Added the enable_prune_nested_column session variable
  • PlanNode.java: Added the printNestedColumns method and renamed a misspelled method
  • SlotDescriptor.java: Added access path fields and getters/setters
  • LogicalOlapScan.java: Implemented the SupportPruneNestedColumn interface
  • LogicalFileScan.java: Added nested column pruning support
  • Multiple FE rules: Added PushDownProject and NestedColumnPruning rules
  • Multiple BE files: Implemented selective column reading logic
Files not reviewed (1)
  • .idea/vcs.xml: Language not supported
Comments suppressed due to low confidence (9)

gensrc/thrift/Descriptors.thrift:1
  • The comment states 'only access the keys' but should say 'only access the values' for the VALUES case.

gensrc/proto/descriptors.proto:1
  • The comment states 'only access the keys' but should say 'only access the values' for the VALUES case.

fe/fe-core/src/main/java/org/apache/doris/planner/PlanNode.java:1
  • Corrected typo in method name from 'getplanNodeExplainString' to 'getPlanNodeExplainString'.

fe/fe-core/src/main/java/org/apache/doris/planner/PlanNode.java:1
  • The null check for slot.getDisplayAllAccessPaths() is duplicated on lines 944 and 945. Remove one of the duplicate checks.

fe/fe-core/src/main/java/org/apache/doris/planner/PlanNode.java:1
  • The null check for slot.getDisplayPredicateAccessPaths() is duplicated on lines 966 and 967. Remove one of the duplicate checks.

fe/fe-core/src/main/java/org/apache/doris/nereids/trees/TreeNode.java:1
  • The method was incorrectly calling foreach instead of foreachUp, causing infinite recursion or incorrect behavior. The fix correctly calls foreachUp to maintain the bottom-up traversal order.

be/test/vec/exec/format/table/hive/hive_reader_test.cpp:1
  • Comment contains Chinese characters. Should be in English: 'profile uses STRUCT type'.

be/test/vec/exec/format/table/hive/hive_reader_test.cpp:1
  • Comment contains Chinese characters. Should be in English: 'profile uses STRUCT type'.

be/test/vec/exec/format/table/hive/hive_reader_test.cpp:1
  • Test name contains typo 'rrc' instead of 'orc'. Should be 'read_hive_orc_file'.
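The TreeNode note is about traversal order. A minimal sketch with a hypothetical Node class (not the Nereids TreeNode): foreach visits a node before its children (pre-order), while foreachUp visits all children first (post-order, bottom-up). If foreachUp recursed into its children through foreach, every subtree below the root would be visited top-down, breaking the bottom-up guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical tree node used only to contrast the two traversal orders. */
final class Node {
    final String name;
    final List<Node> children;

    Node(String name, Node... children) {
        this.name = name;
        this.children = List.of(children);
    }

    /** Pre-order: this node first, then each child subtree. */
    void foreach(Consumer<Node> visitor) {
        visitor.accept(this);
        for (Node child : children) {
            child.foreach(visitor);
        }
    }

    /** Post-order (bottom-up): every child subtree first, then this node. */
    void foreachUp(Consumer<Node> visitor) {
        for (Node child : children) {
            child.foreachUp(visitor); // must recurse with foreachUp, not foreach
        }
        visitor.accept(this);
    }

    List<String> namesTopDown() {
        List<String> out = new ArrayList<>();
        foreach(n -> out.add(n.name));
        return out;
    }

    List<String> namesBottomUp() {
        List<String> out = new ArrayList<>();
        foreachUp(n -> out.add(n.name));
        return out;
    }
}
```

For root(a(a1), b), namesBottomUp() gives [a1, a, b, root] and namesTopDown() gives [root, a, a1, b].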


@@ -1,7 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="512px" height="512px" style="shape-rendering:geometricPrecision; text-rendering:geometricPrecision; image-rendering:optimizeQuality; fill-rule:evenodd; clip-rule:evenodd" xmlns:xlink="http://www.w3.org/1999/xlink">
Contributor

add back this folder


/** AccessPathInfo */
@Data
@AllArgsConstructor
Contributor

It is better not to use Lombok.

Comment on lines +36 to +37
private List<TColumnAccessPath> allAccessPaths;
private List<TColumnAccessPath> predicateAccessPaths;
Contributor

Add a comment explaining what allAccessPaths and predicateAccessPaths are.

Comment on lines +82 to +71
private List<TColumnAccessPath> allAccessPaths;
private List<TColumnAccessPath> predicateAccessPaths;
private List<TColumnAccessPath> displayAllAccessPaths;
private List<TColumnAccessPath> displayPredicateAccessPaths;
Contributor

Could they be final?
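The three comments above (avoid Lombok, document the two lists, make the fields final) could be addressed along these lines. This is only a sketch: TColumnAccessPath is replaced by List<List<String>> so the example compiles on its own, and the meanings written in the comments for allAccessPaths and predicateAccessPaths are an assumption inferred from the names, not something the PR confirms.

```java
import java.util.List;
import java.util.Objects;

/**
 * AccessPathInfo written without Lombok (sketch). Access paths are modeled as
 * List<List<String>> instead of TColumnAccessPath so the example is self-contained.
 */
final class AccessPathInfo {
    /** Assumed meaning: every nested path of the slot that the query reads anywhere. */
    private final List<List<String>> allAccessPaths;
    /** Assumed meaning: the subset of those paths referenced by filter predicates. */
    private final List<List<String>> predicateAccessPaths;

    AccessPathInfo(List<List<String>> allAccessPaths, List<List<String>> predicateAccessPaths) {
        // Shallow immutable copies: the outer lists cannot be modified through the getters.
        this.allAccessPaths = List.copyOf(Objects.requireNonNull(allAccessPaths, "allAccessPaths"));
        this.predicateAccessPaths =
                List.copyOf(Objects.requireNonNull(predicateAccessPaths, "predicateAccessPaths"));
    }

    List<List<String>> getAllAccessPaths() {
        return allAccessPaths;
    }

    List<List<String>> getPredicateAccessPaths() {
        return predicateAccessPaths;
    }
}
```

Writing the constructor and getters by hand keeps the class free of annotation processing, and the final fields make the immutability explicit.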

@@ -185,6 +186,8 @@ public int compare(TFileRangeDesc o1, TFileRangeDesc o2) {
}
output.append(String.format("numNodes=%s", numNodes)).append("\n");

printNestedColumns(output, prefix, getTupleDesc());
Contributor

remove this print

private <C extends Collection<E>, E extends Expression> Pair<Boolean, C> replaceExpressions(
C expressions, boolean propagateType, boolean fillAccessPaths) {
ImmutableCollection.Builder<E> newExprs;
if (expressions instanceof List) {
Contributor

Do we need to check that C is not a Queue?
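On the Queue question: the snippet branches on instanceof List, so presumably every other Collection is treated as the non-List case. Below is a sketch that makes the supported kinds explicit. It uses plain java.util collections instead of Guava's ImmutableCollection.Builder, and CollectionRewriter and mapPreservingKind are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

final class CollectionRewriter {
    /**
     * Rewrites every element while keeping the collection kind: Lists stay ordered lists,
     * Sets stay insertion-ordered sets. Anything that is neither (for example an ArrayDeque)
     * is rejected explicitly instead of silently being rebuilt as a different kind.
     */
    static <E> Collection<E> mapPreservingKind(Collection<E> input, UnaryOperator<E> rewrite) {
        Collection<E> out;
        if (input instanceof List) {
            out = new ArrayList<>(input.size());
        } else if (input instanceof Set) {
            out = new LinkedHashSet<>();
        } else {
            throw new IllegalArgumentException("unsupported collection type: " + input.getClass().getName());
        }
        for (E element : input) {
            out.add(rewrite.apply(element));
        }
        return out;
    }
}
```

Note that a LinkedList is both a List and a Queue; the sketch treats it as a List because that check comes first.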

}

private Expression replaceSlot(Expression e, boolean fillAccessPath) {
return MoreFieldsThread.keepFunctionSignature(false,
Contributor

So this may change the function's signature? What happens if we encounter round(struct_element(x, 'a'), 3), where x itself comes from another struct_element?
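To make the nested case in this question concrete, the sketch below unwinds a struct_element chain into a single access path, so round(struct_element(struct_element(x, 'a'), 'b'), 3) yields the path [x, a, b]. The record types form a hypothetical mini expression tree, not the Nereids Expression classes, and the sketch does not settle the signature question itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical mini expression tree (not the Nereids Expression classes).
interface Expr {}
record Slot(String name) implements Expr {}
record Literal(Object value) implements Expr {}
record StructElement(Expr child, String field) implements Expr {}
record Call(String function, List<Expr> args) implements Expr {}

final class AccessPathCollector {
    /** Returns one access path per slot reference, extended by any struct_element chain on top of it. */
    static List<List<String>> collect(Expr expr) {
        List<List<String>> paths = new ArrayList<>();
        visit(expr, paths);
        return paths;
    }

    private static void visit(Expr expr, List<List<String>> paths) {
        // Unwind struct_element(struct_element(x, 'a'), 'b') into the field list [a, b].
        Deque<String> fields = new ArrayDeque<>();
        Expr cur = expr;
        while (cur instanceof StructElement se) {
            fields.addFirst(se.field());
            cur = se.child();
        }
        if (cur instanceof Slot slot) {
            fields.addFirst(slot.name());
            paths.add(List.copyOf(fields));
        } else if (cur instanceof Call call) {
            // A field taken from a computed value cannot be pushed to the scan;
            // only the call's own arguments contribute paths.
            for (Expr arg : call.args()) {
                visit(arg, paths);
            }
        }
        // Literals reference no columns.
    }
}
```

Applied to the reviewer's example, collect returns [[x, a, b]]; the literal 3 contributes nothing.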

Comment on lines +78 to +79
StatementContext statementContext = jobContext.getCascadesContext().getStatementContext();
SessionVariable sessionVariable = statementContext.getConnectContext().getSessionVariable();
Contributor

Do we need a null check here?

return result;
}

/** DataTypeAccessTree */
Contributor

Add comments to explain each variable.

}

/** DataTypeAccessTree */
public static class DataTypeAccessTree {
Contributor

This class needs unit tests.

"Disable debug points. please check config::enable_debug_points");
}
std::string result = status.to_json();
LOG(INFO) << "handle request result:" << result;
Contributor

Why was this code deleted?


Status MapFileColumnIterator::init(const ColumnIteratorOptions& opts) {
if (_reading_flag == ReadingFlag::SKIP_READING) {
LOG(INFO) << "Map column iterator column " << _column_name << " skip reading.";
Contributor

Better to use DLOG(INFO) here.

create_block_with_nested_columns(Block(arguments), numbers, false);
auto return_type = get_return_type_impl(
ColumnsWithTypeAndName(nested_block.begin(), nested_block.end()));
if (!return_type) {
Contributor

In what case is the return type nullptr?

@@ -0,0 +1,172 @@
// Licensed to the Apache Software Foundation (ASF) under one
Contributor

The license header of this file is wrong

.gitmodules Outdated
path = contrib/apache-orc
url = https://github.com/apache/doris-thirdparty.git
branch = orc
branch = cq_nested_column_prune_external_table
Contributor

Does this need to be merged into orc's main branch?

Contributor Author

yes

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from 1f09aae to 69d64fd on November 22, 2025 17:24
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from 69d64fd to ecbd3dd on November 22, 2025 17:34
@kaka11chen
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.86% (1707/2111)
Line Coverage 66.86% (29892/44708)
Region Coverage 67.41% (14906/22114)
Branch Coverage 57.60% (7931/13768)

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 64.51% (1105/1713) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.81% (18383/34812)
Line Coverage 38.31% (167972/438504)
Region Coverage 33.14% (130544/393954)
Branch Coverage 34.04% (56127/164898)

@kaka11chen kaka11chen force-pushed the nested_column_prune_external_table_no_late_mat branch from ecbd3dd to f50cf6b on November 23, 2025 01:21
@kaka11chen
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 80.86% (1707/2111)
Line Coverage 66.85% (29889/44708)
Region Coverage 67.41% (14906/22114)
Branch Coverage 57.57% (7926/13768)

@doris-robot

BE UT Coverage Report

Increment line coverage 64.51% (1105/1713) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.81% (18383/34812)
Line Coverage 38.30% (167964/438504)
Region Coverage 33.09% (130379/393954)
Branch Coverage 34.03% (56118/164898)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 61.71% (851/1379) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 75.54% (1294/1713) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.59% (24464/34174)
Line Coverage 58.08% (254762/438671)
Region Coverage 53.23% (212516/399255)
Branch Coverage 54.73% (90801/165905)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 70.56% (973/1379) 🎉
Increment coverage report
Complete coverage report


Labels

None yet

Projects

None yet


6 participants