Skip to content

Conversation

@CalvinKirs
Copy link
Member

@CalvinKirs CalvinKirs commented Apr 14, 2025

Issue Number: #50238

Background

Previously, all storage-related property conversions were handled in a single class: PropertyConvert. This class included logic for multiple components such as:

  • BE storage configuration

  • Frontend (FE) object storage

  • HDFS(FE) configuration

Over time, this approach introduced several problems:

Tight Coupling: Different storage types (e.g., S3, OSS, COS, HDFS) were processed in a mixed manner.

Inconsistent Behavior: The same storage type behaved differently across components. For instance:

Some services accepted https:// style URIs.

Others only accepted s3:// style URIs.

High Maintenance Cost: Adding or updating logic for a single storage type risked breaking unrelated paths.

Low Extensibility: Introducing new storage types or protocols required invasive changes to centralized logic.

Changed

This PR refactors the property conversion logic with the following goals:

Separation of Responsibility:

Each storage type (e.g., S3, COS, HDFS) now manages its own property parsing and conversion.

No cross-dependency between different storage implementations.

Unified Interface for Upper Layers:

A single unified interface is exposed to business logic (e.g., generating properties for BE).

Upper layers no longer care about the specific storage type or URI scheme.

Consistent Behavior Across Components:

Each storage implementation defines its own rules.

Eliminates inconsistencies like accepting different URI styles in different parts of the system.

Future-Friendly Design:

Lays the groundwork for plugin-based SPI support.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

## Background
Previously, all storage-related property conversions were handled in a single class: PropertyConvert. This class included logic for multiple components such as:

- BE storage configuration

- Frontend (FE) object storage

- HDFS(FE) configuration

Over time, this approach introduced several problems:

Tight Coupling: Different storage types (e.g., S3, OSS, COS, HDFS) were processed in a mixed manner.

Inconsistent Behavior: The same storage type behaved differently across components. For instance:

Some services accepted https:// style URIs.

Others only accepted s3:// style URIs.

High Maintenance Cost: Adding or updating logic for a single storage type risked breaking unrelated paths.

Low Extensibility: Introducing new storage types or protocols required invasive changes to centralized logic.

## Changed
This PR refactors the property conversion logic with the following goals:

### Separation of Responsibility:

Each storage type (e.g., S3, COS, HDFS) now manages its own property parsing and conversion.

No cross-dependency between different storage implementations.

### Unified Interface for Upper Layers:

A single unified interface is exposed to business logic (e.g., generating properties for BE).

Upper layers no longer care about the specific storage type or URI scheme.

### Consistent Behavior Across Components:

Each storage implementation defines its own rules.

Eliminates inconsistencies like accepting different URI styles in different parts of the system.

### Future-Friendly Design:

Lays the groundwork for plugin-based SPI support.
@CalvinKirs CalvinKirs requested a review from morningman as a code owner April 14, 2025 13:21
@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Copy link
Member Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 34175 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ab51b701470811ee150492f4a43a00ab515e3118, data reload: false

------ Round 1 ----------------------------------
q1	26087	5121	5129	5121
q2	2074	283	192	192
q3	10384	1244	721	721
q4	10227	1051	543	543
q5	7553	2407	2378	2378
q6	191	170	139	139
q7	912	749	622	622
q8	9318	1303	1065	1065
q9	6781	5107	5102	5102
q10	6824	2315	1896	1896
q11	492	276	263	263
q12	344	392	220	220
q13	17766	3708	3114	3114
q14	229	228	210	210
q15	537	491	485	485
q16	608	636	591	591
q17	562	862	353	353
q18	7669	7145	6999	6999
q19	1200	960	564	564
q20	340	339	219	219
q21	4034	3394	2433	2433
q22	1039	995	945	945
Total cold run time: 115171 ms
Total hot run time: 34175 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5156	5064	5072	5064
q2	231	322	230	230
q3	2134	2650	2277	2277
q4	1399	1814	1383	1383
q5	4592	4482	4362	4362
q6	213	168	126	126
q7	1972	1899	1771	1771
q8	2618	2527	2576	2527
q9	7220	7155	7140	7140
q10	2987	3174	2725	2725
q11	567	510	484	484
q12	667	775	611	611
q13	3584	3869	3271	3271
q14	278	289	271	271
q15	508	481	479	479
q16	656	694	659	659
q17	1121	1581	1393	1393
q18	7849	7683	7506	7506
q19	800	775	883	775
q20	1998	2024	1859	1859
q21	5114	4665	4689	4665
q22	1077	1019	979	979
Total cold run time: 52741 ms
Total hot run time: 50557 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 186226 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ab51b701470811ee150492f4a43a00ab515e3118, data reload: false

query1	1031	492	506	492
query2	6571	1868	1857	1857
query3	6747	226	218	218
query4	26347	23683	22969	22969
query5	4332	591	470	470
query6	292	202	206	202
query7	4632	481	288	288
query8	303	241	245	241
query9	8609	2571	2598	2571
query10	440	322	267	267
query11	15511	15007	14757	14757
query12	174	119	102	102
query13	1645	514	402	402
query14	9423	6241	6184	6184
query15	213	191	178	178
query16	7242	660	480	480
query17	1188	706	602	602
query18	1966	418	318	318
query19	195	186	163	163
query20	136	122	133	122
query21	220	128	107	107
query22	4198	4137	4067	4067
query23	33896	32990	33083	32990
query24	8504	2417	2417	2417
query25	548	479	422	422
query26	1240	275	156	156
query27	2733	500	329	329
query28	4331	2414	2384	2384
query29	764	549	411	411
query30	287	219	193	193
query31	937	848	764	764
query32	71	63	70	63
query33	586	350	319	319
query34	789	883	521	521
query35	771	809	742	742
query36	942	977	888	888
query37	122	101	75	75
query38	4236	4240	4101	4101
query39	1454	1429	1389	1389
query40	214	133	111	111
query41	57	55	51	51
query42	120	109	102	102
query43	489	501	495	495
query44	1298	805	795	795
query45	176	171	167	167
query46	829	1018	625	625
query47	1737	1772	1725	1725
query48	374	410	307	307
query49	768	491	424	424
query50	617	681	392	392
query51	4126	4151	4091	4091
query52	105	105	99	99
query53	230	245	186	186
query54	575	568	509	509
query55	82	81	81	81
query56	302	290	272	272
query57	1139	1126	1061	1061
query58	272	260	255	255
query59	2595	2650	2528	2528
query60	329	314	322	314
query61	129	125	124	124
query62	784	717	676	676
query63	227	184	191	184
query64	4397	1002	692	692
query65	4374	4249	4282	4249
query66	1154	419	318	318
query67	15668	15647	15323	15323
query68	8546	870	518	518
query69	461	295	262	262
query70	1223	1089	1115	1089
query71	480	318	297	297
query72	5336	4761	4736	4736
query73	698	602	338	338
query74	8887	9101	8975	8975
query75	4047	3211	2692	2692
query76	3708	1185	766	766
query77	790	375	295	295
query78	9990	10007	9296	9296
query79	2941	822	571	571
query80	625	519	465	465
query81	494	267	226	226
query82	512	125	104	104
query83	301	257	234	234
query84	301	115	83	83
query85	874	346	313	313
query86	386	309	281	281
query87	4556	4469	4341	4341
query88	3498	2259	2223	2223
query89	406	322	291	291
query90	1864	213	215	213
query91	141	139	113	113
query92	77	65	61	61
query93	2127	959	582	582
query94	668	424	278	278
query95	379	299	286	286
query96	495	562	279	279
query97	3161	3211	3129	3129
query98	239	205	215	205
query99	1439	1394	1307	1307
Total cold run time: 276443 ms
Total hot run time: 186226 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ab51b701470811ee150492f4a43a00ab515e3118, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.11
query3	0.26	0.19	0.20
query4	1.61	0.19	0.19
query5	0.59	0.58	0.60
query6	1.20	0.70	0.73
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.52	0.50
query10	0.57	0.57	0.57
query11	0.15	0.10	0.11
query12	0.15	0.11	0.11
query13	0.62	0.59	0.59
query14	2.72	2.69	2.69
query15	0.91	0.86	0.86
query16	0.39	0.38	0.36
query17	1.06	1.03	1.03
query18	0.20	0.19	0.19
query19	1.91	1.88	1.87
query20	0.01	0.01	0.02
query21	15.35	0.94	0.56
query22	0.74	1.25	0.90
query23	14.73	1.37	0.66
query24	6.56	1.41	0.54
query25	0.46	0.20	0.12
query26	0.64	0.17	0.13
query27	0.05	0.05	0.05
query28	9.20	0.94	0.42
query29	12.55	3.94	3.29
query30	0.25	0.10	0.06
query31	2.83	0.60	0.38
query32	3.23	0.56	0.45
query33	3.01	3.06	3.12
query34	15.68	5.15	4.48
query35	4.56	4.52	4.51
query36	0.65	0.49	0.48
query37	0.09	0.07	0.06
query38	0.06	0.04	0.04
query39	0.03	0.03	0.03
query40	0.17	0.14	0.14
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.03	0.04	0.03
Total cold run time: 104.13 s
Total hot run time: 31.16 s

try {
S3URI s3uri = S3URI.create(uri, usePathStyle, forceParsingByStandardUri);
return s3uri.getEndpoint().orElse(null);
} catch (UserException e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should print this exception for debugging.
Maybe we can just throw this exception

import java.util.Map;
import java.util.Set;

public class HdfsPropertiesUtils {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name it as HDFSPropertiesUtils, or change the HDFSProperties to HdfsProperties

}
}

public String getRegion() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If cosRegion is null and cosEndpoint does not contains "myqcloud.com", it will return null

@CalvinKirs
Copy link
Member Author

run buildall

@CalvinKirs CalvinKirs force-pushed the master-params-refactor branch from 3c0bb17 to 2ac771f Compare April 18, 2025 06:04
@CalvinKirs
Copy link
Member Author

run buildall

@CalvinKirs
Copy link
Member Author

run buildall

@CalvinKirs
Copy link
Member Author

run performance
run cloudp0

@CalvinKirs
Copy link
Member Author

run cloud_p0

@doris-robot
Copy link

TPC-H: Total hot run time: 34063 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ccecabea396e7b28fd34cd8f8f28df25373fda24, data reload: false

------ Round 1 ----------------------------------
q1	25974	5081	5044	5044
q2	2077	280	181	181
q3	10395	1256	702	702
q4	10231	1011	534	534
q5	8343	2367	2378	2367
q6	275	165	131	131
q7	914	729	623	623
q8	9322	1274	1069	1069
q9	7507	5152	5131	5131
q10	6868	2307	1859	1859
q11	474	286	272	272
q12	342	351	219	219
q13	17784	3688	3139	3139
q14	237	229	211	211
q15	548	509	511	509
q16	451	457	397	397
q17	612	858	376	376
q18	7827	7080	7120	7080
q19	1362	952	559	559
q20	325	336	229	229
q21	4145	2639	2451	2451
q22	1012	983	980	980
Total cold run time: 117025 ms
Total hot run time: 34063 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5125	5071	5071	5071
q2	239	337	226	226
q3	2158	2674	2249	2249
q4	1474	1896	1509	1509
q5	4572	4471	4454	4454
q6	214	169	132	132
q7	1970	1914	1771	1771
q8	2614	2603	2550	2550
q9	7173	7148	7190	7148
q10	3010	3165	2779	2779
q11	581	527	498	498
q12	669	752	586	586
q13	3548	3899	3365	3365
q14	274	310	266	266
q15	529	485	471	471
q16	459	517	477	477
q17	1171	1610	1371	1371
q18	7789	7591	7414	7414
q19	790	780	952	780
q20	2094	2017	1797	1797
q21	5303	4751	4695	4695
q22	1071	1022	934	934
Total cold run time: 52827 ms
Total hot run time: 50543 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 185198 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ccecabea396e7b28fd34cd8f8f28df25373fda24, data reload: false

query1	1017	481	489	481
query2	6548	1793	1753	1753
query3	6741	225	221	221
query4	26977	23332	23386	23332
query5	4402	628	455	455
query6	300	200	211	200
query7	4619	491	270	270
query8	285	244	229	229
query9	8624	2528	2533	2528
query10	457	326	253	253
query11	15528	15007	14738	14738
query12	150	108	104	104
query13	1649	503	374	374
query14	8685	6140	6115	6115
query15	203	183	171	171
query16	7123	639	472	472
query17	948	722	591	591
query18	1965	407	312	312
query19	198	177	162	162
query20	128	113	117	113
query21	216	137	107	107
query22	4142	4298	4110	4110
query23	33945	33030	33021	33021
query24	8462	2385	2379	2379
query25	566	473	426	426
query26	1250	268	158	158
query27	2747	512	328	328
query28	4342	2076	2075	2075
query29	760	571	485	485
query30	282	218	186	186
query31	921	867	759	759
query32	72	66	70	66
query33	559	347	320	320
query34	783	854	507	507
query35	781	835	733	733
query36	977	994	878	878
query37	121	101	78	78
query38	4141	4088	4125	4088
query39	1450	1404	1396	1396
query40	209	116	104	104
query41	58	54	54	54
query42	127	107	106	106
query43	483	491	461	461
query44	1265	790	791	790
query45	181	176	168	168
query46	838	1009	609	609
query47	1745	1791	1730	1730
query48	386	390	289	289
query49	789	506	409	409
query50	632	666	424	424
query51	4127	4125	4115	4115
query52	112	104	103	103
query53	224	257	190	190
query54	570	564	490	490
query55	83	81	81	81
query56	285	326	274	274
query57	1137	1158	1060	1060
query58	284	262	253	253
query59	2590	2598	2524	2524
query60	324	334	302	302
query61	125	124	124	124
query62	815	722	672	672
query63	225	186	183	183
query64	4306	1007	654	654
query65	4321	4239	4286	4239
query66	1127	424	308	308
query67	15661	15603	15153	15153
query68	8133	883	519	519
query69	474	292	259	259
query70	1219	1145	1103	1103
query71	456	308	288	288
query72	5558	4732	4782	4732
query73	738	621	344	344
query74	9204	9232	8698	8698
query75	3762	3251	2625	2625
query76	3625	1186	748	748
query77	791	360	284	284
query78	9955	10108	9297	9297
query79	2941	815	556	556
query80	641	518	446	446
query81	461	255	221	221
query82	454	125	94	94
query83	278	248	241	241
query84	292	104	89	89
query85	803	359	312	312
query86	338	305	279	279
query87	4409	4500	4337	4337
query88	2876	2237	2220	2220
query89	422	321	277	277
query90	1918	220	216	216
query91	148	139	111	111
query92	78	65	57	57
query93	1866	944	579	579
query94	671	414	302	302
query95	368	298	278	278
query96	478	557	272	272
query97	3175	3229	3082	3082
query98	223	215	207	207
query99	1459	1425	1282	1282
Total cold run time: 274307 ms
Total hot run time: 185198 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 29.96 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ccecabea396e7b28fd34cd8f8f28df25373fda24, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.10
query3	0.26	0.20	0.20
query4	1.60	0.19	0.20
query5	0.61	0.58	0.61
query6	1.19	0.71	0.71
query7	0.03	0.02	0.02
query8	0.04	0.04	0.04
query9	0.58	0.54	0.51
query10	0.59	0.58	0.57
query11	0.16	0.11	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.60
query14	1.18	1.17	1.17
query15	0.89	0.85	0.84
query16	0.40	0.40	0.37
query17	1.00	1.03	0.99
query18	0.22	0.19	0.20
query19	1.91	1.78	1.80
query20	0.02	0.00	0.02
query21	15.44	0.88	0.58
query22	0.75	1.15	0.62
query23	15.05	1.40	0.62
query24	6.97	1.40	0.97
query25	0.50	0.16	0.12
query26	0.67	0.15	0.13
query27	0.05	0.05	0.05
query28	9.69	0.85	0.46
query29	12.60	4.21	3.46
query30	0.25	0.09	0.06
query31	2.83	0.59	0.40
query32	3.23	0.54	0.49
query33	3.08	3.04	3.06
query34	15.75	5.15	4.52
query35	4.57	4.55	4.57
query36	0.66	0.50	0.48
query37	0.09	0.06	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.03
query40	0.17	0.14	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 104.19 s
Total hot run time: 29.96 s

@CalvinKirs
Copy link
Member Author

run cloud_p0

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added approved Indicates a PR has been approved by one committer. reviewed labels Apr 18, 2025
@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit be7617e into apache:master Apr 19, 2025
25 of 26 checks passed
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
apache#50031)

## Background
Previously, all storage-related property conversions were handled in a
single class: PropertyConvert. This class included logic for multiple
components such as:

- BE storage configuration

- Frontend (FE) object storage

- HDFS(FE) configuration

Over time, this approach introduced several problems:

Tight Coupling: Different storage types (e.g., S3, OSS, COS, HDFS) were
processed in a mixed manner.

Inconsistent Behavior: The same storage type behaved differently across
components. For instance:

Some services accepted https:// style URIs.

Others only accepted s3:// style URIs.

High Maintenance Cost: Adding or updating logic for a single storage
type risked breaking unrelated paths.

Low Extensibility: Introducing new storage types or protocols required
invasive changes to centralized logic.

## Changed
This PR refactors the property conversion logic with the following
goals:

### Separation of Responsibility:

Each storage type (e.g., S3, COS, HDFS) now manages its own property
parsing and conversion.

No cross-dependency between different storage implementations.

### Unified Interface for Upper Layers:

A single unified interface is exposed to business logic (e.g.,
generating properties for BE).

Upper layers no longer care about the specific storage type or URI
scheme.

### Consistent Behavior Across Components:

Each storage implementation defines its own rules.

Eliminates inconsistencies like accepting different URI styles in
different parts of the system.

### Future-Friendly Design:

Lays the groundwork for plugin-based SPI support.
CalvinKirs added a commit to CalvinKirs/incubator-doris that referenced this pull request Jun 24, 2025
… Unification (apache#50031)

## Background
Previously, all storage-related property conversions were handled in a
single class: PropertyConvert. This class included logic for multiple
components such as:

- BE storage configuration

- Frontend (FE) object storage

- HDFS(FE) configuration

Over time, this approach introduced several problems:

Tight Coupling: Different storage types (e.g., S3, OSS, COS, HDFS) were
processed in a mixed manner.

Inconsistent Behavior: The same storage type behaved differently across
components. For instance:

Some services accepted https:// style URIs.

Others only accepted s3:// style URIs.

High Maintenance Cost: Adding or updating logic for a single storage
type risked breaking unrelated paths.

Low Extensibility: Introducing new storage types or protocols required
invasive changes to centralized logic.

## Changed
This PR refactors the property conversion logic with the following
goals:

### Separation of Responsibility:

Each storage type (e.g., S3, COS, HDFS) now manages its own property
parsing and conversion.

No cross-dependency between different storage implementations.

### Unified Interface for Upper Layers:

A single unified interface is exposed to business logic (e.g.,
generating properties for BE).

Upper layers no longer care about the specific storage type or URI
scheme.

### Consistent Behavior Across Components:

Each storage implementation defines its own rules.

Eliminates inconsistencies like accepting different URI styles in
different parts of the system.

### Future-Friendly Design:

Lays the groundwork for plugin-based SPI support.

(cherry picked from commit be7617e)
morrySnow pushed a commit that referenced this pull request Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/3.1.0-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants