morningman (Contributor) commented Apr 14, 2025

What problem does this PR solve?

Sometimes users may hit the following error when querying a table through a Hive catalog:

Storage schema reading not supported

This is caused by a compatibility issue in the Hive Metastore (see trinodb/trino#2678). This PR therefore adds a catalog property:

get_schema_from_table

It defaults to false, which keeps the previous behavior of fetching the schema from the Hive Metastore.
If set to true, the schema is read directly from the table object instead, which avoids the error above.
Note that when it is set to true, column default values are ignored, because the table object does not store them.
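The choice between the two schema sources can be sketched in Java (the language of the Doris FE). This is a minimal sketch under stated assumptions, not the actual Doris implementation: `FieldSchema`, `HmsTable`, `Column`, `MetastoreClient`, `getDefaultValues` and `loadSchema` are hypothetical, simplified stand-ins, and the "data columns, then partition keys" ordering is an assumption. Only the property name `get_schema_from_table`, its default of false, and the loss of default values come from the description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: every type and method here is a hypothetical stand-in, not the Doris or Hive API.
class SchemaSourceSketch {
    // Simplified stand-in for a metastore column descriptor (name + type).
    record FieldSchema(String name, String type) {}

    // Simplified stand-in for the metastore "table" object: data columns and partition keys.
    record HmsTable(List<FieldSchema> dataCols, List<FieldSchema> partitionKeys) {}

    // Column as seen by the catalog; defaultValue is null when unknown.
    record Column(String name, String type, String defaultValue) {}

    // Hypothetical client; getSchema() stands for the call that can fail with
    // "Storage schema reading not supported" on some metastores.
    interface MetastoreClient {
        List<FieldSchema> getSchema(String db, String table);
        Map<String, String> getDefaultValues(String db, String table);
    }

    static List<Column> loadSchema(MetastoreClient client, HmsTable table, String db,
                                   String tableName, Map<String, String> catalogProps) {
        // The property defaults to false, which keeps the previous behavior.
        boolean fromTable = Boolean.parseBoolean(
                catalogProps.getOrDefault("get_schema_from_table", "false"));
        List<Column> columns = new ArrayList<>();
        if (fromTable) {
            // Build the schema from the table object: data columns, then partition keys
            // (assumed order). No schema call is made, so the error cannot occur,
            // but the table object carries no default values.
            List<FieldSchema> fields = new ArrayList<>(table.dataCols());
            fields.addAll(table.partitionKeys());
            for (FieldSchema f : fields) {
                columns.add(new Column(f.name(), f.type(), null));
            }
        } else {
            // Previous behavior: ask the metastore for the schema and the defaults.
            List<FieldSchema> fields = client.getSchema(db, tableName);
            Map<String, String> defaults = client.getDefaultValues(db, tableName);
            for (FieldSchema f : fields) {
                columns.add(new Column(f.name(), f.type(), defaults.get(f.name())));
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        HmsTable table = new HmsTable(
                List.of(new FieldSchema("id", "int")),
                List.of(new FieldSchema("dt", "string")));
        // A client whose schema call fails, as described in the PR.
        MetastoreClient failing = new MetastoreClient() {
            public List<FieldSchema> getSchema(String db, String t) {
                throw new UnsupportedOperationException("Storage schema reading not supported");
            }
            public Map<String, String> getDefaultValues(String db, String t) {
                return Map.of();
            }
        };
        // With the property enabled, the failing call is never made.
        System.out.println(loadSchema(failing, table, "db", "t",
                Map.of("get_schema_from_table", "true")));
    }
}
```

The sketch keeps the metastore path as the default so existing catalogs behave as before, and makes the trade-off visible: the table-object path always returns a null default value.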

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merges this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Thearas (Contributor) commented Apr 14, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

github-actions bot added the lfs-detected! label Apr 18, 2025

github-actions (Contributor):

Possible file(s) that should be tracked in LFS detected: 🚨

The following file(s) exceeds the file size limit: 1048576 bytes, as set in the .yml configuration files:

  • regression-test/data/external_table_p0/hive/test_hive_basic_type.out

Consider using git-lfs to manage large files.

morningman (Contributor, Author):

run buildall

github-actions bot removed the lfs-detected! label Apr 18, 2025

doris-robot:
TPC-H: Total hot run time: 34043 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f229b38ce1bca5364ba7151246057fc96b928827, data reload: false

------ Round 1 ----------------------------------
q1	26296	5148	5047	5047
q2	2083	268	195	195
q3	10554	1261	706	706
q4	10232	1025	565	565
q5	7728	2352	2373	2352
q6	182	164	129	129
q7	927	754	614	614
q8	9320	1295	1128	1128
q9	6734	5084	5062	5062
q10	6861	2272	1866	1866
q11	486	281	269	269
q12	362	363	218	218
q13	17768	3639	3087	3087
q14	221	226	199	199
q15	531	477	470	470
q16	461	452	389	389
q17	599	861	382	382
q18	7782	7227	7102	7102
q19	1204	934	556	556
q20	329	327	226	226
q21	4490	3561	2517	2517
q22	1064	1038	964	964
Total cold run time: 116214 ms
Total hot run time: 34043 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5131	5094	5095	5094
q2	250	326	231	231
q3	2144	2679	2273	2273
q4	1433	1849	1553	1553
q5	4536	4438	4388	4388
q6	213	163	122	122
q7	1951	1894	1722	1722
q8	2658	2568	2597	2568
q9	7169	7096	7079	7079
q10	2965	3209	2733	2733
q11	566	513	487	487
q12	712	752	582	582
q13	3441	3960	3307	3307
q14	275	311	273	273
q15	512	484	469	469
q16	468	514	451	451
q17	1152	1516	1435	1435
q18	7689	7431	7430	7430
q19	800	806	892	806
q20	2023	2001	1874	1874
q21	5293	4736	4579	4579
q22	1055	1016	960	960
Total cold run time: 52436 ms
Total hot run time: 50416 ms

doris-robot:
TPC-DS: Total hot run time: 185333 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f229b38ce1bca5364ba7151246057fc96b928827, data reload: false

query1	1029	469	502	469
query2	6580	1795	1799	1795
query3	6758	228	218	218
query4	27035	23195	23246	23195
query5	4287	619	463	463
query6	307	212	204	204
query7	4631	477	274	274
query8	292	241	248	241
query9	8629	2507	2521	2507
query10	461	308	252	252
query11	15231	15074	14924	14924
query12	153	109	107	107
query13	1662	503	397	397
query14	9220	6167	6018	6018
query15	206	203	165	165
query16	7398	639	453	453
query17	1205	765	560	560
query18	1955	389	297	297
query19	186	176	170	170
query20	120	124	121	121
query21	217	119	102	102
query22	4377	4377	4256	4256
query23	34109	33005	32833	32833
query24	8333	2368	2377	2368
query25	524	437	390	390
query26	1255	271	150	150
query27	2727	488	321	321
query28	4388	2076	2062	2062
query29	755	534	452	452
query30	282	213	180	180
query31	933	861	746	746
query32	72	59	60	59
query33	572	359	301	301
query34	798	858	508	508
query35	784	801	726	726
query36	952	964	895	895
query37	108	98	73	73
query38	4097	4172	4045	4045
query39	1470	1420	1405	1405
query40	208	121	108	108
query41	61	54	51	51
query42	126	100	104	100
query43	494	499	485	485
query44	1280	774	792	774
query45	175	168	168	168
query46	831	1009	623	623
query47	1773	1807	1721	1721
query48	357	397	290	290
query49	751	511	470	470
query50	630	710	393	393
query51	4122	4307	4201	4201
query52	114	109	101	101
query53	222	259	180	180
query54	563	572	497	497
query55	81	81	83	81
query56	289	299	298	298
query57	1153	1128	1061	1061
query58	262	247	241	241
query59	2574	2616	2519	2519
query60	316	313	301	301
query61	129	123	126	123
query62	820	731	656	656
query63	221	179	184	179
query64	4296	988	680	680
query65	4353	4218	4278	4218
query66	1139	423	310	310
query67	15711	15537	15378	15378
query68	7766	870	507	507
query69	474	302	255	255
query70	1134	1122	1056	1056
query71	451	305	293	293
query72	5533	4718	4883	4718
query73	680	662	342	342
query74	8944	9178	8736	8736
query75	3528	3203	2669	2669
query76	3492	1178	741	741
query77	719	375	287	287
query78	9848	9932	9320	9320
query79	2322	808	556	556
query80	581	501	428	428
query81	480	259	218	218
query82	483	131	95	95
query83	253	243	228	228
query84	244	104	89	89
query85	802	360	312	312
query86	376	321	307	307
query87	4417	4353	4318	4318
query88	3684	2233	2205	2205
query89	375	309	276	276
query90	1872	213	203	203
query91	134	138	108	108
query92	81	67	61	61
query93	1836	917	586	586
query94	688	443	301	301
query95	374	290	282	282
query96	480	573	272	272
query97	3181	3208	3183	3183
query98	224	209	203	203
query99	1692	1368	1269	1269
Total cold run time: 274297 ms
Total hot run time: 185333 ms

doris-robot:
ClickBench: Total hot run time: 29.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f229b38ce1bca5364ba7151246057fc96b928827, data reload: false

query1	0.04	0.04	0.04
query2	0.12	0.10	0.11
query3	0.26	0.18	0.20
query4	1.60	0.18	0.19
query5	0.58	0.59	0.59
query6	1.19	0.70	0.71
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.58	0.52	0.52
query10	0.58	0.57	0.57
query11	0.17	0.11	0.11
query12	0.15	0.10	0.11
query13	0.62	0.60	0.60
query14	1.21	1.21	1.19
query15	0.87	0.86	0.83
query16	0.39	0.39	0.40
query17	1.07	1.06	1.01
query18	0.22	0.20	0.20
query19	1.87	1.75	1.82
query20	0.01	0.01	0.01
query21	15.40	0.92	0.54
query22	0.76	1.24	0.79
query23	15.21	1.35	0.62
query24	7.65	0.76	1.08
query25	0.47	0.26	0.06
query26	0.51	0.16	0.14
query27	0.05	0.04	0.04
query28	9.98	0.85	0.44
query29	12.54	4.16	3.42
query30	0.25	0.08	0.06
query31	2.84	0.61	0.38
query32	3.24	0.55	0.46
query33	3.06	3.01	3.12
query34	15.67	5.09	4.48
query35	4.52	4.50	4.48
query36	0.70	0.50	0.47
query37	0.09	0.06	0.07
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.92 s
Total hot run time: 29.56 s

github-actions (Contributor):

PR approved by at least one committer and no changes requested.

github-actions bot added the approved and reviewed labels Apr 21, 2025

github-actions (Contributor):

PR approved by anyone and no changes requested.

morningman merged commit 234c8eb into apache:master Apr 22, 2025
27 of 28 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 22, 2025
github-actions bot pushed a commit that referenced this pull request Apr 22, 2025
yiguolei pushed a commit that referenced this pull request Apr 22, 2025
dataroaring pushed a commit that referenced this pull request Apr 23, 2025
yiguolei mentioned this pull request May 13, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025

Labels

approved, dev/2.1.10-merged, dev/3.0.6-merged, reviewed

8 participants