spark load: importing 800GB of SSB lineorder data takes too long, still not done after 3 hours #6713
Unanswered
gj-zhang asked this question in A - General / Q&A
Replies: 2 comments
- Why would loading from HDFS files cause data skew? For 800GB of data, I suggest using broker load and splitting the import into 10 batches; that should work without problems.
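  A minimal sketch of one such batch, assuming Doris broker load syntax; the label, broker name, credentials, HDFS path, file glob, and column separator below are all placeholders rather than anything from this thread. Each of the 10 batches would use a fresh label and a glob matching a different subset of the files:

  ```sql
  -- Batch 1 of 10: load only the files matching one glob, then repeat
  -- with a new label and the next glob for each remaining batch.
  -- All names, paths, and credentials here are placeholders.
  LOAD LABEL ssb.lineorder_batch_01
  (
      DATA INFILE("hdfs://namenode:8020/data/ssb/lineorder/part-000*")
      INTO TABLE lineorder
      COLUMNS TERMINATED BY "|"
  )
  WITH BROKER "broker0"
  (
      "username" = "hdfs_user",
      "password" = "hdfs_pass"
  )
  PROPERTIES
  (
      "timeout" = "14400"
  );
  ```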
- You'd have to look at the Spark job itself to see exactly which step is slow: whether resources are insufficient, or one stage is genuinely slow.
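  One way to find that step, as a sketch: `SHOW LOAD` is standard Doris syntax, though the label below is hypothetical, and which output field carries the Spark application's tracking information can vary by version. From there the Spark UI's stage timeline shows where the time goes.

  ```sql
  -- Inspect the load job's state and progress; for a spark load the output
  -- points at the underlying Spark application (e.g. via the job-details
  -- field, depending on the Doris version), which can then be opened in
  -- the Spark UI to see per-stage timings.
  SHOW LOAD WHERE LABEL = "lineorder_spark_load"\G
  ```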
- Cluster description:
  - Version: 0.14.0
  - 1 FE node, 3 BE nodes, 4 broker nodes
  - Per-machine spec: 72 cores, 300GB memory, 11TB SSD
  - The lineorder table has over 6 billion rows in Hive
  What I used (sketches of each appear below):
  - spark resource
  - Hive external table method
  - HDFS file method
  Please help me take a look: the HDFS file method hits data skew, and the Hive external table method takes very long. Both ran for about 3 hours without finishing before I killed them manually. Am I using this incorrectly? I'd appreciate any help; I've been struggling with this for several days.
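  For reference, minimal sketches of the three pieces listed above, assuming Doris 0.14 spark load syntax; every name, address, table, and property value is an assumed placeholder, not the actual configuration from this post:

  ```sql
  -- Sketch 1: a spark resource (all values are placeholders).
  CREATE EXTERNAL RESOURCE "spark0"
  PROPERTIES
  (
      "type" = "spark",
      "spark.master" = "yarn",
      "spark.submit.deployMode" = "cluster",
      "spark.executor.memory" = "8g",
      "spark.hadoop.yarn.resourcemanager.address" = "rm-host:8032",
      "spark.hadoop.fs.defaultFS" = "hdfs://namenode:8020",
      "working_dir" = "hdfs://namenode:8020/tmp/doris",
      "broker" = "broker0"
  );

  -- Sketch 2: spark load from a Hive table (assumes a Hive external
  -- table hive_lineorder has already been created in Doris).
  LOAD LABEL ssb.lineorder_from_hive
  (
      DATA FROM TABLE hive_lineorder
      INTO TABLE lineorder
  )
  WITH RESOURCE "spark0"
  (
      "spark.executor.memory" = "8g",
      "spark.executor.instances" = "40"
  )
  PROPERTIES ("timeout" = "10800");

  -- Sketch 3: spark load directly from HDFS files.
  LOAD LABEL ssb.lineorder_from_hdfs
  (
      DATA INFILE("hdfs://namenode:8020/user/hive/warehouse/ssb.db/lineorder/*")
      INTO TABLE lineorder
      COLUMNS TERMINATED BY "|"
  )
  WITH RESOURCE "spark0"
  (
      "spark.executor.memory" = "8g",
      "spark.executor.instances" = "40"
  )
  PROPERTIES ("timeout" = "10800");
  ```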