Labels
kind/feature (Categorizes issue or PR as related to a new feature.)
Search before asking
- I had searched in the issues and found no similar issues.
Description
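Task description
When the file cache is being warmed up and a BE node hits an error partway through, or the user manually issues a cancel SQL statement, it takes a while before the warm-up job is fully shut down. The delay comes from how long the cancel signal takes to propagate, and that in turn comes from the fact that warm-up is split into batches: a BE node warms up one batch of data (a warm up task) at a time, and only after that batch finishes does it find out whether it should warm up the next one. We need a mechanism that promptly interrupts the warm up task currently running on the BE nodes as soon as the FE receives a cancel request (whether triggered by an error or submitted manually by the user).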
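To make the delay concrete, here is a minimal single-threaded model of the batch protocol described above, under stated assumptions: `FakeFe`, `run_warm_up`, and the `download` callback are hypothetical names, not Doris code. The cancel arrives while the BE is downloading a file in the middle of a batch, yet the BE still finishes the whole in-flight batch, because it only consults the FE when it asks for the next batch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of today's batch protocol (not Doris code).
// FE hands out batches of file ids; the BE learns about cancellation only
// when it asks FE for the next batch.
struct FakeFe {
    std::deque<std::vector<int>> pending_batches;
    bool cancelled = false;  // set when the user cancels or a BE reports an error

    // Hands out the next batch; returns false once cancelled or out of work.
    bool next_batch(std::vector<int>& out) {
        if (cancelled || pending_batches.empty()) return false;
        out = std::move(pending_batches.front());
        pending_batches.pop_front();
        return true;
    }
};

// BE side: finishes every file in a batch before asking FE for more work.
// `download` stands in for fetching one file into the local file cache.
std::size_t run_warm_up(FakeFe& fe, const std::function<void(int)>& download) {
    std::size_t downloaded = 0;
    std::vector<int> batch;
    while (fe.next_batch(batch)) {
        for (int file_id : batch) {  // no cancel check inside a batch
            download(file_id);
            ++downloaded;
        }
    }
    return downloaded;
}
```

With batches `{1..5}` and `{6..10}` and a `download` callback that sets `fe.cancelled` while fetching file 2, `run_warm_up` returns 5: files 3 to 5 are still downloaded after the cancel, and only the second batch is skipped.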
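Background
I previously recorded a walkthrough video, so I won't repeat the details in text here; please watch the video: https://www.bilibili.com/video/BV16c2ZYHEzm
Main source files of interest
BE side: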
be/src/cloud/cloud_warm_up_manager.cpp
FE side:
fe/fe-core/src/main/java/org/apache/doris/analysis/CancelCloudWarmUpStmt.java
fe/fe-core/src/main/java/org/apache/doris/cloud/CacheHotspotManager.java
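Proposal
The basic idea is to add a cancel warm up interface to the BE RPC layer (be/src/service/backend_service.cpp). The new interface calls the cancel logic of cloud_warm_up_manager (which has to be implemented as part of this task), and the FE then calls this RPC to shut down the warm up task on the BE quickly.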
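A minimal sketch of the BE-side cancel path, assuming each warm-up job owns an atomic cancel flag that the new RPC handler sets. `WarmUpManagerSketch`, `register_job`, `cancel_job`, `unregister_job`, and `run_batch` are hypothetical names, not the existing CloudWarmUpManager API, and the RPC plumbing in backend_service.cpp is left out. Because the worker checks the flag before every file rather than only between batches, the in-flight batch stops after at most one more download.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of the proposed cancel path (names are illustrative,
// not the real CloudWarmUpManager API). Each job owns an atomic cancel flag.
class WarmUpManagerSketch {
public:
    using CancelFlag = std::shared_ptr<std::atomic<bool>>;

    // Registers a running job and returns the flag its worker polls.
    CancelFlag register_job(std::int64_t job_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto flag = std::make_shared<std::atomic<bool>>(false);
        jobs_[job_id] = flag;
        return flag;
    }

    // What the new cancel RPC handler would call. Safe to call from any
    // thread; returns false if the job is unknown (e.g. already finished).
    bool cancel_job(std::int64_t job_id) {
        std::lock_guard<std::mutex> lock(mu_);
        auto it = jobs_.find(job_id);
        if (it == jobs_.end()) return false;
        it->second->store(true);
        return true;
    }

    // Called when a job finishes; the worker's shared_ptr keeps the flag alive.
    void unregister_job(std::int64_t job_id) {
        std::lock_guard<std::mutex> lock(mu_);
        jobs_.erase(job_id);
    }

    // Worker body for one batch: checks the flag before every file, so a
    // cancel takes effect after at most one more download.
    static std::size_t run_batch(const std::vector<int>& files,
                                 const std::atomic<bool>& cancelled,
                                 const std::function<void(int)>& download) {
        std::size_t done = 0;
        for (int file_id : files) {
            if (cancelled.load()) break;
            download(file_id);
            ++done;
        }
        return done;
    }

private:
    std::mutex mu_;
    std::unordered_map<std::int64_t, CancelFlag> jobs_;
};
```

For example, if `mgr.cancel_job(42)` is called while file 3 of a 10-file batch is being downloaded, `run_batch` returns 4 instead of 10. A download already in progress is not interrupted in this sketch; if individual files can be large, the flag would also have to be checked inside the download loop itself.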
Use case
No response
Related issues
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct