Open
Description
Code of Conduct
- I agree to follow this project's Code of Conduct
Search before asking
- I have searched in the issues and found no similar issues.
Describe the feature
When a task is killed because of stage cancellation, another task attempt succeeding, or some other reason, the AddBlockEvent handling and sendShuffleData still keep running.
Although needCancelRequest may cancel some of the work, the AddBlockEvent entries in the thread pool's blocking queue still hold the shuffle block data, and so do the RPC requests that have already been sent but are still waiting for a response.
This causes three problems:
- We freeAll memory once the task is killed, but the shuffleBlockData held by the async threads still occupies memory
- Many useless runnables related to the killed task are still running or waiting to be executed
- Currently checkBlockSendResult cannot be interrupted. When the task killed because of speculation is the last one of the shuffle map stage, it blocks the scheduling of the next reduce stage
Motivation
No response
Describe the solution
- Cancel all runnables that are waiting to be executed or are blocked waiting for an RPC callback
- Interrupt checkBlockSendResult immediately
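A minimal sketch of how both points could fit together, using only java.util.concurrent. The class TaskSendTracker and its methods submit, onTaskKilled and awaitSendResult are hypothetical names for illustration, not existing project APIs. awaitSendResult only stands in for checkBlockSendResult, whose real waiting logic may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-task tracker (illustration only, not a project class). */
public class TaskSendTracker {
  // Futures of all send jobs submitted on behalf of this task.
  private final List<Future<?>> pending = new ArrayList<>();
  // Thread currently blocked in awaitSendResult, if any.
  private volatile Thread waiter;
  private volatile boolean killed;

  /** Submits a send job and remembers its Future so it can be cancelled later. */
  public synchronized Future<?> submit(ExecutorService pool, Runnable sendJob) {
    Future<?> future = pool.submit(sendJob);
    pending.add(future);
    return future;
  }

  /** Assumed to be called from the task-kill path. */
  public synchronized void onTaskKilled() {
    killed = true;
    for (Future<?> future : pending) {
      // Queued jobs will never run; jobs already running are interrupted.
      future.cancel(true);
    }
    // Drop our own references so the block data can be garbage collected.
    pending.clear();
    Thread t = waiter;
    if (t != null) {
      t.interrupt(); // wake up a thread blocked in awaitSendResult
    }
  }

  /**
   * Stand-in for checkBlockSendResult: waits until all blocks are acknowledged,
   * the timeout expires, or the task is killed. Returns true only when all
   * blocks were acknowledged in time.
   */
  public boolean awaitSendResult(CountDownLatch allAcked, long timeoutMs) {
    waiter = Thread.currentThread();
    try {
      if (killed) {
        return false; // task was killed before we started waiting
      }
      return allAcked.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      return false; // interrupted by onTaskKilled
    } finally {
      waiter = null;
    }
  }
}
```

In this sketch the write path would call submit instead of handing runnables directly to the thread pool, and the kill handling would call onTaskKilled once. Cancelling through Future means a queued job is skipped when a worker reaches it, so no extra check inside the runnable is needed.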
Additional context
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
smallzhongfeng