[#2508] feat(spark3): Record failed tasks on any shuffle write/failure into event logs #2509
proto/src/main/proto/Rss.proto
Outdated
@@ -636,6 +638,8 @@ message ReportShuffleReadMetricRequest {
  int32 stageId = 2;
  int64 taskId = 3;
  map<string, ShuffleReadMetric> metrics = 4;
  bool isShuffleReadFailed = 6;
Could you remove shuffle from the name? And why is the field number 6 instead of 5?
Is this a boolean or enum?
> Could you remove shuffle from the name? And why is the field number 6 instead of 5?
Mistake.
> Is this a boolean or enum?
Boolean.
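On the field-number question above, a minimal sketch of why the number matters, assuming standard protobuf wire encoding: the field number is part of the key written before each encoded value, so it cannot be changed safely once a release ships. The class and method names below are illustrative only and are not part of this PR.

```java
public class FieldTagSketch {
    // Wire type 0 (varint) is the protobuf wire type used for bool, int32 and int64.
    static final int WIRE_TYPE_VARINT = 0;

    // A protobuf field key is (fieldNumber << 3) | wireType.
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    public static void main(String[] args) {
        // The same bool field gets a different key depending on its number.
        System.out.printf("field 5 -> 0x%02X%n", tag(5, WIRE_TYPE_VARINT)); // field 5 -> 0x28
        System.out.printf("field 6 -> 0x%02X%n", tag(6, WIRE_TYPE_VARINT)); // field 6 -> 0x30
    }
}
```

Skipping 5 would still have worked on the wire, but it leaves an unexplained gap in the message, which is why fixing it before merge is the cheaper option.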
proto/src/main/proto/Rss.proto
Outdated
@@ -636,6 +638,8 @@ message ReportShuffleReadMetricRequest {
  int32 stageId = 2;
  int64 taskId = 3;
  map<string, ShuffleReadMetric> metrics = 4;
  bool isShuffleReadFailed = 5;
Could you remove shuffle from the variable name? The message name already contains it.
How about changing this to isTaskReadFailed?
isReadFailed may be enough. isTaskReadFailed is OK if this failure will cause the task to fail.
> isTaskReadFailed is OK if this failure will cause the task to fail.
It will.
Updated. @jerqi
What changes were proposed in this pull request?
Record failed tasks on any shuffle write/failure into event logs
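As a rough illustration of the idea, and not the PR's actual implementation, the sketch below wraps a read in a helper that always emits a report and sets the failure flag when the read throws. `ReadMetricReport`, `REPORTS`, and `readAndReport` are hypothetical stand-ins; only the field names `stageId`, `taskId`, and `isTaskReadFailed` come from the review thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class ReadFailureReportSketch {
    // Hypothetical stand-in for the report message; not the generated protobuf class.
    static final class ReadMetricReport {
        final int stageId;
        final long taskId;
        final boolean isTaskReadFailed;

        ReadMetricReport(int stageId, long taskId, boolean isTaskReadFailed) {
            this.stageId = stageId;
            this.taskId = taskId;
            this.isTaskReadFailed = isTaskReadFailed;
        }
    }

    // Hypothetical sink; a real client would send the report on for event logging.
    static final List<ReadMetricReport> REPORTS = new ArrayList<>();

    // Runs the read and records a report whether it succeeds or throws.
    static <T> T readAndReport(int stageId, long taskId, Supplier<T> read) {
        boolean failed = true;
        try {
            T result = read.get();
            failed = false;
            return result;
        } finally {
            REPORTS.add(new ReadMetricReport(stageId, taskId, failed));
        }
    }

    public static void main(String[] args) {
        readAndReport(1, 7L, () -> "ok");
        try {
            readAndReport(1, 8L, () -> { throw new RuntimeException("fetch failed"); });
        } catch (RuntimeException e) {
            // The exception still propagates, so the task fails as before.
        }
        for (ReadMetricReport r : REPORTS) {
            System.out.println(r.taskId + " failed=" + r.isTaskReadFailed);
        }
    }
}
```

Reporting from a `finally` block keeps the original exception intact, so the flag only adds information and does not change how the task fails.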
Why are the changes needed?
For #2508. With this PR, we can retrieve the failure reasons of Spark jobs from the event logs of the whole cluster.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Not needed.