feat(spark): Populate time variables for log links (#6328) #6411
+47 −14
This pull request modifies the getEventInfoForSpark function within flyteplugins/go/tasks/plugins/k8s/spark/spark.go.

- Extract timestamps: Retrieves SubmissionTime, CompletionTime, and TerminationTime from the sparkOp.SparkApplicationStatus (sj.Status).
- Format timestamps: The retrieved metav1.Time values are formatted into RFC3339 strings and Unix timestamps (int64). SubmissionTime is used for the start time, and CompletionTime (or TerminationTime as a fallback) is used for the end time. Checks handle cases where these timestamps are zero (e.g., the job hasn't started or finished yet); a sketch of this logic follows the list.
- Populate tasklog.Input: The formatted timestamps populate the PodRFC3339StartTime, PodRFC3339FinishTime, PodUnixStartTime, and PodUnixFinishTime fields of the tasklog.Input struct.
- Targeted application: This population is done specifically for the calls to p.GetTaskLogs that fetch logs associated with the Spark driver pod (sj.Status.DriverInfo.PodName), namely the "Driver Logs" (using the "Mixed" log config) and the "User Logs" (using the "User" log config). Calls for "System" and "AllUser" logs remain unchanged, since they use the application name rather than a specific pod name.
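The sketch below is a minimal, self-contained illustration of the timestamp handling described above, not the actual diff: the logTimes struct is a hypothetical stand-in for the time fields of tasklog.Input, and the status field names are taken from this description rather than verified against the Spark operator API.

```go
// Minimal sketch of the start/finish time handling described in this PR.
// logTimes is a hypothetical stand-in for the time fields of tasklog.Input.
package main

import (
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// logTimes mirrors the four time fields populated on tasklog.Input.
type logTimes struct {
	PodRFC3339StartTime  string
	PodRFC3339FinishTime string
	PodUnixStartTime     int64
	PodUnixFinishTime    int64
}

// buildLogTimes uses the submission time as the start time and the completion
// time (falling back to the termination time) as the finish time. Zero
// timestamps are skipped so unstarted or unfinished jobs produce empty values.
func buildLogTimes(submission, completion, termination metav1.Time) logTimes {
	var lt logTimes

	if !submission.IsZero() {
		lt.PodRFC3339StartTime = submission.Format(time.RFC3339)
		lt.PodUnixStartTime = submission.Unix()
	}

	end := completion
	if end.IsZero() {
		end = termination // fall back when no completion time is recorded
	}
	if !end.IsZero() {
		lt.PodRFC3339FinishTime = end.Format(time.RFC3339)
		lt.PodUnixFinishTime = end.Unix()
	}
	return lt
}

func main() {
	start := metav1.NewTime(time.Date(2025, 4, 1, 12, 0, 0, 0, time.UTC))
	term := metav1.NewTime(time.Date(2025, 4, 1, 12, 30, 0, 0, time.UTC))
	// No completion time recorded, so the termination time is used as the finish time.
	fmt.Printf("%+v\n", buildLogTimes(start, metav1.Time{}, term))
}
```

In the real change, values computed this way feed the tasklog.Input passed to p.GetTaskLogs for the driver and user log calls; keeping the zero-checks means log links for still-running jobs are built without a finish time.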
Summary by Bito
This PR enhances Spark logging by extracting time metrics from SparkApplicationStatus and computing start/finish timestamps from the submission and completion times. The timestamps are formatted in RFC3339 and Unix formats and added to the tasklog.Input structure for both driver and user logs, enabling more accurate log tracking and troubleshooting.
Unit tests added: False
Estimated effort to review (1-5, lower is better): 1