-
Notifications
You must be signed in to change notification settings - Fork 1.4k
feat(scheduledsparkapplication): configurable timestampPrecision (nan… #2742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Hi @rahul810050 , you do not have to create a new PR for every update. You can run git push command by adding an extra |
sure @ChenYi015 i will keep it in mind...thanks! |
aa58c67 to
17c21a9
Compare
ChenYi015
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rahul810050 Thanks for your contribution! I have left some comments.
| // EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN! | ||
| // NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized. | ||
|
|
||
| // ScheduledSparkApplicationSpec defines the desired state of ScheduledSparkApplication. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicated.
| // TimestampPrecision controls the precision of the timestamp appended to generated | ||
| // SparkApplication names for scheduled runs. | ||
| // | ||
| // Allowed values: "nanos", "micros", "millis", "seconds", "minutes" | ||
| // +kubebuilder:validation:Enum=nanos;micros;millis;seconds;minutes | ||
| // +optional | ||
| // +kubebuilder:default:=nanos | ||
| // Defaults to "nanos" to preserve current behavior. | ||
| TimestampPrecision string `json:"timestampPrecision,omitempty"` | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This field should be removed as we have dicussed.
| "os" | ||
| "strconv" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to group the imports with the specified order, i.e. stdlib, third-party and self imports.
| @@ -0,0 +1,102 @@ | |||
| #!/usr/bin/env bash | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file needed? We can run unit tests and e2e tests directly without setting up envtest binaries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I am on Arch Linux so I needed this for my machine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any error when running make unit-test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ChenYi015 thanks for asking!
Yes — when I run make unit-test on my Arch Linux setup, it fails because envtest cannot find the Kubernetes API server binaries (kube-apiserver / etcd). Arch Linux does not ship compatible versions by default, and setup-envtest also fails due to version mismatch.
That’s why I added the helper script — it downloads the exact binaries that controller-runtime expects and places them in the correct directory, allowing the unit tests and e2e tests to run successfully on my machine.
If the project prefers not to include this script, I’m totally fine removing it and relying only on the documented workflow. I mainly added it to help cross-platform contributors like myself.
Please let me know what you prefer and I’ll update the PR accordingly!
| # -- default precision for ScheduledSparkApplication timestamp suffix | ||
| scheduledSparkApplication: | ||
| timestampPrecision: "nanos" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # -- default precision for ScheduledSparkApplication timestamp suffix | |
| scheduledSparkApplication: | |
| timestampPrecision: "nanos" | |
| # ScheduledSparkApplication controller configurations. | |
| scheduledSparkApplication: | |
| # -- Default timestamp precision for the scheduled SparkApplication name suffix. | |
| # Can be one of `nanos`, `micros`, `millis`, `seconds` and `minutes`. | |
| timestampPrecision: nanos |
Would be better to move this value after controller.batchScheduler as it is a less common used configuration. Then run make helm-docs to update the Helm chart README.
|
|
||
| // Determine timestamp precision with precedence: | ||
| // 1) controller-wide env var SCHEDULED_SA_TIMESTAMP_PRECISION (if set and non-empty) | ||
| // 2) per-app scheduledApp.Spec.TimestampPrecision (if set) | ||
| // 3) default "nanos" for backward compatibility | ||
| precision := "nanos" // fallback default | ||
|
|
||
| if envPrecision, ok := os.LookupEnv("SCHEDULED_SA_TIMESTAMP_PRECISION"); ok && strings.TrimSpace(envPrecision) != "" { | ||
| precision = strings.TrimSpace(envPrecision) | ||
| } else if strings.TrimSpace(scheduledApp.Spec.TimestampPrecision) != "" { | ||
| precision = strings.TrimSpace(scheduledApp.Spec.TimestampPrecision) | ||
| } | ||
|
|
||
| suffix := formatTimestamp(precision, t) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add a new controller flag to be consistent with the current implementation, like --timestamp-precision, instead of adding a new env.
See: https://github.com/kubeflow/spark-operator/blob/f53373e7e94ec6eb69d2f6829e61e9c9497e71d6/cmd/operator/controller/start.go
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
9f27777 to
8e68ded
Compare
|
Some changes still need to be made:
|
|
|
||
| // EDIT THIS FILE! THIS IS SCAFFOLDING FOR YOU TO OWN! | ||
| // NOTE: json tags are required. Any new fields you add must have json tags for the fields to be serialized. | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes to this file should be removed.
| {{- with .Values.controller.env }} | ||
|
|
||
| # --- Controller environment variables (preserve user configured controller.env if any). | ||
| {{- if .Values.controller.env }} | ||
| env: | ||
| {{- toYaml . | nindent 8 }} | ||
| {{ toYaml .Values.controller.env | nindent 8 }} | ||
| {{- end }} | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us revert it back to avoid unnecessary changes.
| @@ -0,0 +1,26 @@ | |||
| package scheduledsparkapplication | |||
|
|
|||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the formatTimestampLengths util function is defined in controller.go, it is suggested to put the related unit tests in file controller_test.go .
…os|micros|millis|seconds|minutes) Add optional spec.timestampPrecision to configure the precision of the timestamp suffix appended to generated SparkApplication names for scheduled runs. Default remains 'nanos' for backward compatibility. Adds 'minutes' option to match CronJob granularity and keep generated names short. Includes helper function, unit tests and optional chart value. Fixes: kubeflow#2602 Signed-off-by: rahul810050 <[email protected]>
752715e to
7e76e64
Compare
feat(scheduledsparkapplication): add configurable timestampPrecision for run name generation
This PR introduces a new field
.spec.timestampPrecisionthat allows users to control the precision of the timestamp suffix added toSparkApplicationnames generated byScheduledSparkApplication. Supported values are:nanos|micros|millis|seconds|minutesThe default remains
nanosfor full backward compatibility.Summary
Previously, scheduled runs always used
time.UnixNano()to generate the timestamp suffix, producing a 19-digit value. When combined with long application names, this frequently caused the run name to exceed Kubernetes’ 63-character limit.This PR makes the timestamp precision configurable so users can choose a shorter suffix if needed. A new
minutesoption is also added to match Kubernetes CronJob controller behavior, which only schedules in minute granularity.Key Changes
API / CRD
.spec.timestampPrecisiontoScheduledSparkApplicationSpecnanos,micros,millis,seconds,minutes"nanos"(preserves current behavior)make generate+make manifestsController
formatTimestamp()helper to format timestamps according to the selected precisionminutesmode computesUnix()/60to stay consistent with CronJob naming semanticsTests
format_timestamp_test.goto validate timestamp length for all supported precisionsenvtestsetup helper script for contributorsHelm Chart
controller.scheduledSparkApplication.timestampPrecision"nanos"Why This Is Needed (Fixes #2602)
Users with long application names often hit Kubernetes’ 63-character name limit because the operator always appends a 19-digit nanosecond timestamp.
Allowing precision selection reduces the suffix size:
This gives users flexibility while keeping backward compatibility.
How I Tested
Checklist: