[Serve][2/N] Add deployment-level autoscaling snapshot and event summarizer #56225
Conversation
Code Review
This pull request introduces valuable observability features for autoscaling in Ray Serve by adding structured JSON logs for autoscaling snapshots. The implementation is solid, with a new `ServeEventSummarizer` to handle log formatting and throttling, and new methods in `AutoscalingState` to provide the necessary data.

My review includes a few suggestions for improvement:

- A high-severity issue where a hardcoded policy name is used in `ScalingDecision` objects; this should be corrected to use the dynamically determined policy name.
- A medium-severity issue in the logging utility where missing timestamps are replaced with the current time, which could be misleading.
- A medium-severity suggestion to refactor duplicated logic for accessing configuration values to improve code maintainability.
My main feedback about this PR is that we are creating many intermediate free-form dictionaries, and it is not clear to me why we need them all. More importantly, they create ambiguity about what each dictionary is supposed to contain, which makes the code harder to maintain. The code can be reorganized to use typed objects in functions that need to return large dictionaries.
…except and unused func(note_once_per_interval) Signed-off-by: Dongjun Na <[email protected]>
…er, and add constant Signed-off-by: Dongjun Na <[email protected]>
…remove unnecessary getattr Signed-off-by: Dongjun Na <[email protected]>
Thanks for the contribution @nadongjun! Have you thought about how this would change/work with application-level autoscaling, which is in flight (#56149)? When application-level autoscaling is enabled, a deployment does not autoscale by itself, so that may change how users should interpret the logs.

As feedback for the PR, I would recommend packaging the various autoscaling-relevant values into objects and passing those objects around. It's somewhat difficult to track all the different variables and where they come from, which makes the code a bit harder to parse.
- Rename get_observability_snapshot → get_snapshot for clarity
- Rename proposed_replicas → target_replicas across snapshot flow
- Return last_metrics_age_s=None when no metrics; map to "unknown" in summarizer
- Flatten replicas_allowed{min,max} into top-level min, max in snapshot payload
- Move look_back_period_s to top-level for consistency
- Rename DecisionSummary → AutoscalingDecisionSummary for clarity
- Replace tuple-based SnapshotSignature with typed dataclass
- Use DeploymentID directly as dedupe key instead of (app_name, dep_name)
- Inline snapshot computation in controller; remove _compute_snapshot_inputs
- Push scaling_status formatting into log_deployment_snapshot
- Update tests to validate new payload shape (min/max, no replicas_allowed)

Signed-off-by: Dongjun Na <[email protected]>
- Standardize payload to return 'timestamp_s' for snapshots.
- Return metrics health as last_metrics_age_s

Signed-off-by: Dongjun Na <[email protected]>
@abrarsheikh @akyang-anyscale Thanks for the detailed review! @akyang-anyscale That's a fair point. The `serve_autoscaling_snapshot` log format currently only covers deployment-level autoscaling. Once application-level autoscaling is added, we'll log deployment-level and application-level snapshots separately. I've already switched to typed dataclasses (e.g., `DeploymentSnapshot`, `AutoscalingDecisionSummary`) so the controller passes structured objects instead of dicts. I'll do the same for application-level autoscaling to keep things consistent.
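For illustration, the typed objects might look roughly like this (a sketch only; the dataclass names come from this thread, but the exact field sets are assumptions):

```python
# Sketch only: DeploymentSnapshot / AutoscalingDecisionSummary are named in
# this thread, but these field sets are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AutoscalingDecisionSummary:
    policy_name: str
    current_num_replicas: int
    target_num_replicas: int
    reason: str


@dataclass(frozen=True)
class DeploymentSnapshot:
    app: str
    deployment: str
    timestamp_s: float
    min: int  # flattened from replicas_allowed
    max: int
    look_back_period_s: float
    queued_requests: float
    ongoing_requests: float
    last_metrics_age_s: Optional[float]  # None when no metrics collected yet
    decision: AutoscalingDecisionSummary
```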
```python
        return total_requests

    def get_deployment_snapshot(
        self, curr_target_num_replicas: int
    ) -> Dict[str, Any]:
```
`get_deployment_snapshot` is an expensive operation to perform on every control-loop iteration. First, it calls `get_total_num_requests`, which loops over replicas and handles; these are expensive operations for a large cluster. Second, it calls `self.get_decision_num_replicas`, which internally executes the autoscaling policy, which can also be expensive.

I suggest instead constructing the `DeploymentAutoscalingSnapshot` object every time `get_decision_num_replicas` runs and storing it on the class object. Then `get_deployment_snapshot` simply returns the cached `DeploymentAutoscalingSnapshot` object.
Good call, I’ve applied this. Now the snapshot is constructed once during get_decision_num_replicas() and cached, and get_deployment_snapshot() just returns the cached object.
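In outline, the cached flow looks something like this (a sketch; `_run_policy` and `_build_snapshot` are hypothetical placeholders for the expensive work described above):

```python
# Sketch of the cache-on-decision flow discussed in this thread.
from typing import Optional


class AutoscalingState:
    def __init__(self) -> None:
        self._cached_snapshot: Optional["DeploymentAutoscalingSnapshot"] = None

    def get_decision_num_replicas(self, curr_target_num_replicas: int) -> int:
        # The expensive part: runs the autoscaling policy and touches
        # per-replica/handle metrics.
        decision = self._run_policy(curr_target_num_replicas)
        # Build the snapshot here, while all inputs are already in hand,
        # instead of recomputing them on every control-loop iteration.
        self._cached_snapshot = self._build_snapshot(decision)
        return decision

    def get_deployment_snapshot(self) -> Optional["DeploymentAutoscalingSnapshot"]:
        # Cheap: just return whatever the last decision produced.
        return self._cached_snapshot
```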
```python
    and self.app == other.app
    and self.deployment == other.deployment
)
```
Bug: Timestamp Comparison Fails Snapshot Deduplication

The `is_scaling_equivalent` method, meant for autoscaling snapshot log deduplication, compares `timestamp_str`, `app`, and `deployment`. Since `timestamp_str` always differs between snapshots, this logic prevents effective deduplication, causing the controller to log every snapshot even when the scaling state is unchanged. Comparing actual scaling-related fields would be more effective.
```python
    self.timestamp_str == other.timestamp_str
    and self.app == other.app
    and self.deployment == other.deployment
)
```
Bug: Autoscaling Snapshot Deduplication Fails

The `is_scaling_equivalent` method's logic for autoscaling snapshot deduplication is flawed. It compares `timestamp_str`, which is unique for each snapshot, making deduplication ineffective. This results in logging every snapshot even when the actual scaling state remains unchanged.
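A minimal sketch of the fix both comments point at, comparing scaling-relevant fields and ignoring the timestamp (field names beyond `app`/`deployment` are assumptions):

```python
# Sketch: dedupe on what actually describes the scaling state, not on the
# per-snapshot timestamp_str.
def is_scaling_equivalent(self, other: "SnapshotSignature") -> bool:
    return (
        self.app == other.app
        and self.deployment == other.deployment
        and self.current_num_replicas == other.current_num_replicas
        and self.target_num_replicas == other.target_num_replicas
        and self.scaling_status == other.scaling_status
    )
```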
```python
    then validates the JSON payload shape and a few key fields. This test
    validates only the earliest snapshot.
    """

    DEPLOY_NAME = f"snap_app_{int(time.time())}"
```
It's desirable to test with actual autoscaling behavior so that we assert some real values. The zero-traffic case is not interesting.
I removed the fixed sleeps and tuned the autoscaling config so it scales quickly. The test now sends load to trigger 1 -> 2 replicas, making it deterministic and less flaky.
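For reference, a deployment tuned along those lines might look like the following (a sketch, not the test's actual code; all values are illustrative):

```python
# Illustrative tuning for a fast, deterministic 1 -> 2 upscale under load;
# these are not the test's actual settings.
from ray import serve


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 2,
        "target_ongoing_requests": 1,  # upscale as soon as requests pile up
        "upscale_delay_s": 0,          # react immediately rather than waiting
        "downscale_delay_s": 600,      # avoid flapping back down mid-test
        "metrics_interval_s": 0.1,
        "look_back_period_s": 1,
    },
)
class SnapApp:
    async def __call__(self) -> str:
        return "ok"
```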
```python
@staticmethod
def format_metrics_health_text(
    *,
    time_since_last_collected_metrics_s: Optional[float],
    look_back_period_s: Optional[float],
) -> str:
```
@nadongjun Why are we passing the `look_back_period_s` parameter here? Is there some use case for this parameter in this function?
```python
self._autoscaling_logger.info(
    "", extra={"type": "deployment", "snapshot": payload}
)
```
Why do we have extras here? What is `type`, and why do we need it? Also, why not just do `self._autoscaling_logger.info(payload)`?
The `type` field is included to help CLI tools easily distinguish between snapshot types such as `application`, `deployment`, and `external` in the `autoscaling_snapshot_*.log` files.

The `extra` parameter is used to ensure the logs are emitted in structured JSON format; using `self._autoscaling_logger.info(payload)` alone would output a plain string instead of structured data.
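Roughly speaking, keys passed via `extra` become attributes on the `LogRecord`, which a JSON formatter can then serialize. The formatter below is illustrative only, not Serve's actual one:

```python
# Illustrative formatter: shows how `extra` keys land on the LogRecord and
# get serialized, whereas logger.info(payload) would just stringify the
# dict into the message field.
import json
import logging


class JsonSnapshotFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps(
            {
                "type": getattr(record, "type", None),
                "snapshot": getattr(record, "snapshot", None),
            }
        )


logger = logging.getLogger("serve_autoscaling_snapshot")
handler = logging.StreamHandler()
handler.setFormatter(JsonSnapshotFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits: {"type": "deployment", "snapshot": {"target_replicas": 2}}
logger.info("", extra={"type": "deployment", "snapshot": {"target_replicas": 2}})
```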
```python
DecisionRecord(
    timestamp_str=time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    current_num_replicas=ctx.current_num_replicas,
    target_num_replicas=decision_num_replicas,
    reason=f"current={ctx.current_num_replicas}, target={decision_num_replicas}",
```
The reason for the decision here is currently written as just the `current_num_replicas` and `target_num_replicas`. Shouldn't this reason be policy-specific, explaining why this decision was made?
Yes, that’s right. The current version is a simple implementation that only logs the current and target replica counts to indicate when autoscaling is triggered. We plan to extend it later to include policy-specific reasons for external and custom policies.
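As a purely hypothetical illustration, a policy-specific reason could be phrased in terms of the signal the policy acted on:

```python
# Hypothetical only: the PR currently records just current/target counts.
def build_reason(ongoing_per_replica: float, target_per_replica: float) -> str:
    direction = (
        "upscale" if ongoing_per_replica > target_per_replica else "downscale"
    )
    return (
        f"{direction}: ongoing requests per replica ({ongoing_per_replica:.2f}) "
        f"vs target ({target_per_replica:.2f})"
    )
```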
```python
policy_name=ctx.config.policy.name,
look_back_period_s=look_back_period_s,
queued_requests=float(queued_requests),
ongoing_requests=float(ctx.total_num_requests),
```
Are the displayed metrics policy-agnostic?
Yes, the displayed metrics are policy-agnostic.

While fields like `policy_name` or `look_back_period_s` can differ depending on the active policy, they're only contextual information. Metrics such as `queued_requests` and `ongoing_requests` are runtime values gathered from handles and replicas, independent of any specific policy logic.
Why are these changes needed?
This PR introduces deployment-level autoscaling observability in Serve. The controller now emits a single, structured JSON log line (serve_autoscaling_snapshot) per autoscaling-enabled deployment each control-loop tick.
This avoids recomputation in the controller call sites and provides a stable, machine-parsable surface for tooling and debugging.
Changed
Example log (single line):
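The exact emitted line isn't reproduced here; below is a hypothetical payload assembled from fields named in this review (all values invented):

```json
{"type": "deployment", "snapshot": {"app": "snap_app", "deployment": "SnapApp", "timestamp_s": 1756941121.0, "policy_name": "default", "look_back_period_s": 30.0, "min": 1, "max": 2, "target_replicas": 2, "queued_requests": 0.0, "ongoing_requests": 2.0, "last_metrics_age_s": 0.5, "scaling_status": "UPSCALING"}}
```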
Logs can be found in the controller log files, e.g. `/tmp/ray/session_2025-09-03_21-12-01_095657_13385/logs/serve/controller_13474.log`.

Follow-ups

- `serve status -v` and CLI/SDK surfaces.

Related issue number
#55834
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.