Honour HPA Scale-Down Policy When Scaling Down to Zero Replicas #7204

@subbu2k

Proposal

Currently, if the scale trigger remains inactive for the whole cooldownPeriod, KEDA forces the deployment to scale down to 0 in a single step, regardless of the current number of replicas.

The proposal is to honour the scale-down policy defined in the Horizontal Pod Autoscaler (HPA) before enforcing the scale to zero. If the user configures the following in the ScaledObject:

behavior:
  scaleDown:
    policies:
    - type: Pods
      value: 2
      periodSeconds: 60

then KEDA should first scale the replicas down gradually from N to 1, removing at most 2 pods every 60 seconds, and only then scale from 1 to 0.
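To make the requested behaviour concrete, here is a minimal sketch of the step-down schedule under a `Pods` policy. It is a simplified model, not KEDA or HPA code: the function name `scale_down_steps` is hypothetical, it ignores the HPA stabilization window, and it assumes the final 1 to 0 step also waits one full period.

```python
def scale_down_steps(current, pods_per_period=2, period_seconds=60):
    """Return (elapsed_seconds, replicas) pairs for the proposed scale-down.

    Simplified model: each period removes at most `pods_per_period` pods,
    never going below 1, and the last step takes the workload from 1 to 0.
    """
    if current < 0 or pods_per_period < 1 or period_seconds < 1:
        raise ValueError("invalid replica count or policy values")

    elapsed = 0
    replicas = current
    schedule = [(elapsed, replicas)]

    # Phase 1: honour the policy until a single replica is left.
    while replicas > 1:
        elapsed += period_seconds
        replicas = max(1, replicas - pods_per_period)
        schedule.append((elapsed, replicas))

    # Phase 2: only now scale from 1 to 0 (assumed to take one more period).
    if replicas == 1:
        elapsed += period_seconds
        schedule.append((elapsed, 0))

    return schedule


if __name__ == "__main__":
    for seconds, replicas in scale_down_steps(7):
        print(f"t+{seconds:>3}s -> {replicas} replicas")
```

With 7 replicas and the policy above, this model reaches 0 after four periods instead of dropping to 0 at once.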

Use-Case

Once the trigger has been inactive for the cooldownPeriod, KEDA currently scales the deployment down to zero immediately. If a short burst of traffic arrives right after the cooldownPeriod, the deployment has to scale up from 0 to 1 again, which increases request latency.

If KEDA honours the HPA scale-down policy, replicas scale down gradually in the defined steps, so pods are still available to serve requests during a short traffic burst, which reduces latency and improves response times.
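The difference for a late traffic burst can be illustrated with a small comparison. This is a sketch under the same simplifying assumptions as the proposal (a `Pods` policy of 2 per 60 seconds, no stabilization window); `replicas_available` and its parameters are hypothetical names, not part of KEDA.

```python
import math


def replicas_available(seconds_after_cooldown, start_replicas,
                       pods_per_period=2, period_seconds=60,
                       honour_policy=True):
    """Replicas still running when a burst arrives after the cooldownPeriod.

    honour_policy=False models the current behaviour (straight to 0);
    honour_policy=True models the proposed gradual scale-down.
    """
    if not honour_policy or start_replicas <= 0:
        return 0

    periods = seconds_after_cooldown // period_seconds
    # Number of periods needed to step down from start_replicas to 1.
    periods_to_one = math.ceil((start_replicas - 1) / pods_per_period)

    if periods <= periods_to_one:
        return max(1, start_replicas - periods * pods_per_period)
    # After the 1 -> 0 step nothing is left; a burst hits a cold start.
    return 0


if __name__ == "__main__":
    for t in (30, 90, 200, 300):
        now = replicas_available(t, 7, honour_policy=False)
        proposed = replicas_available(t, 7, honour_policy=True)
        print(f"burst at t+{t}s: current={now}, proposed={proposed}")
```

In this model a burst arriving within the first few minutes after the cooldownPeriod still finds warm pods under the proposed behaviour, while the current behaviour always requires a 0 to 1 scale-up.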

Is this a feature you are interested in implementing yourself?

No

Anything else?

No response
