
added continuous retries to make sure deployment is scaled up #20


Merged
4 commits merged into main on Oct 3, 2024

Conversation

@shubhamrai1993 (Contributor) commented on Oct 2, 2024

With a KEDA-based autoscaling setup, two loops run continuously -

  • Prometheus recording, every ~15s -
    • pending pod status
    • traffic
  • KEDA scaling, based on Prometheus having the pending pod or traffic data - every 30s

Strategies

  • Synchronise KEDA scaling down and elasti scaling up
    • How?
      • KEDA must know that a service is in the middle of a scale-up by elasti, so KEDA needs to listen to a metric emitted by elasti. This means elasti would have to implement a scaler through a webhook that KEDA can listen to (a rough sketch of such an endpoint follows this list)
    • Pros
      • We prevent the scale down from happening at all
    • Cons
      • Elasti becomes KEDA-aware, which is better avoided
  • Add retries to elasti to keep scaling up until the request is fulfilled
    • How?
      • Add a retry loop in elasti that keeps scaling the deployment up every 5 seconds (see the retry-loop sketch after this list)
    • Pros
      • Elasti and the autoscaler remain decoupled
    • Cons
      • The initial scale down by KEDA still happens, although elasti then scales the deployment back up
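
For the first strategy, the webhook approach would roughly mean elasti exposing an HTTP endpoint that a KEDA metrics-api trigger (or an external scaler) could poll. A minimal sketch, assuming a hypothetical scaling-status handler and a scaleUpInProgress helper; none of these names exist in elasti today:

```go
package sketch

import (
	"encoding/json"
	"net/http"
)

// scaleUpInProgress is a hypothetical stand-in for elasti's internal state
// about whether it is currently scaling the given service up.
func scaleUpInProgress(service string) bool { return false }

// scalingStatusHandler reports a numeric value KEDA could poll; a trigger
// threshold on this value could keep KEDA from scaling the workload down
// while elasti is still in the middle of a scale-up.
func scalingStatusHandler(w http.ResponseWriter, r *http.Request) {
	value := 0
	if scaleUpInProgress(r.URL.Query().Get("service")) {
		value = 1
	}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(map[string]int{"scaleUpInProgress": value})
}
```

For the second strategy (the one this PR takes), the retry loop amounts to re-applying the desired replica count on a fixed interval until the target is ready. A minimal sketch using client-go's scale subresource; the function name, the readiness check, and the interval parameter are illustrative assumptions, not the exact elasti implementation:

```go
package sketch

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// scaleUpWithRetries keeps bumping the deployment back to one replica every
// interval, so an interleaved KEDA scale-down cannot leave the target at zero
// while a request is still waiting. It stops once a replica is ready or the
// context is cancelled.
func scaleUpWithRetries(ctx context.Context, cs kubernetes.Interface, ns, name string, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		scale, err := cs.AppsV1().Deployments(ns).GetScale(ctx, name, metav1.GetOptions{})
		if err != nil {
			log.Printf("get scale failed, will retry: %v", err)
		} else if scale.Spec.Replicas < 1 {
			scale.Spec.Replicas = 1
			if _, err := cs.AppsV1().Deployments(ns).UpdateScale(ctx, name, scale, metav1.UpdateOptions{}); err != nil {
				log.Printf("scale up failed, will retry: %v", err)
			}
		}

		// Stop retrying once the deployment reports a ready replica.
		if dep, err := cs.AppsV1().Deployments(ns).Get(ctx, name, metav1.GetOptions{}); err == nil && dep.Status.ReadyReplicas > 0 {
			return nil
		}

		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}
```

Called with an interval of 5*time.Second, this mirrors the "retry every 5 seconds" behaviour described above.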

Remaining problem -

  • Even after all this there are failure cases -
    • If Prometheus delays the recording of metrics for some reason, KEDA can still consider a service for scale down after the first request has been fulfilled by elasti, at which point no further retries happen

@@ -123,15 +123,18 @@ func (h *Handler) handleAnyRequest(w http.ResponseWriter, req *http.Request) (*m
// Send request to throttler


The above line looks incorrect, as there is no error handling at all.
Do we even need `go` here?
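
A hedged sketch, outside the elasti codebase, of what the review is getting at: firing the throttler call in a bare goroutine drops its error, whereas calling it synchronously lets the handler report failures to the client. sendToThrottler is a hypothetical stand-in for the actual call, not elasti's API:

```go
package sketch

import (
	"errors"
	"net/http"
)

// sendToThrottler stands in for the real "send request to throttler" logic.
func sendToThrottler(r *http.Request) error {
	return errors.New("throttler unavailable")
}

func handleAnyRequest(w http.ResponseWriter, r *http.Request) {
	// Fire-and-forget version (what the review flags): the error is lost.
	//   go sendToThrottler(r)

	// Synchronous version: the error can be handled and surfaced to the client.
	if err := sendToThrottler(r); err != nil {
		http.Error(w, "failed to forward request: "+err.Error(), http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusAccepted)
}
```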

@shubhamrai1993 merged commit 217acd4 into main on Oct 3, 2024
4 checks passed