Support autoscaling #3
Comments
/milestone v0.0.1 |
/kind feature |
/milestone clear |
/priority important-longterm |
/milestone v0.2.0 |
/assign
If the service controller needs to be integrated with HPA, I am willing to give it a try. Is it related to service.Spec.WorkloadTemplate.Replicas? |
type ElasticConfig struct {
	// MinReplicas indicates the minimum number of inference workloads based on the traffic.
	// Defaults to nil, which means the instances can be scaled down to 1.
	// If minReplicas is set to 0, the serverless component must be installed first.
	// +kubebuilder:default=1
	// +optional
	MinReplicas *int32 `json:"minReplicas,omitempty"`
	// MaxReplicas indicates the maximum number of inference workloads based on the traffic.
	// Defaults to nil, which means there is no limit on the number of instances.
	// +optional
	MaxReplicas *int32 `json:"maxReplicas,omitempty"`
	// Metrics contains the specifications which are used to calculate the
	// desired replica count (the maximum replica count across all metrics will
	// be used). The desired replica count is calculated with multiplying the
	// ratio between the target value and the current value by the current
	// number of pods. Ergo, metrics used must decrease as the pod count is
	// increased, and vice-versa. See the individual metric source types for
	// more information about how each type of metric must respond.
	// If not set, the HPA will not be created.
	// +optional
	Metrics []autoscalingv2.MetricSpec `json:"metrics,omitempty"`
}
@kerthcet |
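To make the replica calculation described in the `Metrics` comment concrete, here is a minimal sketch in plain Go. It is not the HPA controller's code: `metricSample` and `desiredReplicas` are hypothetical names, the HPA tolerance and stabilization behavior are left out, and the nil handling for the bounds follows the comments in the proposed `ElasticConfig` (nil minimum means 1, nil maximum means no upper limit).

```go
package main

import (
	"fmt"
	"math"
)

// metricSample is a hypothetical, simplified stand-in for one evaluated
// metric: the value currently observed and the target value.
type metricSample struct {
	Current float64
	Target  float64
}

// desiredReplicas applies the HPA-style rule ceil(current * metric/target)
// to every metric, keeps the largest proposal, and clamps the result.
// A nil minReplicas is treated as 1 and a nil maxReplicas as "no limit",
// which is an assumption taken from the ElasticConfig comments.
func desiredReplicas(current int32, samples []metricSample, minReplicas, maxReplicas *int32) int32 {
	lower := int32(1)
	if minReplicas != nil {
		lower = *minReplicas
	}

	// With no usable metrics, keep the current replica count.
	desired := current
	found := false
	for _, s := range samples {
		if s.Target <= 0 {
			continue // skip metrics with an invalid target
		}
		proposal := int32(math.Ceil(float64(current) * s.Current / s.Target))
		if !found || proposal > desired {
			desired = proposal
			found = true
		}
	}

	if desired < lower {
		desired = lower
	}
	if maxReplicas != nil && desired > *maxReplicas {
		desired = *maxReplicas
	}
	return desired
}

func main() {
	limit := int32(10)
	samples := []metricSample{
		{Current: 90, Target: 60}, // proposes ceil(4 * 90/60) = 6
		{Current: 30, Target: 60}, // proposes ceil(4 * 30/60) = 2
	}
	fmt.Println(desiredReplicas(4, samples, nil, &limit)) // prints 6
}
```

Taking the maximum across metrics means the most loaded metric decides the scale, which matches the "maximum replica count across all metrics" wording in the comment.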
I will revisit this later, but as I imagine it, I don't want to simply copy the fields from HPA into ElasticConfig. I hope it can work with various systems, such as HPA and KEDA, so the fields should be sufficiently abstract. |
Indeed. That is, we only need to abstract the fields, and the controller exposes a provider-like interface (e.g. HPAProvides) internally, with the backend-specific features implemented behind it, right? |
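One way to picture the provider-like interface discussed here is the rough sketch below. Everything in it is an assumption for illustration: `ScalerProvider`, `hpaProvider`, `registry`, and the workload name `llama-service` are hypothetical, `ElasticConfig` is trimmed to its two replica bounds, and the HPA provider only records what it was asked to apply instead of calling the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// ElasticConfig is a trimmed, hypothetical copy of the proposed type,
// kept to the two replica bounds so the sketch has no external imports.
type ElasticConfig struct {
	MinReplicas *int32
	MaxReplicas *int32
}

// ScalerProvider is a hypothetical backend-agnostic interface. An HPA or
// KEDA backend would each implement it behind the same abstract fields.
type ScalerProvider interface {
	Name() string
	// Apply creates or updates the backend's autoscaling objects for a workload.
	Apply(workload string, cfg ElasticConfig) error
}

// hpaProvider is an in-memory stand-in; a real provider would create a
// HorizontalPodAutoscaler through an API client instead of storing the config.
type hpaProvider struct {
	applied map[string]ElasticConfig
}

func newHPAProvider() *hpaProvider {
	return &hpaProvider{applied: map[string]ElasticConfig{}}
}

func (p *hpaProvider) Name() string { return "hpa" }

func (p *hpaProvider) Apply(workload string, cfg ElasticConfig) error {
	if cfg.MinReplicas != nil && cfg.MaxReplicas != nil && *cfg.MinReplicas > *cfg.MaxReplicas {
		return errors.New("minReplicas must not exceed maxReplicas")
	}
	p.applied[workload] = cfg
	return nil
}

// registry maps a backend name to its provider so the controller can
// select one without knowing backend details.
type registry map[string]ScalerProvider

func (r registry) register(p ScalerProvider) { r[p.Name()] = p }

func (r registry) apply(backend, workload string, cfg ElasticConfig) error {
	p, ok := r[backend]
	if !ok {
		return fmt.Errorf("unknown autoscaling backend %q", backend)
	}
	return p.Apply(workload, cfg)
}

func main() {
	reg := registry{}
	reg.register(newHPAProvider())

	one, five := int32(1), int32(5)
	cfg := ElasticConfig{MinReplicas: &one, MaxReplicas: &five}

	fmt.Println(reg.apply("hpa", "llama-service", cfg))  // registered backend
	fmt.Println(reg.apply("keda", "llama-service", cfg)) // backend not registered here
}
```

With this shape, adding another backend would mean registering one more implementation, while the fields users set stay the same.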
Some related metrics: |
As the service.Spec describes, we have minReplicas and maxReplicas. What we hope to do is adjust the number of replicas based on the traffic, a.k.a. serverless. We could use Ray or KEDA/Knative as alternatives, but here we hope to have a simple implementation, with no need to depend on other libraries.
For the first step, let's integrate with HPA for autoscaling capabilities.