kube-startup-cpu-boost becomes ready too fast #95

Closed
camaeel opened this issue Mar 18, 2025 · 1 comment

Comments

camaeel commented Mar 18, 2025

Community Note

  • Please vote on this issue by adding a 👍 reaction
    to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do
    not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Steps to Reproduce

  1. Create a node with limited resources (for example, 1600m of CPU).
  2. Set the mutating webhook with failurePolicy=Fail (to be sure the webhook responds successfully).
  3. Create a StartupCPUBoost:
    apiVersion: autoscaling.x-k8s.io/v1alpha1
    kind: StartupCPUBoost
    metadata:
      name: boost-cpu
      namespace: boost-test
    selector:
      matchLabels:
        app: boost-test
    spec:
      resourcePolicy:
        containerPolicies:
          - containerName: nginx
            fixedResources:
              requests: "900m"
      durationPolicy:
        podCondition:
          type: Ready
          status: "True"
    
  4. Create a Deployment:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: boost-test
      name: boost-test
      namespace: boost-test
    spec:
      replicas: 4
      selector:
        matchLabels:
          app: boost-test
      template:
        metadata:
          labels:
            app: boost-test
        spec:
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                - labelSelector:
                    matchExpressions:
                      - key: app
                        operator: In
                        values:
                          - boost-test
                  topologyKey: kubernetes.io/hostname
          containers:
            - image: nginx
              name: nginx
              resources:
                requests:
                  cpu: 100m
                  memory: 20Mi
                limits:
                  memory: 100Mi
              livenessProbe:
                httpGet:
                  port: 80
                  path: /index.html
                initialDelaySeconds: 20
                periodSeconds: 5
                failureThreshold: 3
                successThreshold: 1
                timeoutSeconds: 2
              readinessProbe:
                httpGet:
                  port: 80
                  path: /index.html
                initialDelaySeconds: 60
                periodSeconds: 5
                failureThreshold: 3
                successThreshold: 1
                timeoutSeconds: 2
    
  5. Now restart the controller and, just after it becomes ready, delete all pods of the test Deployment.

Expected Behavior

The CPU request should be increased for all pods, and with the boosted requests only one pod should be able to run in parallel on the node.

Actual Behavior

When a new pod is created shortly after the operator pod becomes ready, the StartupCPUBoost is often ignored. IMHO this is caused by the readiness probe responding too early, while the operator still needs some time to fill its caches/informers with data.
The operator should not report ready (the /readyz endpoint) and start serving webhook requests until all StartupCPUBoosts are loaded.
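For illustration, here is a minimal sketch of that kind of gating, assuming a controller-runtime based manager (as kubebuilder projects use); the boostsLoaded flag and the loading step are illustrative, not the project's actual code:

    package main

    import (
        "context"
        "errors"
        "net/http"
        "sync/atomic"

        ctrl "sigs.k8s.io/controller-runtime"
        "sigs.k8s.io/controller-runtime/pkg/healthz"
        "sigs.k8s.io/controller-runtime/pkg/manager"
    )

    func main() {
        mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
        if err != nil {
            panic(err)
        }

        // Flag flipped once all StartupCPUBoost objects have been loaded
        // (illustrative flag, not the project's real field).
        var boostsLoaded atomic.Bool

        // Liveness stays a plain ping.
        if err := mgr.AddHealthzCheck("ping", healthz.Ping); err != nil {
            panic(err)
        }

        // Readiness only succeeds after the boosts are loaded, so the webhook
        // is not asked to mutate pods before the cache is populated.
        if err := mgr.AddReadyzCheck("boosts-loaded", func(_ *http.Request) error {
            if !boostsLoaded.Load() {
                return errors.New("StartupCPUBoost objects not loaded yet")
            }
            return nil
        }); err != nil {
            panic(err)
        }

        // Runnable that waits for the informer cache to sync, loads the
        // boosts, and only then marks the controller as ready.
        if err := mgr.Add(manager.RunnableFunc(func(ctx context.Context) error {
            if !mgr.GetCache().WaitForCacheSync(ctx) {
                return errors.New("informer cache did not sync")
            }
            // ...list existing StartupCPUBoost objects into the boost manager here...
            boostsLoaded.Store(true)
            return nil
        })); err != nil {
            panic(err)
        }

        if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
            panic(err)
        }
    }

If applicable, the readiness check could also be combined with the webhook server's StartedChecker, so /readyz covers both the synced cache and the serving webhook endpoint.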

mikouaj commented Apr 14, 2025

I've made some improvements in this area that are available in v0.15.0.

mikouaj closed this as completed Apr 15, 2025