kong-controller stops fetching EndpointSlices and updating kong-gateways #6567

Open
lindeskar opened this issue Oct 25, 2024 · 0 comments
Labels
bug Something isn't working

Comments

lindeskar commented Oct 25, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A few times per day we see the kong-controller enter a state where it stops fetching EndpointSlices and therefore stops updating the kong-gateways with new configuration. The bad state lasts for about 30 minutes before an unknown trigger makes everything go back to normal.

This affects traffic going through the kong-gateways whenever there are upstream changes during the bad kong-controller state, because the kong-gateways are never made aware of them.
The cluster where the issue occurs makes heavy use of spot Nodes, which leads to frequent changes to the Pods available behind Services.

The issue also affects Kong itself if a kong-gateway Pod is replaced during the bad state: logs show that the kong-controller is not aware of the new kong-gateway and still tries to reach the old kong-gateway Pod.

--

During the issue, two errors are logged constantly:

  • newly added kong-gateway Pods report: not ready for proxying: no configuration available (empty configuration present)
  • the kong-controller reports: Failed to fill in defaults for plugin, with a reference to a previously running kong-gateway Pod rather than the newly added one

I think these errors are a symptom of a greater issue where something in the kong-controller gets stuck.
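
For the first error, one way to check whether a kong-gateway Pod ever received configuration is to look at its status listener. Below is a minimal sketch, assuming DB-less mode, the default status listener port 8100, reachable Pod IPs (for example via kubectl port-forward), and that /status reports a configuration_hash field in that mode:

# Sketch: poll each kong-gateway Pod's /status endpoint and print its
# configuration_hash. Assumes DB-less mode and the default status port 8100;
# the IPs below are examples taken from the debug log attached further down.
import requests

gateway_status_urls = [
    "http://172.19.0.152:8100/status",
    "http://172.19.0.154:8100/status",
]

for url in gateway_status_urls:
    try:
        body = requests.get(url, timeout=2).json()
        # In DB-less mode an all-zero hash means no configuration has been pushed yet.
        print(url, body.get("configuration_hash", "<no configuration_hash field>"))
    except requests.RequestException as exc:
        print(url, "unreachable:", exc)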

Debug logs show that the "Fetching EndpointSlices" and "Sending configuration to gateway clients" messages stop entirely during the bad state (see the attached screenshots).
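
The same gap can be made visible by counting those debug messages per minute in a dumped log. A small sketch, assuming the log was saved to a local file (e.g. with kubectl logs) and that each line starts with an RFC3339 timestamp:

# Sketch: count "Fetching EndpointSlices" and "Sending configuration to gateway
# clients" debug lines per minute, so the period where they stop stands out.
# Adjust the timestamp slicing if the log format differs.
from collections import Counter

messages = ("Fetching EndpointSlices", "Sending configuration to gateway clients")
per_minute = Counter()

with open("kong-controller.log") as log:
    for line in log:
        if any(message in line for message in messages):
            minute = line.split()[0][:16]  # e.g. "2024-10-25T14:56"
            per_minute[minute] += 1

for minute in sorted(per_minute):
    print(minute, per_minute[minute])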

Expected Behavior

The kong-controller keeps fetching EndpointSlices and updating the kong-gateways.

Steps To Reproduce

Note: We have not been able to reproduce the issue in other Kubernetes clusters.

Values for the ingress chart:

controller:
  serviceMonitor:
    enabled: false # see https://github.com/Kong/charts/issues/1053 for more info
  podAnnotations: {} # disable kuma and other sidecar injection
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
  extraObjects:
    - apiVersion: monitoring.coreos.com/v1
      kind: PodMonitor
      metadata:
        labels:
          app.kubernetes.io/component: app
          app.kubernetes.io/instance: kong
          app.kubernetes.io/name: controller
        name: kong-controller
        namespace: kong
      spec:
        podMetricsEndpoints:
          - path: /metrics
            targetPort: cmetrics
        selector:
          matchLabels:
            app.kubernetes.io/component: app
            app.kubernetes.io/instance: kong
            app.kubernetes.io/name: controller
  ingressController:
    customEnv:
      CONTROLLER_LOG_LEVEL: debug

gateway:
  serviceMonitor:
    enabled: true
  replicaCount: 3
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  resources:
    requests:
      cpu: 10m
      memory: 240Mi
  deployment:
    prefixDir:
      sizeLimit: 2Gi
  proxy:
    externalTrafficPolicy: Local

  env:
    # HSTS; we use the same values as the default in ingress-nginx
    nginx_http_add_header: 'Strict-Transport-Security "max-age=15724800; includeSubDomains" always'

    # The client body memory buffer size: https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/annotations.md#client-body-buffer-size
    nginx_http_client_body_buffer_size: 50m

    # Don't pass on 'Server' header to downstream
    nginx_http_more_clear_headers: Server

    # Disable Kong headers to downstream
    headers: "off"

    # Enable Gzip compression, if requested by the client. Gzip types are influenced by the default in ingress-nginx (minus xml types)
    nginx_http_gzip: "on"
    nginx_http_gzip_types: "application/javascript application/x-javascript application/json application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json font/opentype text/css text/javascript text/plain text/html application/octet-stream"
    nginx_http_gzip_min_length: "500"
    nginx_http_gzip_comp_level: "6"
    nginx_http_gzip_http_version: "1.1"
    nginx_http_gzip_proxied: "any"
    nginx_http_gzip_vary: "on"
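
Since the controller metrics are scraped via the PodMonitor above, the stuck periods should also show up as a stalled configuration push counter. A rough sketch, assuming the KIC metric name ingress_controller_configuration_push_count and a placeholder Prometheus URL:

# Sketch: ask Prometheus for the 5m rate of KIC configuration pushes; it should
# drop to zero during the bad state. The Prometheus URL is a placeholder and the
# metric name is assumed from the KIC Prometheus metrics, so verify both first.
import requests

resp = requests.get(
    "http://prometheus.monitoring.svc:9090/api/v1/query",
    params={"query": "sum(rate(ingress_controller_configuration_push_count[5m]))"},
    timeout=10,
)
resp.raise_for_status()
for sample in resp.json()["data"]["result"]:
    timestamp, value = sample["value"]
    print(timestamp, value)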

Kong Ingress Controller version

kong/kubernetes-ingress-controller:3.3 from the Helm chart (the digest matches 3.3.1)

Kubernetes version

v1.29.9-gke.1177000

Anything else?

Debug log filtered for kong-gateway Pod IPs:
kong-controller-debug-2.txt

  • 172.19.7.208 kong-gateway running
  • 172.19.0.152 kong-gateway running
  • 172.19.2.48 kong-gateway stopped 14:56
  • 172.19.1.164 kong-gateway started 14:56 and stopped 17:10
  • 172.19.0.154 kong-gateway started 17:10
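
For reference, a filter like the attached file can be reproduced with something along these lines (not necessarily how the attachment was generated; file names are placeholders):

# Sketch: keep only kong-controller debug lines that mention one of the
# kong-gateway Pod IPs listed above.
gateway_ips = (
    "172.19.7.208",
    "172.19.0.152",
    "172.19.2.48",
    "172.19.1.164",
    "172.19.0.154",
)

with open("kong-controller.log") as src, open("kong-controller-debug.txt", "w") as dst:
    for line in src:
        if any(ip in line for ip in gateway_ips):
            dst.write(line)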