kong-controller stops fetching EndpointSlices and updating kong-gateways #6567

Open
lindeskar opened this issue Oct 25, 2024 · 0 comments
Labels
bug Something isn't working

Comments

lindeskar commented Oct 25, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A few times per day we see the kong-controller enter a state where it stops fetching EndpointSlices and therefore stops updating the kong-gateways with new configuration. The bad state lasts for about 30 minutes before an unknown trigger makes everything go back to normal.

This affects traffic going through the kong-gateways whenever there are upstream changes during the bad kong-controller state, because the kong-gateways are never made aware of them.
The cluster where the issue occurs makes heavy use of spot Nodes, which leads to frequent changes to the Pods available behind Services.

The issue also affects Kong itself if a kong-gateway Pod is replaced during the bad state: logs show that the kong-controller is not aware of the new kong-gateway and still tries to reach the old kong-gateway Pod.

--

During the issue, two errors are logged constantly:

  • newly added kong-gateway Pods report: not ready for proxying: no configuration available (empty configuration present)
  • the kong-controller reports: Failed to fill in defaults for plugin, with a reference to a previously running kong-gateway Pod rather than the newly added one

I think these errors are a symptom of a greater issue where something in the kong-controller gets stuck.
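
For the first error, one way to check whether a kong-gateway Pod ever received configuration is to look at its status listener. Below is a minimal sketch, assuming DB-less mode, the default status listener port 8100, reachable Pod IPs (for example via kubectl port-forward), and that /status reports a configuration_hash field in that mode:

# Sketch: poll each kong-gateway Pod's /status endpoint and print its
# configuration_hash. Assumes DB-less mode and the default status port 8100;
# the IPs below are examples taken from the debug log attached further down.
import requests

gateway_status_urls = [
    "http://172.19.0.152:8100/status",
    "http://172.19.0.154:8100/status",
]

for url in gateway_status_urls:
    try:
        body = requests.get(url, timeout=2).json()
        # In DB-less mode an all-zero hash means no configuration has been pushed yet.
        print(url, body.get("configuration_hash", "<no configuration_hash field>"))
    except requests.RequestException as exc:
        print(url, "unreachable:", exc)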

Debug logs show that the "Fetching EndpointSlices" and "Sending configuration to gateway clients" messages stop entirely during the bad state (see the attached screenshots).
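
The same gap can be made visible by counting those debug messages per minute in a dumped log. A small sketch, assuming the log was saved to a local file (e.g. with kubectl logs) and that each line starts with an RFC3339 timestamp:

# Sketch: count "Fetching EndpointSlices" and "Sending configuration to gateway
# clients" debug lines per minute, so the period where they stop stands out.
# Adjust the timestamp slicing if the log format differs.
from collections import Counter

messages = ("Fetching EndpointSlices", "Sending configuration to gateway clients")
per_minute = Counter()

with open("kong-controller.log") as log:
    for line in log:
        if any(message in line for message in messages):
            minute = line.split()[0][:16]  # e.g. "2024-10-25T14:56"
            per_minute[minute] += 1

for minute in sorted(per_minute):
    print(minute, per_minute[minute])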

Expected Behavior

The kong-controller keeps fetching EndpointSlices and updating the kong-gateways.

Steps To Reproduce

Note: We have not been able to reproduce the issue in other Kubernetes clusters.

Values for the ingress chart:

controller:
  serviceMonitor:
    enabled: false # see https://github.com/Kong/charts/issues/1053 for more info
  podAnnotations: {} # disable kuma and other sidecar injection
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
  extraObjects:
    - apiVersion: monitoring.coreos.com/v1
      kind: PodMonitor
      metadata:
        labels:
          app.kubernetes.io/component: app
          app.kubernetes.io/instance: kong
          app.kubernetes.io/name: controller
        name: kong-controller
        namespace: kong
      spec:
        podMetricsEndpoints:
          - path: /metrics
            targetPort: cmetrics
        selector:
          matchLabels:
            app.kubernetes.io/component: app
            app.kubernetes.io/instance: kong
            app.kubernetes.io/name: controller
  ingressController:
    customEnv:
      CONTROLLER_LOG_LEVEL: debug

gateway:
  serviceMonitor:
    enabled: true
  replicaCount: 3
  podDisruptionBudget:
    enabled: true
    minAvailable: 1
  resources:
    requests:
      cpu: 10m
      memory: 240Mi
  deployment:
    prefixDir:
      sizeLimit: 2Gi
  proxy:
    externalTrafficPolicy: Local

  env:
    # HSTS; we use the same values as the default in ingress-nginx
    nginx_http_add_header: 'Strict-Transport-Security "max-age=15724800; includeSubDomains" always'

    # The client body memory buffer size: https://github.com/kubernetes/ingress-nginx/blob/main/docs/user-guide/nginx-configuration/annotations.md#client-body-buffer-size
    nginx_http_client_body_buffer_size: 50m

    # Don't pass on 'Server' header to downstream
    nginx_http_more_clear_headers: Server

    # Disable Kong headers to downstream
    headers: "off"

    # Enable Gzip compression, if requested by the client. Gzip types are influenced by the default in ingress-nginx (minus xml types)
    nginx_http_gzip: "on"
    nginx_http_gzip_types: "application/javascript application/x-javascript application/json application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json font/opentype text/css text/javascript text/plain text/html application/octet-stream"
    nginx_http_gzip_min_length: "500"
    nginx_http_gzip_comp_level: "6"
    nginx_http_gzip_http_version: "1.1"
    nginx_http_gzip_proxied: "any"
    nginx_http_gzip_vary: "on"
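
Since the controller metrics are scraped via the PodMonitor above, the stuck periods should also show up as a stalled configuration push counter. A rough sketch, assuming the KIC metric name ingress_controller_configuration_push_count and a placeholder Prometheus URL:

# Sketch: ask Prometheus for the 5m rate of KIC configuration pushes; it should
# drop to zero during the bad state. The Prometheus URL is a placeholder and the
# metric name is assumed from the KIC Prometheus metrics, so verify both first.
import requests

resp = requests.get(
    "http://prometheus.monitoring.svc:9090/api/v1/query",
    params={"query": "sum(rate(ingress_controller_configuration_push_count[5m]))"},
    timeout=10,
)
resp.raise_for_status()
for sample in resp.json()["data"]["result"]:
    timestamp, value = sample["value"]
    print(timestamp, value)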

Kong Ingress Controller version

kong/kubernetes-ingress-controller:3.3 from the Helm chart (the digest matches 3.3.1)

Kubernetes version

v1.29.9-gke.1177000

Anything else?

Debug log filtered for kong-gateway Pod IPs:
kong-controller-debug-2.txt

  • 172.19.7.208 kong-gateway running
  • 172.19.0.152 kong-gateway running
  • 172.19.2.48 kong-gateway stopped 14:56
  • 172.19.1.164 kong-gateway started 14:56 and stopped 17:10
  • 172.19.0.154 kong-gateway started 17:10
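
For reference, a filter like the attached file can be reproduced with something along these lines (not necessarily how the attachment was generated; file names are placeholders):

# Sketch: keep only kong-controller debug lines that mention one of the
# kong-gateway Pod IPs listed above.
gateway_ips = (
    "172.19.7.208",
    "172.19.0.152",
    "172.19.2.48",
    "172.19.1.164",
    "172.19.0.154",
)

with open("kong-controller.log") as src, open("kong-controller-debug.txt", "w") as dst:
    for line in src:
        if any(ip in line for ip in gateway_ips):
            dst.write(line)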