Is there an existing issue for this?
Current Behavior
A few times per day we see the kong-controller enter a state where it stops fetching EndpointSlices and therefore stops updating the kong-gateways with new configuration. The bad state lasts for about 30 minutes before an unknown trigger makes everything go back to normal.
This affects traffic going through the kong-gateways if there were upstream changes during the bad kong-controller state that the kong-gateways are then not aware of.
The cluster where the issue occurs makes heavy use of spot Nodes, which leads to frequent changes in the Pods available to Services.
The issue also affects Kong itself if a kong-gateway Pod is replaced during the bad state. Logs show that the kong-controller is not aware of the new kong-gateway Pod and still tries to reach the old one.
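As a reference point for how much endpoint churn the spot Nodes generate, the minimal client-go sketch below watches the EndpointSlices of a single Service and prints every change the kong-controller would have to turn into new gateway configuration. The namespace and Service name are placeholders, and it assumes a local kubeconfig rather than in-cluster credentials:

```go
// Minimal sketch (not part of the controller): watch EndpointSlice churn for one
// Service to see how often spot-Node turnover changes the endpoints behind it.
package main

import (
	"context"
	"fmt"
	"log"

	discoveryv1 "k8s.io/api/discovery/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load a local kubeconfig; inside the cluster rest.InClusterConfig() would be used instead.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// EndpointSlices owned by a Service carry the kubernetes.io/service-name label.
	// "my-namespace" and "my-service" are placeholders for one of the affected upstreams.
	w, err := cs.DiscoveryV1().EndpointSlices("my-namespace").Watch(context.Background(),
		metav1.ListOptions{LabelSelector: discoveryv1.LabelServiceName + "=my-service"})
	if err != nil {
		log.Fatal(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		if slice, ok := ev.Object.(*discoveryv1.EndpointSlice); ok {
			fmt.Printf("%s %s endpoints=%d\n", ev.Type, slice.Name, len(slice.Endpoints))
		}
	}
}
```

Every event printed here is a change the kong-gateways only learn about through the kong-controller.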
--
During the issue, two errors are constantly logged:
newly added kong-gateway Pods: not ready for proxying: no configuration available (empty configuration present)
the kong-controller: Failed to fill in defaults for plugin, with a reference to a previously running kong-gateway Pod, not the newly added one
I think these errors are a symptom of a greater issue where something in the kong-controller gets stuck.
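To confirm from the gateway side that no configuration arrives while the controller is stuck, something like the sketch below can poll a kong-gateway Pod's status endpoint and print the reported configuration hash. The port 8100 and the configuration_hash field in GET /status are assumptions based on a default DB-less Helm deployment, and the Pod IP is just one of the addresses listed further down in this issue:

```go
// Minimal sketch (assumptions noted above): poll a kong-gateway Pod's status
// endpoint to see whether any configuration has been pushed to it.
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"time"
)

func main() {
	// Placeholder: one of the kong-gateway Pod IPs from the attached debug log,
	// plus the assumed status listener port of a DB-less Helm deployment.
	const statusURL = "http://172.19.0.154:8100/status"

	for {
		resp, err := http.Get(statusURL)
		if err != nil {
			log.Printf("status request failed: %v", err)
		} else {
			var body struct {
				ConfigurationHash string `json:"configuration_hash"`
			}
			if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
				log.Printf("decoding status response failed: %v", err)
			} else {
				// An all-zero hash is what a DB-less gateway reports before any
				// configuration has been loaded (assumption based on our deployments).
				fmt.Printf("%s configuration_hash=%s\n",
					time.Now().Format(time.RFC3339), body.ConfigurationHash)
			}
			resp.Body.Close()
		}
		time.Sleep(10 * time.Second)
	}
}
```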
Debug logs show that Fetching EndpointSlices and Sending configuration to gateway clients stop entirely during the bad state:
Expected Behavior
The kong-controller keeps fetching EndpointSlices and updating the kong-gateways.
Steps To Reproduce
Note: We have not been able to reproduce the issue in other Kubernetes clusters.
Values for the ingress chart:
Kong Ingress Controller version
kong/kubernetes-ingress-controller:3.3 from the Helm chart (the digest matches 3.3.1)
Kubernetes version
Anything else?
Debug log filtered for kong-gateway Pod IPs:
kong-controller-debug-2.txt
172.19.7.208: kong-gateway running
172.19.0.152: kong-gateway running
172.19.2.48: kong-gateway stopped 14:56
172.19.1.164: kong-gateway started 14:56 and stopped 17:10
172.19.0.154: kong-gateway started 17:10
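For spotting the stall in a log like the attached one, a minimal sketch that buckets the two debug messages per minute is below. It assumes each log line starts with an RFC3339-style timestamp, so the first 16 characters identify the minute; a roughly 30-minute bad state should then show up as a gap between the printed minutes:

```go
// Minimal sketch: count the two controller debug messages per minute to make
// the stalled window visible. The log file name and the leading-timestamp
// layout are assumptions; adjust them to the actual log format.
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

func main() {
	f, err := os.Open("kong-controller-debug-2.txt")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fetch := map[string]int{} // minute -> "Fetching EndpointSlices" count
	send := map[string]int{}  // minute -> "Sending configuration to gateway clients" count

	sc := bufio.NewScanner(f)
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024) // debug lines with full configs can be long
	for sc.Scan() {
		line := sc.Text()
		if len(line) < 16 {
			continue
		}
		minute := line[:16] // first 16 chars of an RFC3339 timestamp, i.e. "YYYY-MM-DDTHH:MM"
		if strings.Contains(line, "Fetching EndpointSlices") {
			fetch[minute]++
		}
		if strings.Contains(line, "Sending configuration to gateway clients") {
			send[minute]++
		}
	}
	if err := sc.Err(); err != nil {
		panic(err)
	}

	// Print the union of minutes in order; gaps mark the bad state.
	seen := map[string]struct{}{}
	for m := range fetch {
		seen[m] = struct{}{}
	}
	for m := range send {
		seen[m] = struct{}{}
	}
	minutes := make([]string, 0, len(seen))
	for m := range seen {
		minutes = append(minutes, m)
	}
	sort.Strings(minutes)
	for _, m := range minutes {
		fmt.Printf("%s fetch=%d send=%d\n", m, fetch[m], send[m])
	}
}
```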