KIC fails to start. All pods down: nginx [emerg] 1#0: bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)" #13730
Comments
FYI: the "deadlock" is cleared by restarting the pods manually with `kubectl -n kong-dbless delete pods --selector=app.kubernetes.io/instance=kong-green`; the Kong KIC pods (controller and gateways) then restart normally.
It seems your issue has been resolved. Feel free to reopen if you have any further concerns.
Thanks @StarlightIbuki for taking this issue. However, your answer doesn't help much. Could you please point me to the resolution? How has this issue been solved, and what's the fix? Thank you in advance.
Sorry, I thought you had found the solution. @randmonkey could you also take a look into this?
Hi @randmonkey, we are getting the issue above multiple times per day and it's getting very frustrating. Do you have any insights to share? On my side, I've also been searching for solutions, and looking more closely at the behaviour, the liveness probe is failing, which only restarts the container. Restarting the container doesn't help: Kong is able to start only when the pod is deleted (manually), which leads me towards cleaning up the PID.
The issue seems to be the same as Kong/kubernetes-ingress-controller#5324. I have the following hypothesis on what is happening:
Steps 5-8 would be the likely reason for the issue. As for steps 1-4, KIC failing to push the config will not make the liveness probe fail and restart the gateway pod.
👋 I think I have some insight on this.

In 3.8, we relocated Kong's internal sockets into a subdirectory in the prefix tree (#13409). There is some code that runs as part of Kong's startup to clean up stale socket files from a previous run. This logic is unfortunately duplicated in our docker entrypoint script, because the entrypoint circumvents that startup path. The docker entrypoint code was not updated to point to the new socket directory that Kong uses as of 3.8 (an oversight). I've opened a PR to remedy this, which I think should resolve the issue.

For those hitting this in the meantime, removing the stale socket directory from the prefix before Kong starts should work around the problem.*

*In fact, enabling this kind of ops pattern in 3.8 was part of the underlying intent of #13409: establishing more segregation between persistent and transient data so that lifecycle management doesn't require non-trivial amounts of scripting (like what is found in the aforementioned docker entrypoint).
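For illustration, here is a minimal sketch of that cleanup idea, under the assumption that all that is needed is to delete the transient socket directory from the prefix before Kong boots. The default prefix value and the final start command are placeholders, not the actual entrypoint code:

```sh
#!/bin/sh
# Hypothetical sketch (not the real entrypoint fix): remove unix sockets left
# over from a previous container instance before starting Kong, so nginx does
# not fail with "bind() ... (98: Address already in use)" on stale sockets.
# The default prefix below is an assumption; the helm values in this thread
# set KONG_PREFIX=/kong_prefix/.
KONG_PREFIX="${KONG_PREFIX:-/usr/local/kong}"

# Kong 3.8+ keeps its unix sockets under ${KONG_PREFIX}/sockets
rm -rf "${KONG_PREFIX}/sockets"

exec kong start   # or however the container normally launches Kong
```

In Kubernetes the same idea translates to an init container that mounts the prefix volume and removes that directory before the gateway container starts.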
hello @flrgh, we applied that cleanup as a custom init container. Here is the relevant part of our values.yaml:

```yaml
ingress:
  controller:
    # controller config
  gateway:
    enabled: true
    deployment:
      kong:
        enabled: true
      initContainers:
        - command:
            - rm
            - '-vrf'
            - ${KONG_PREFIX}/sockets
          env:
            - name: KONG_PREFIX
              value: /kong_prefix/
          image: kong:3.8.0
          imagePullPolicy: IfNotPresent
          name: clear-stale-pid-custom
          volumeMounts:
            - mountPath: /kong_prefix/
              name: kong-green-gateway-prefix-dir
```

It did not help when our preemptible node got "restarted" just a few minutes ago: Kong was not able to restart properly and crashed again with the same errors.
@joran-fonjallaz that's odd. My k8s knowledge is minimal, so bear with me a little.
Hello @flrgh, so your feeling that the issue might be linked to 3.8 does seem correct.
Confirmed! Sometimes the application pod can't start up because of this error.
Probably it's not that easy; in most cases the container…
Have you tried Kong 3.9? There was an update to the entrypoint: Kong/docker-kong#724
It should help! I'll try it.
I'm on the latest kubernetes-dashboard (7.10.3), which uses Kong as its HTTP dispatcher. This issue was driving me crazy. I'll test 3.9, thanks!
We are facing the same issue. We are using the latest kubernetes-dashboard, which in turn uses Kong. Once I delete the Kong pod, everything comes back to normal. Will appreciate a fix for this.
@alahane-techtel Did you read my comment? The fix is to set the Kong version to 3.9. Here is how I do it.
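As a rough sketch of that kind of version pin with the kubernetes-dashboard Helm chart; the `kong.image.*` values path is an assumption about how the chart exposes its bundled Kong subchart, so verify it against the chart's values.yaml:

```sh
# Hypothetical example: pin the bundled Kong image to 3.9 when installing the
# kubernetes-dashboard chart. The values paths are assumed, not verified.
helm repo add kubernetes-dashboard https://kubernetes.github.io/dashboard/
helm upgrade --install kubernetes-dashboard kubernetes-dashboard/kubernetes-dashboard \
  --namespace kubernetes-dashboard --create-namespace \
  --set kong.image.repository=kong \
  --set kong.image.tag="3.9"
```

The same override can also live in a values file instead of `--set` flags.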
Hi @gerardnico, from your comments it looked like you were about to try it, but you didn't mention whether it worked. Thanks for clarifying, and I'm relieved that it works. I'm hopeful that kubernetes-dashboard will soon release a version that uses Kong 3.9, which should fix this issue. We also tried kubernetes-dashboard with Kong 3.9 and can confirm that the issue doesn't occur anymore.
I have created an issue with kubernetes-dashboard.
Do we have any workaround for version 3.8.0? It is not possible for me to update to 3.9 right now.
Hey, do we have any update on this? @gerardnico @brokenjacobs @flrgh @baznikin @randmonkey @KongGuide
@shashank-altan can you stop bothering us? If you can't find the answer yourself, go find another job.
@shashank-altan I'm sorry to hear that you are having issues. Unfortunately, the kind of support that we can provide on the open-source project is limited. In general, we can't provide backports of fixes/features to older versions of the gateway; even on the Enterprise version of the gateway, backports are rare. Very often our suggestion is for the user to upgrade. If you are not able to upgrade, there's very little we can do to help you. Please refrain from pinging others repeatedly on issues, especially if they are community users like yourself (@flrgh and @randmonkey are Kong employees; the other people that you are pinging are not).
Is there an existing issue for this?
Kong version (`$ kong version`): 3.8.0
Current Behavior
Hello,
We run Kong KIC on GKE clusters. Every night the preemptible nodes are reclaimed in our staging environments, and most of the time this takes down all Kong gateway pods (2 replicas) for hours.
Versions:
- GKE: 1.30.4-gke.1348000
- 0.14.1
- 3.3.1
- Kong: 3.8.0
Additional info
It seems that the liveness probe is responding OK while the readiness probe remains unhealthy, leading the gateway pods to just remain around, unable to process traffic.
Error logs
The controller fails to talk to the gateways with
Kong finds itself in some sort of "deadlock" until the pods are deleted manually. Any insights?
Below is the `values.yaml` file configuring Kong.
Expected Behavior
Kong gateway pods should be able to restart without the error `bind() to unix:/kong_prefix/sockets/we failed (98: Address already in use)`.
Steps To Reproduce
I could reproduce the error by killing the nodes (`kubectl delete nodes`) on which the Kong pods were running. After killing the nodes, KIC fails to restart as it enters the deadlock situation described above. See screenshot.
Anything else?
Dump of a failing gateway pod (`kubectl describe`) and logs: