Weird 3.7.0 startup error logs, this time cloned my larger stage db into dev env #13250
Maybe this plays a role?
I noticed the default is 256 with Kong. Not sure why I had it set so low, but I just bumped that to 256 to match the default.
Looks like that solves your issue. The log says that the DP fails to connect to the CP, and what you found could be the cause.
Please feel free to reopen the ticket if you have further concerns.
Not confirmed yet but will let y'all know; testing out that change now. I run traditional mode here, so there's no DP/CP setup at all.
@StarlightIbuki nope, it's still an issue even after bumping that to 256:
Happy to show any Kong engineers the issue on a call if they're curious. It must be somewhat related to that new events server block y'all do with the
Has to be some kind of resource-scale issue or warmup problem in my env, paired with the newer Kong version, that's causing the shaky startup. Do y'all have any OSS or enterprise customers running 3.x with 11,000+ proxies? I'm thinking they may see the same thing.
@StarlightIbuki can you re-open this, rather than me making a brand new issue that just clones the details I've given here? Edit - Remade it here since this was not reopened - #13274
Other ideas I have for what might cause Kong to do this: what happens if an entity PUT hits the admin API while Kong is still trying to start up? Could things like that be impacting the output? I have a healthcheck that constantly PUTs a dummy, unused upstream resource to help me verify correct DB write access (roughly the sketch below). It's the only resource in that table too; we don't use upstreams regularly.
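The probe itself is essentially just this; the upstream name and the admin address/port are placeholders, not our real values:

    # Idempotent PUT of a dummy upstream; a 200 response implies Kong can
    # reach and write to its database. Name and address are placeholders.
    curl -s -o /dev/null -w '%{http_code}\n' \
      -X PUT http://localhost:8001/upstreams/healthcheck-dummy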
That's strange. The log states that
I don't override the default conf file at all, but we do use env variables to dictate our runtime, as well as a custom NGINX template. Current env variables in my K8s Deployment (XXXX's for block-outs):
@StarlightIbuki And current nginx template file:
I don't think it's env-variable or template specific, though, because I can point at a smaller dev DB and none of the error logs come up; but as soon as I point to the larger DB (just changing the Postgres host and tables to start up and read from) with more resources in it, the errors show up. I'm thinking to prove it out further by adding a ton of dummy data to my dev DB environment (see the loop sketched below) and seeing whether, once I bloat that PG instance with tons of route/service/plugin/consumer resources, it then mirrors the errors in the startup output, similar to what happens when pointing at our much larger stage Postgres database instance. 50 proxies vs 11,000 proxies. If I can reproduce it by injecting a ton of dummy data, maybe I can drop a Postgres dump SQL file y'all can import, so you see it behave the same way when you pull all that data into a sandbox/lab test.
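The dummy-data idea would be a loop along these lines against the Admin API; the names, count, and admin address here are placeholders for illustration, not exactly what I'd run:

    # Seed a large number of dummy services and routes to bloat the dev DB.
    # All names, the count, and the Admin API address are placeholders.
    for i in $(seq 1 11000); do
      curl -s -o /dev/null -X PUT "http://localhost:8001/services/dummy-svc-$i" \
        --data "url=http://example.internal/$i"
      curl -s -o /dev/null -X PUT "http://localhost:8001/services/dummy-svc-$i/routes/dummy-route-$i" \
        --data "paths[]=/dummy-$i"
    done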
@jeremyjpj0916 OK I just figured out that
@StarlightIbuki @chronolaw To give more context on available resources: I have k8s pods starting up with as much as 4 CPUs and 10Gi (nginx is set to spin up 4 worker processes):
Which is no small potatoes in terms of compute or memory. Size of dev DB (no errors on Kong startup):
Size of stage DB (errors on Kong startup; 82 MB is still a fairly small DB size IMO):
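For reference, the size figures above come from queries along these lines (the connection details are placeholders):

    # Overall database size.
    psql -h <pg-host> -U kong -d kong \
      -c "SELECT pg_size_pretty(pg_database_size(current_database()));"
    # Ten largest tables by total size (data plus indexes).
    psql -h <pg-host> -U kong -d kong \
      -c "SELECT relname, pg_size_pretty(pg_total_relation_size(relid)) AS total
          FROM pg_catalog.pg_statio_user_tables
          ORDER BY pg_total_relation_size(relid) DESC LIMIT 10;"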
schema_meta looks correct too for the stage DB (all elements are up to 360 for the latest Kong version):
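That check is just a query along these lines (again, connection details are placeholders):

    # Show the last executed migration per subsystem from Kong's schema_meta table.
    psql -h <pg-host> -U kong -d kong \
      -c "SELECT subsystem, last_executed FROM schema_meta;"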
Plugins takes the cake for the largest table, expectedly:
Maybe next step I will take the
Recently we upgraded the lua-resty-events library to 0.3.0 (https://github.com/Kong/lua-resty-events/releases/tag/0.3.0), but it is not included in 3.7.x yet. This version fixed some potential bugs; could you try it with the master branch? Thanks.
@chronolaw is the latest on LuaRocks (so I can install the latest version after the Kong build with an extra luarocks install line), or can I force a sed command during the Kong Bazel build steps to pull the newer version? And should it work on top of 3.7.1 out of the box, without further changes needed?
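Something like this during the build is what I had in mind, assuming the pin lives in Kong's .requirements file under a LUA_RESTY_EVENTS key (that assumption may not match the actual build layout):

    # Bump the pinned lua-resty-events version before running the Bazel build;
    # the file and key name are assumptions about the build layout.
    sed -i 's/^LUA_RESTY_EVENTS=.*/LUA_RESTY_EVENTS=0.3.0/' .requirements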
I think that this issue has no relationship with DB or memory size; it seems that the events broker (lua-resty-events) is not ready when the events workers try to connect.
@chronolaw looks like the source ends up in:
That's the path for a traditional Kong Bazel build image I have. I can just add to the image to drop in that code's tag (https://github.com/Kong/lua-resty-events/tree/0.3.0) and overwrite the files in that path before Kong startup, then report back with my startup logs. Edit - I put all the events files into that lua-resty-events patches folder and will copy them over into the events dir after cleaning it:
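The overwrite step itself is roughly this; the install path is my assumption and may differ depending on how the image was built:

    # Replace the installed lua-resty-events modules with the 0.3.0 files;
    # both paths below are assumptions/placeholders.
    EVENTS_DIR=/usr/local/openresty/site/lualib/resty/events
    rm -rf "$EVENTS_DIR"/*
    cp -r /path/to/lua-resty-events-0.3.0/lualib/resty/events/* "$EVENTS_DIR"/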
Patch files added to dev with the smaller DB; initial runtime looks good here:
Now let me try it against the bigger database, with more resources, where I see the problem in the first place.
Same issues persist even with the 0.3.0 patch files dropped in, against the bigger DB (stage database) and resources:
After those error logs, things seem to run normally.
Also, if y'all want me to hack any debug statements into the code to get a better understanding of what's happening, I can; just give me the lines you're curious about and where to drop them in. I'm also curious whether the consensus opinion is that this is mostly harmless on startup and I'm okay to take these errors into a production environment, or whether I should hold off. Normally, when I see [error]s and miscommunications early on in something like an event distribution library managing all of Kong's intra-worker comms, I would think that's a big issue; but if it's just an early timing thing and Kong is healthy once those logs stop, then I suppose there's no problem taking it to prod if it's just a minor startup comms issue of things not being ready when called. Would like y'all's opinion there though. A few other ideas I have that might help stop the error logs:
... any other ideas I'll add here to this list.
Very detailed log data, thank you @jeremyjpj0916. We will check these data; please give us more time.
@chronolaw thanks for staying on top of this error that has kept us from moving to the newest Kong thus far. Do you have any opinion on whether it's safe to upgrade and let this happen on startup until it's fixed later? Having run the latest Kong in our stage environment for a week+ now, I'm not hearing any other reports of issues from it. I never like error logs popping up in my runtimes, but if y'all agree these errors are pretty harmless since they go away after startup, then I can proceed with my upgrades.
Yes, I think that these error log messages should only happen during the startup stage. The root reason is the special pub/sub mechanism of events (see: https://konghq.com/blog/engineering/nginx-openresty-event-handling-strategy-for-cpu-efficiency): the broker is not ready yet, but the other workers are already trying to connect to it. Once the broker is ready and the workers connect to it successfully, the whole events system works well to serve Kong gateway. We will keep trying to enhance the robustness to reduce this noise.
Sounds good; I'll proceed and just ignore the errors when they appear early but stop right after Kong startup. Will continue upgrading our Kong instances in production too, thx. Hopefully y'all figure out a way to make the workers patiently wait on the broker, or start the broker before the workers attempt to connect, so the ordering is right and these error logs won't be printed.
Ticket KAG-4867 was created for tracking this issue.
Good stuff, hopefully y'all can reproduce it.
Getting the same error using a DB-less YAML config on the Docker image kong:3.7.1:
Hi guys, getting similar logs when running Kong using Docker.
This is my docker-compose.yml file:

services:
  kong-database:
    image: postgres:latest
    networks:
      - kong-net
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_DB: ${POSTGRES_DB}
      POSTGRES_PASSWORD: ${KONG_PG_PASSWORD}
    restart: on-failure
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "kong"]
      interval: 10s
      timeout: 5s
      retries: 5
    volumes:
      - kong_db_data:/var/lib/postgresql/data

  kong-migrations:
    image: j2m_kong_gateway_service:latest
    networks:
      - kong-net
    depends_on:
      kong-database:
        condition: service_healthy
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_PASSWORD: ${KONG_PG_PASSWORD}
    command: "kong migrations bootstrap"
    restart: on-failure

  kong-gateway:
    image: j2m_kong_gateway_service:latest
    networks:
      - kong-net
    depends_on:
      kong-database:
        condition: service_healthy
      kong-migrations:
        condition: service_completed_successfully
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: ${KONG_PG_PASSWORD}
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      KONG_ADMIN_LISTEN: "127.0.0.1:8001, 127.0.0.1:8443 ssl"
      KONG_ADMIN_GUI_URL: ${KONG_ADMIN_GUI_URL}
    ports:
      - "127.0.0.1:8000:8000"
      - "127.0.0.1:8443:8443"
      - "127.0.0.1:8001:8001"
      - "127.0.0.1:8002:8002"
      - "127.0.0.1:8444:8444"
      - "127.0.0.1:8445:8445"
      - "127.0.0.1:8003:8003"
      - "127.0.0.1:8004:8004"
    restart: on-failure
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8001/services"]
      interval: 30s
      timeout: 10s
      retries: 5
    command: ["kong", "start"]

networks:
  kong-net:
    name: kong-net

volumes:
  kong_db_data:

Any solutions?
I don't think that's my problem ^; I already run really large pods and still see the error on startup:
And I already set it here too, for the above example env, to ensure the right process count:
Interesting that you managed to reproduce it in a scenario where Kong is starved of the resources it needs to run properly, though; somewhat telling.
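For clarity, what I mean by setting the process count is along these lines, assuming the setting in question is Kong's nginx_worker_processes (my assumption; substitute whichever knob applies):

    # Pin the nginx worker process count via Kong's env-var config mapping;
    # the value 4 matches the 4-CPU pods mentioned earlier.
    export KONG_NGINX_WORKER_PROCESSES=4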
Is there an existing issue for this?
Kong version ($ kong version): 3.7.0
Current Behavior
When Kong 3.7.0 starts up post-migration (on a DB taken through migrations up and finish from 2.8.x) I see this in the stdout logs. The [error] logs only occur on startup, though; afterward Kong passes all my functional test suite tests and the admin API calls behave as expected. The exact same container image of Kong, nginx template, etc. pointed at the dev database (40 proxies), with far fewer resources in it, does not cause Kong to start up with any [error] logs, so maybe it's something about the number of resources the stage Kong gateway loads at startup (11,000 proxies) compared to dev.
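For reference, the migration sequence described above is the standard Kong CLI flow:

    # Run the 3.7.0 migrations against the 2.8.x database, finalize them once
    # all nodes have been switched over, then start the gateway.
    kong migrations up
    kong migrations finish
    kong start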
Logs are here:
Note: was going to grab debug logs for y'all, but there was way too much spew and my little stdout view window could not capture it. In INFO mode I was at least able to grab it.
Edit - I also rolled back my dev env to point back at the smaller dev database, and all those error logs go away. Startup output with the smaller dev DB looks like this (did not touch any other env variables or nginx template stuff during the cutover):
Expected Behavior
No error logs on startup.
Steps To Reproduce
No response
Anything else?
No response