-
Notifications
You must be signed in to change notification settings - Fork 7
Fix proxy lost log rotation, filling up disk and hogging CPU and RAM to parse large logs on restart #617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
nginx: [emerg] unknown directive "ssl" in /etc/nginx/conf.d/https.include:8 nginx: [warn] the "listen ... http2" directive is deprecated, use the "http2" directive instead in /etc/nginx/conf.d/https.include:2
mishaschwartz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this quick fix @tlvu!
I think there's a much simpler way to do this though, especially since this is meant as a temporary fix:
We can just add to the scheduler-job-logrotate component to handle the nginx logs as well. We already have the infrastructure in place to do log rotation, why not just add to that?
I had thought of that but it does not work for 3 reasons:
|
fmigneault
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix!
| # All env in this default.env can be overridden by env.local. | ||
|
|
||
| export PROXY_IMAGE="nginx:1.23.4" | ||
| export PROXY_IMAGE="pavics/nginx-cron-logrotate:1.23.4-251205" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is built manually? Should it be configured automatically on docker hub?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes manually as this is a temporary work-around. Let's try to not make this too comfortable that it becomes a permanent solution.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3883/Result 🆘 ABORTEDBIRDHOUSE_DEPLOY_BRANCH : fix-nginx-proxy-missing-logrotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca
|
Ok but...
This is a temporary workaround and it's OK if we need to include a separate component for now. Also, when we eventually implement #618 it will be much easier to deprecate a single component instead of having to revert a bunch of code that has been added to the proxy component.
Good point. This can be handled by the scheduler jobs with a trigger
You're right. This makes more sense as a separate scheduler job, not as an add on to the existing scheduler-jog-logrotate job. |
|
see #619 for an alternative using scheduler jobs |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3884/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : fix-nginx-proxy-missing-logrotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/562/NOTEBOOK TEST RESULTS |
We are using source control, it's not that hard to do a revert using source control. But I agree your fix is easier to revert, even without source control. Thanks for finding an alternate solution. |
If we don't make any further changes to these files then yes you're right. If we make additional changes before we revert these ones then we run the risk of creating merge conflicts later on. I'm sure we could figure it out either way. |
## Overview Currently the nginx logs are not rotated so they can build up to quite a large size. Previously, they were rotated by the `canarie-api` component but that is no longer a required component and CanarieAPI hasn't handled log rotation since CanarieAPI version 1.0.0. Fixes #593. This quick and least disruptive fix to get the production server out of the water should be a temporary solution until a better solution using container STDOUT parsing is implemented for the CanarieAPI and prometheus-log-parser (#618). Then we can deprecate this scheduler job. ## Changes **Non-breaking changes** - adds new scheduler job **Breaking changes** - None ## Related Issue / Discussion - This is an alternative to #617. See #617 (comment) for reasons why I believe this is a preferred solution. ## Additional Information ## CI Operations <!-- The test suite can be run using a different DACCS config with ``birdhouse_daccs_configs_branch: branch_name`` in the PR description. To globally skip the test suite regardless of the commit message use ``birdhouse_skip_ci`` set to ``true`` in the PR description. Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the commit message will override ``birdhouse_skip_ci`` from the PR description. Such commit command can be used to override the PR description behavior for a specific commit update. However, a commit message cannot 'force run' a PR which the description turns off the CI. To run the CI, the PR should instead be updated with a ``true`` value, and a running message can be posted in following PR comments to trigger tests once again. --> birdhouse_daccs_configs_branch: master birdhouse_skip_ci: false
|
Replaced by #619 |
Fixes
Proxy lost log rotation, filling up disk and hogging CPU and RAM to parse large logs on restart
Broken since
2.3.0(Bump canarie api 1.0.0 #452).The root cause is the log rotation has been removed from the python code in
make canarie-api independent of cron and proxy Ouranosinc/CanarieAPI#18 without any replacement.
Fixes 🐛 [BUG]: Nginx/Canarie logging parsing bottleneck #593.
The fix is to restore logrotation in the proxy container using
cronandlogrotateinstead of the old python code. Retention and frequency are thesame as the previous python code. For this we needed a custom build of the
official Nginx docker image +
cron+logrotate.This quick and least disruptive fix to get the production server out of the
water should be a temporary solution until a better solution using container
STDOUT parsing is implemented for the CanarieAPI and prometheus-log-parser
(💡 [Feature] Implement Nginx log parsing from STDOUT stream for CanarieAPI and prometheus-log-parser #618). Then we can
switch back to the regular official Nginx image.
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false