-
Notifications
You must be signed in to change notification settings - Fork 7
Add log rotation scheduler job for the nginx logs #619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
fmigneault
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the default scheduler image is pavics/crontab, it does seem more efficient to reuse it rather than rebuilt yet another image (#617).
Since this operation is not default (ie: if scheduler not included in components), I think a "warning note" would be relevant inside `canarie-api's README section with reference to the original issue to indicate that using it should most probably also enable this to avoid problems. It could also be indicated in the proxy's README section since it is a good idea to have it even when logs are not parsed.
| /var/log/nginx/${PROXY_LOG_FILE} { | ||
| missingok | ||
| # https://github.com/Ouranosinc/CanarieAPI/blob/cc0ae59231ee4b58a34571bd12097c660aefb2e3/canarieapi/logparser.py#L15 | ||
| rotate 150 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the template to override rotate and size?
I would rather not have that many files.
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3885/Result ✅ SUCCESSBIRDHOUSE_DEPLOY_BRANCH : proxy-log-rotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/563/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3887/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : proxy-log-rotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/565/NOTEBOOK TEST RESULTS |
tlvu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I appreciate you spend time implementing an alternate fix to my fix that is easier to deprecate later.
However, I do not think this fix should require manual intervention from the user as it is pretty crucial for production deployment.
Please implement my suggested changes so this fix can be a proper equivalent for my fix #617).
birdhouse/components/README.rst
Outdated
| .. warning:: | ||
| Nginx logs are not automatically rotated and can build up over time. We recommend that you also enable the | ||
| `components/scheduler` component and the `optional-components/scheduler-job-logrotate-nginx` scheduler job | ||
| as a temporary fix. This will rotate the nginx logs on a regular schedule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of putting a note here and hope everyone will RTFM, why not add all the components dependencies directly ?
Realistically, this log rotation is absolutely not "optional" for anyone with a real production deployment. For our production, without this log rotation, the file will grow around 10+ G per day ! Furthermore, this log rotation was built-in before the regression so fixing that regression, we should restore the built-in log rotation.
Same as you, it didn't feel right to make the proxy depend on scheduler and a new optional-component, that's why I went the 100% contained approach inside the proxy container only, not depending on any external components with my PR #617. But if you don't like my approach, then this PR is not really an equivalent replacement for my PR either ! Mine fix the regression without requiring any manual intervention from the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this warning now that the direct dependency is there. Less stuff to remember when the times comes to deprecate.
| @@ -0,0 +1,11 @@ | |||
| /var/log/nginx/${PROXY_LOG_FILE} { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just simply *.log as the default built-in logrotate config from the Nginx image? Any particular reason to restrict logrotation to only one single file? I understand right now we only need for this file but why not already provide logrotation for all future files? Especially when it is so simple?
Not a blocking concern, I am just being curious.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
access.log and error.log both exist in that directory and both are symlinks to /dev/stdout and /dev/stderr so logrotate is unable to rotate them since they are not regular files.
I'm not sure what other future files we should be considering here? Especially since we're planning on moving to logging to stdout and stderr only in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes right now no other candidate to logrotate but I usually prefer to avoid locking myself and future proof myself (in general, I understand this case is special since it's a temporary fix), especially when it's very easy and there are no bad impact.
So the reverse question is what mistake are we trying to prevent by locking this logrotate to one single file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logrotate shows warnings when you tell it to rotate a stream. warnings warn us when we are doing something that the software doesn't like. I would prefer to not intentionally do something that raises warnings.
PROXY_LOG_FILE is configurable. You can change the filename to anything you want and it doesn't have to have a .log extension so changing the file name could break log rotation if we're just rotating *.log files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I do not meant to change the value of PROXY_LOG_FILE, I meant to already be ready for other *.log files. Anyways, forget this point. This is meant to be a temporary solution so hopefully you can implement the stream parsing in not too long so we don't need this PR anymore.
CHANGES.md
Outdated
| ------------------------------------------------------------------------------------------------------------------ | ||
|
|
||
| [//]: # (list changes here, using '-' for each new entry, remove this when items are added) | ||
| ## Changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a change, it's fix for a regression !
|
|
||
| - Add log rotation scheduler job for the nginx logs | ||
|
|
||
| Currently the nginx logs are not rotated so they can build up to quite a large size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not just a large size it's also the hogging of CPU and RAM to parse that gigantic large log file on restart.
Since it's a regression, please also mention when the regression started "Broken since 2.3.0 (#452)."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't clear, please also add the part about "hogging of CPU and RAM to parse that gigantic large log file on restart."
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3892/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : proxy-log-rotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/569/NOTEBOOK TEST RESULTS |
birdhouse/components/README.rst
Outdated
| .. warning:: | ||
| Nginx logs are not automatically rotated and can build up over time. We recommend that you also enable the | ||
| `components/scheduler` component and the `optional-components/scheduler-job-logrotate-nginx` scheduler job | ||
| as a temporary fix. This will rotate the nginx logs on a regular schedule. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove this warning now that the direct dependency is there. Less stuff to remember when the times comes to deprecate.
| \$SCHEDULER_JOB_LOGROTATE_NGINX_SIZE | ||
| " | ||
|
|
||
| # TODO: deprecate this component once https://github.com/bird-house/birdhouse-deploy/issues/618 is resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a note to keep the .gitignore on deprecation, else will break autodeploy on existing deployment because those ignore files will become uncommitted files if the .gitignore is removed.
| @@ -0,0 +1,11 @@ | |||
| /var/log/nginx/${PROXY_LOG_FILE} { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes right now no other candidate to logrotate but I usually prefer to avoid locking myself and future proof myself (in general, I understand this case is special since it's a temporary fix), especially when it's very easy and there are no bad impact.
So the reverse question is what mistake are we trying to prevent by locking this logrotate to one single file?
|
|
||
| - Add log rotation scheduler job for the nginx logs | ||
|
|
||
| Currently the nginx logs are not rotated so they can build up to quite a large size. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I wasn't clear, please also add the part about "hogging of CPU and RAM to parse that gigantic large log file on restart."
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3896/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : proxy-log-rotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-91.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/572/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3897/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : proxy-log-rotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-154.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/573/NOTEBOOK TEST RESULTS |
E2E Test ResultsDACCS-iac Pipeline ResultsBuild URL : http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3898/Result ❌ FAILUREBIRDHOUSE_DEPLOY_BRANCH : proxy-log-rotate DACCS_IAC_BRANCH : master DACCS_CONFIGS_BRANCH : master PAVICS_E2E_WORKFLOW_TESTS_BRANCH : master PAVICS_SDI_BRANCH : master DESTROY_INFRA_ON_EXIT : true PAVICS_HOST : https://host-140-216.rdext.crim.ca PAVICS-e2e-workflow-tests Pipeline ResultsTests URL : http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/574/NOTEBOOK TEST RESULTS |
Overview
Currently the nginx logs are not rotated so they can build up to quite a large size.
Previously, they were rotated by the
canarie-apicomponent but that is no longer a required component and CanarieAPI hasn't handled log rotation since CanarieAPI version 1.0.0.Fixes #593.
This quick and least disruptive fix to get the production server out of the water should be a temporary solution until a better solution using container STDOUT parsing is implemented for the CanarieAPI and prometheus-log-parser (#618). Then we can deprecate this scheduler job.
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false