Introduce a scheduler job to delete old files that may accumulate over time #510
Conversation
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3133/ | Result: ✅ SUCCESS | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/98/ | NOTEBOOK TEST RESULTS
fmigneault left a comment:
I'm missing a bit of context to do the review.
Are the targeted log files the ones generated by
    x-logging: &default-logging
      driver: "json-file"
      options:
        max-size: "50m"
        max-file: "10"
Or are they other logs produced by some other process?
If so, don't these services have specific log-rotate configurations?
Specifically for WPS-outputs, I'm curious about which ones are involved. The logs of older jobs might be relevant for traceability.
Good questions...
No, those are handled by docker and currently only last for the lifetime of the docker container (which is a whole other issue that needs to be addressed in a completely different way).
Yes:
THREDDS does, but not in a way that is easily overridable (see above); WPS outputs do not.
Whichever ones you have enabled, I guess. Right now on my test machine I have raven and finch enabled, so there are output directories for those. I guess weaver puts its outputs in the data persist directory as a bind mount ... I should probably add an option to clean those ones up too, right?
I'm not sure what you mean... the user gets to set the policy for how long to keep old files. If you're concerned about losing old files, then you should just not enable this feature.
I think it is reasonable for the THREDDS case. For the WPSs, it should be up to them to do the relevant cleanup, since those files are the output of potentially long processing, and the service could be "smarter" about its persistence. I don't think PyWPS provides something explicit about cleanup, but Weaver, which also uses the WPS outputs, has explicit handling, so it would need to be configurable per-service at least. https://pavics-weaver.readthedocs.io/en/latest/cli.html#dismiss
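As a rough illustration of that selective approach (a hypothetical sketch only: WEAVER_URL and list_old_job_ids are placeholders, and the exact dismiss arguments should be taken from the linked Weaver documentation):

    # Hypothetical cron-driven cleanup for Weaver: dismiss individual old
    # jobs instead of wiping the whole wps_outputs volume.
    # "list_old_job_ids" is a placeholder for whatever mechanism selects
    # jobs past the retention policy; it is not an existing command.
    WEAVER_URL="https://example.com/weaver"   # assumed deployment URL
    for job_id in $(list_old_job_ids); do
        weaver dismiss -u "${WEAVER_URL}" -j "${job_id}"
    done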
You say:
but then
So do you mean that the solution needs to be a change to PyWPS itself, or that the change I propose here is good enough for now until a solution in PyWPS is available? I'm happy to let weaver handle its own thing. Can you explain how someone would configure weaver to delete outputs older than 1 year (for example)?
It's good enough if it is configurable per service, not globally over the volume/directory. Maybe consider also using the file access time (atime), not only the modification time. For Weaver, the same logic could be applied with a cron, but rather than wiping everything, you can invoke it to selectively remove some items. For example, you might not care about the exact logs or output data that was staged elsewhere, but want to keep the provenance details for traceability and trust of the workflow.
Sure, that's easy to do.
This removes THREDDS logs, not datasets.
It is configurable... I can make the choice between atime and mtime configurable as well though.
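For reference, a minimal sketch of what such a find-based cleanup could look like, with the time mode selectable between atime and mtime (the path and pattern reuse the THREDDS example quoted later in this thread; the exact script in this PR may differ):

    # Assumed find-based cleanup: delete matching files older than N days,
    # where the age predicate (-atime/-mtime/-ctime) is configurable.
    TIME_MODE=atime              # or: mtime, ctime
    OLDER_THAN_DAYS=365          # illustrative retention policy
    find /thredds/logs -type f -name 'threddsServlet.*.log' \
        "-${TIME_MODE}" "+${OLDER_THAN_DAYS}" -delete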
Let's remove this from the 2.11 milestone. I have a bunch more scheduler jobs that I'm working on and I don't think the individual jobs need to be part of the next minor release. I understood that we were only going to include #504 and a few of the minor patches, not even #503. I'm OK to add #503 since that's mostly finished anyway, but this one still needs work.
tlvu left a comment:
Just quick comments.
    export THREDDS_DELETE_FILES_OLDER_THAN_DAYS=
    export SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS="
        ${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}
        \$([ -n \"\${THREDDS_DELETE_FILES_OLDER_THAN_DAYS}\" ] && echo \"thredds_persistence:/thredds|/thredds/logs/threddsServlet.*.log|\${THREDDS_DELETE_FILES_OLDER_THAN_DAYS}\")
Should we stay away from appending to a variable, to avoid escaping and potential duplication problems?
I much prefer the approach of dropping a config file. It's also more visible for debugging than having to track the content of a variable.
How about the script clean-old-files.sh reads /some-dir/*.conf and iterates over them?
Sample .conf format:

    LOCATION=/path/to/dir
    AGE=num

... and we can easily add more config vars later while avoiding parsing problems.
Each component just drops a .conf file at the location where clean-old-files.sh expects them (sketched below).
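A minimal sketch of that proposal, assuming the /some-dir location and the LOCATION/AGE variable names suggested above (illustrative only, not the merged implementation):

    # Proposed (not merged) approach: clean-old-files.sh iterates over
    # dropped-in .conf files, each defining LOCATION and AGE.
    for conf in /some-dir/*.conf; do
        [ -f "${conf}" ] || continue
        unset LOCATION AGE
        . "${conf}"                         # sets LOCATION=/path/to/dir, AGE=num
        [ -n "${LOCATION}" ] && [ -n "${AGE}" ] || continue
        find "${LOCATION}" -type f -mtime "+${AGE}" -delete
    done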
The problem is that we're not mounting config files to a container that is run by birdhouse-deploy when it starts up. The scheduler component runs the container so all the additional configurations need to be specified in the scheduler-job-clean_old_files/config.yml file.
No matter what we do, we need to generate that file dynamically using our template mechanism. In order to do that, we need to add additional information in an environment variable.
We could create configuration files and mount them to the container, but we'd still be appending to a variable in order to tell the scheduler which configuration files to mount. Going through a configuration file just adds extra code/overhead and doesn't buy us much.
If you want to debug this easily I'd recommend:
    bin/birdhouse configs -c 'echo $SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS'
Just thinking about this again ...
I guess we could have a separate scheduler job for each cleanup option ... that would get a little strange because of the way that component dependencies work, but maybe I can introduce something here that would make resolving those dependencies a bit easier. Let me think about this a bit.
Ok, I've managed an implementation where the individual cleanup jobs are each their own scheduler job. @tlvu please check it out when you have a minute.
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="5 2 * * 0"  # weekly on Sunday at 2:05
    export SCHEDULER_JOB_CLEAN_OLD_FILES_VOLUMES='$(for opt in ${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}; do printf " --volume %s:rw " "$(echo $opt | cut -d\| -f 1)"; done)'
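For illustration, here is how that expansion behaves with a single assumed OPTIONS entry (the value reuses the THREDDS example from earlier in this thread):

    # Each OPTIONS entry is "volume:mount|pattern|days"; only the part
    # before the first '|' becomes a --volume flag.
    SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS='thredds_persistence:/thredds|/thredds/logs/threddsServlet.*.log|365'
    for opt in ${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}; do
        printf " --volume %s:rw " "$(echo "${opt}" | cut -d\| -f 1)"
    done
    # prints: --volume thredds_persistence:/thredds:rw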
Oh! Additional volume mounts as well! I have the same problem adapting the deploy-data job. I am testing my change; once it's working, I'll send a PR.
I've made it so that it is individually configurable for each service, and I have set the default mode to atime.
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3151/ | Result: ❌ FAILURE | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-91.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/111/ | NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3166/ | Result: ❌ FAILURE | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-46.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/121/ | NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3171/ | Result: ✅ SUCCESS | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-154.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/125/ | NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3172/ | Result: ❌ FAILURE | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-133.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/126/ | NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3181/ | Result: ❌ FAILURE | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-154.rdext.crim.ca
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3600/ | Result: ❌ FAILURE | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-20.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/408/ | NOTEBOOK TEST RESULTS
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3651/ | Result: 🆘 ABORTED | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-118.rdext.crim.ca
run tests
E2E Test Results
DACCS-iac Pipeline Results: Build URL: http://daccs-jenkins.crim.ca:80/job/DACCS-iac-birdhouse/3652/ | Result: ❌ FAILURE | BIRDHOUSE_DEPLOY_BRANCH: delete-old-files | DACCS_IAC_BRANCH: master | DACCS_CONFIGS_BRANCH: master | PAVICS_E2E_WORKFLOW_TESTS_BRANCH: master | PAVICS_SDI_BRANCH: master | DESTROY_INFRA_ON_EXIT: true | PAVICS_HOST: https://host-140-118.rdext.crim.ca
PAVICS-e2e-workflow-tests Pipeline Results: Tests URL: http://daccs-jenkins.crim.ca:80/job/PAVICS-e2e-workflow-tests/job/master/436/ | NOTEBOOK TEST RESULTS
fmigneault left a comment:
Just one item remaining regarding documentation, but looks fine for the feature itself.
    * Automatically remove old files
    * Removes files generated by other components that may accumulate over time and are not managed automatically by those components.
    * Currently supports removing WPS output files from the ``finch``, ``raven``, and ``hummingbird`` components as well as log files
      from the ``thredds`` component.
    * component location: ``optional-components/scheduler-job-clean_old_files``
Should be in the optional-components' README.
Should also include mentions of the specific per-service configuration variables, just to provide insights that there are options to enable/disable them individually, and further options to control their cron/cleanup/fs-meta strategies.
All of the optional scheduler jobs are documented here and in env.local.example (for configuration options).
If we want to improve the documentation for all these jobs that's fine but we should do so for all, not just one, and maybe in a different PR.
Since it is an optional component, it should be described in birdhouse/optional-components/README.rst, not birdhouse/components/README.rst.
The documentation does not need to be explicit about all combinations of component/variables, but should at least indicate some guidance about configuring available options. Notably cron config with atime/mtime conditions on a per-component basis is not that obvious for users that didn't participate in this PR's discussions. Reading only the default.env does not provide all details.
For example:
birdhouse-deploy/birdhouse/optional-components/scheduler-job-clean_old_files/default.env, lines 5 to 9 in af633e4:
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS=  # unset by default; if this job is enabled this must be set to an integer
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE=atime
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY="5 4 * * 0"  # weekly on Sunday at 4:05
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED=false
    export __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && echo "config/finch/config.yml" )'
(from the perspective of a new user):

- How does ..._OLDER_THAN_DAYS impact/interact with ..._FREQUENCY if set to a value?
- Can ..._OLDER_THAN_DAYS be another lower value (e.g.: every hour)?
- What are the possible values for _TIME_MODE? (atime, mtime, ctime, others?) Is there any that would break the functionality if employed?
- How do I find out which files are affected for each component?
(from the perspective of a new user): ...
All these questions are covered in the documentation in env.local.example and the README file:
birdhouse-deploy/birdhouse/env.local.example, lines 209 to 218 in af633e4:
    # For all options below:
    # - variables that end with SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED will enable a clean_old_files job for the relevant component if set to 'true'
    # - variables that end with DELETE_FILES_OLDER_THAN_DAYS set a number of days. Files older than this number of days will be deleted every time
    #   the scheduler-job-clean_old_files scheduler job runs.
    # - variables that end with DELETE_FILES_TIME_MODE are used by the find command to calculate the age of a file:
    #   - atime: delete files that haven't been accessed in X days
    #   - mtime: delete files that haven't been modified in X days
    #   - ctime: delete files that were created more than X days ago
    # - variables that end with SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY set the frequency at which the relevant clean old files job should be run.
    #   This value is a string that conforms to the cron schedule format.
birdhouse-deploy/birdhouse/components/README.rst, lines 66 to 67 in af633e4:
    * Currently supports removing WPS output files from the ``finch``, ``raven``, and ``hummingbird`` components as well as log files
      from the ``thredds`` component.
I can either come back to this PR at a later date when I have time to write more extensive documentation, or we can include this now and I'll write more documentation in general for the scheduler jobs as a separate PR. I don't really have a preference, so if you'd like to wait to merge this PR until more documentation is in place, I'll leave it for now.
A copy-paste is sufficient with quotes for the code elements.
Can be updated later with more details if that becomes necessary.
I prefer actual docs because they are easier for users to navigate and read. If I point a user to this env file, they will nope-out immediately.
reads those files and generates a default configuration file for the user on demand.
Yes, this clearly solves the duplicate docs problem, but it might make it harder for the simple user to track which config vars changed (added, renamed, deleted). For example, with JupyterHub I had to run jupyterhub --generate-config to get that one single file so I could diff it against my current config. Given I am not familiar with their code, I cannot see the differences easily just by browsing the code.
If we take the simple road of just referencing the proper default.env file for the config var docs, we also fix the duplicate doc problem and we provide one single file for new users to look up the full list of "public" config vars.
This one single file can also be diffed to find out when a config var has been added, renamed, or deleted.
But with only the reference to the default.env, the user will have to make an additional lookup in the matching default.env to read the docs if the var name is unclear.
Both (one single file for easy browsing vs generated docs) have pros and cons. I do not have a strong opinion on either of those approaches.
Thinking more about this duplicate doc problem, how about:

- in env.local.example we only put example usage but no documentation wording, referring to the matching default.env
- be very diligent about var names and examples to make them self-explanatory

Honestly, when reading docs I can infer a lot just by looking at the example usage. I will actually read the full documentation only when my inference from the example does not work as intended.
And I think when updating docs, we tend to update the wording to clarify explanations much more frequently than we change the example, because changing the example basically changes the format of the expected value; that means a code change and a change in the variable's meaning, which happen much less frequently.
I don't think we really have super diligent and lengthy documentation in most cases in env files. Most places where there are more details are typically because of specific gotchas that are important to highlight contextually in that place, or that support examples, both of which are more relevant next to the variable (or the others involved with it).
If we add more verbose documentation or explanations to further guide users, then I think those should live only in the README. The few examples/comments should be as self-explanatory as possible in env files. There can also be "see https://..." comments pointing at the README to facilitate navigation/information retrieval.
About the components / optional-components location, my main concern is that if I specifically add a given optional-components feature, I would look in the corresponding README to find its details. Not having a direct reference just makes things harder to find. The fact that this optional feature uses the components scheduler does not matter. The scheduler should only list cross-refs to all compatible job components (wherever they reside), and their specific entries should provide more details as needed. This is already the case for many other optional-components that point at their component requirement/compatibility references this way. Scheduler jobs are the only ones doing things differently from all the others.
Ok...
There are a lot of improvements that can be made to the documentation. I think we all agree on that.
However, I think that the question of how to organize the README files is separate from how to organize default.env and env.local.example.
For the README files, yes, we should keep the info from components and optional-components in their respective READMEs. And I am happy to start that by putting the documentation relevant to this PR in the optional-components README.
Separate PR(s) will have to be made to update the documentation for the other components. For example, the auto-deploy info is still in components even though it is nowadays a scheduler job in optional-components.
Agreed. Only focus on the current components edited/added within this PR.
tlvu left a comment:
Great feature. Sorry for my late review. See my feedback below.
    * Automatically remove old files
    * Removes files generated by other components that may accumulate over time and are not managed automatically by those components.
    * Currently supports removing WPS output files from the ``finch``, ``raven``, and ``hummingbird`` components as well as log files
      from the ``thredds`` component.
    * component location: ``optional-components/scheduler-job-clean_old_files``
I am myself often split on where to put documentation for new config vars that I add: in default.env, where they are introduced with their default value, or in env.local.example, where they will actually be used by the user. So far I've been copy/pasting between both files.
But that means duplicate documentation, and it is error prone: when we update the doc in one place, we will forget to update the other location.
I propose we pick default.env, and in env.local.example we put a reference line that looks like "see component-name/default.env" for the documentation of those variables. I choose default.env because in there we will also have other "danger zone" variables that we do not want to expose in env.local.example, so all the docs for all config vars of a given component will be in one single location, the default.env file.
    * Automatically remove old files
    * Removes files generated by other components that may accumulate over time and are not managed automatically by those components.
    * Currently supports removing WPS output files from the ``finch``, ``raven``, and ``hummingbird`` components as well as log files
      from the ``thredds`` component.
    * component location: ``optional-components/scheduler-job-clean_old_files``
As for the docs for the new optional-components/scheduler-job-clean_old_files/ being in components/README.rst, I think it's okay, because the other already existing jobs that are in optional-components/ are also documented there.
Unless we move them all out to optional-components/README.rst and add a reference in the components/scheduler/ component that the list of possible jobs is in the other optional-components/README.rst file. I think it's less user-friendly this way, even though it would better respect our "conventions". I would favor humans over conventions.
| if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'finch[[:space:]]*$'; then | ||
| echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the finch file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}'. Please set this variable to a valid option or disable the finch file cleaning job. This job will not run properly!" | ||
| fi |
Can this be refactored into a function like check-clean-old-files-jobs-vars, called like check-clean-old-files-jobs-vars $SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED finch $SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS $SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE, for each of the 4 jobs? (A sketch follows below.)
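A minimal sketch of such a refactor (underscores instead of dashes so the function name stays POSIX-sh valid; the log helper is assumed to come from the surrounding script, and the warning wording is abbreviated):

    # Hedged sketch of the proposed shared validator; one call per job.
    check_clean_old_files_jobs_vars() {
        _enabled="$1"; _component="$2"; _older_than_days="$3"; _time_mode="$4"
        [ "${_enabled}" = "true" ] || return 0
        echo "${_older_than_days}" | grep -q '^[0-9][0-9]*$' \
            || log WARN "OLDER_THAN_DAYS for ${_component} must be an integer, not '${_older_than_days}'. This job will not run properly!"
        case "${_time_mode}" in
            atime|mtime|ctime) ;;
            *) log WARN "TIME_MODE for ${_component} must be one of 'atime', 'mtime', or 'ctime', not '${_time_mode}'. This job will not run properly!" ;;
        esac
    }
    check_clean_old_files_jobs_vars "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" finch \
        "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}"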
    --rm --name scheduler-job-clean_old_files_finch
    --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro
    --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw"
    image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}'
Given it's an unattended cron job, where does this log to, so we can debug if something goes wrong? Can it log to BIRDHOUSE_LOG_DIR with the already built-in log rotation?
I actually have a question about that....
Now that we are making more use of the scheduler to run additional jobs, why are we writing additional logs to BIRDHOUSE_LOG_DIR for the scheduler jobs instead of the usual pattern of logging to the stdout/stderr of the scheduler container? Then docker would manage the logs for us (and the logs can easily be backed up using the backup scheduler job).
logging to the stdout/stderr of the scheduler container?
That is fine for a regular service container (1 service, 1 container), but here, if all the jobs log to the same scheduler container, we can have interleaved logs if job runtimes happen to overlap. Even without overlap, it makes searching for the desired log harder because all the jobs log to the same place. With each job logging to a separate file, we avoid both the search problem and the possible overlap problem. The more jobs we have, the more likely these problems will happen.
So I much prefer each job having its own separate log file.
Overview
Creates the optional-component-clean_old_files job that deletes old THREDDS log files and WPS output files. Allows individual cleanup jobs to be enabled for each of the raven, finch, hummingbird, and thredds components. Allows the user to configure how old a file must be before it is deleted (age in days) and how to calculate the age of the file (time since last modified, time since last accessed, time since created).
(See env.local.example or the scheduler documentation for details; a sample configuration is sketched below.)
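For example, a hypothetical env.local excerpt enabling only the finch cleanup job, using the variables quoted earlier in this discussion:

    # Hypothetical env.local excerpt: enable the finch cleanup job so WPS
    # outputs not accessed in the last 365 days are deleted, checked weekly.
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED=true
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS=365
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE=atime
    export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY="5 4 * * 0"  # weekly on Sunday at 4:05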
Changes
Non-breaking changes
Breaking changes
Related Issue / Discussion
Additional Information
Links to other issues or sources.
CI Operations
birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false