Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
d14c99e
delete old files scheduler job
mishaschwartz Mar 18, 2025
8a590a5
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 20, 2025
354e74c
make every job a separate file
mishaschwartz Mar 20, 2025
2d198e8
update docs, fix alpine version, fix command executable
mishaschwartz Mar 20, 2025
84b0ef6
cleanup old code and comments
mishaschwartz Mar 20, 2025
796766e
update CHANGES
mishaschwartz Mar 20, 2025
d39494c
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 25, 2025
1059ffe
reconfigure warnings if dependent components are not enabled
mishaschwartz Mar 26, 2025
6b5483e
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 26, 2025
fa3e1c2
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 27, 2025
f0192f4
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 31, 2025
fe202df
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Apr 4, 2025
4eff78d
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz May 1, 2025
26e0595
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz May 2, 2025
7585d55
Merge branch configurable-crontab' into delete-old-files
mishaschwartz May 8, 2025
03d5963
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz May 13, 2025
90bd59f
Merge branch 'master' into delete-old-files
mishaschwartz May 27, 2025
279b841
Merge branch 'master' into delete-old-files
mishaschwartz Aug 27, 2025
f36262f
review comments
mishaschwartz Sep 12, 2025
af633e4
Merge branch 'master' into delete-old-files
mishaschwartz Sep 12, 2025
2e4dd81
Merge branch 'master' into delete-old-files
mishaschwartz Sep 26, 2025
e61d833
Merge branch 'master' into delete-old-files
mishaschwartz Jan 27, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 8 additions & 1 deletion CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,14 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)
- Introduce a scheduler job to delete old files that may accumulate over time

Creates the `optional-component/clean_old_files` job that deletes old THREDDS log files and WPS output files.
Allows individual cleanup jobs to be enabled for each of `raven`, `finch`, `hummingbird`, and `thredds` components.
Allows the user to configure how old a file must be before it is deleted (age in days) and how to calculate the age
of the file (time since last modified, time since last accessed, time since created).

(see `env.local.example` or the `scheduler` documentation for details).

[2.21.0](https://github.com/bird-house/birdhouse-deploy/tree/2.21.0) (2026-01-27)
------------------------------------------------------------------------------------------------------------------
Expand Down
9 changes: 9 additions & 0 deletions birdhouse/components/README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,15 @@ component directory to the ``BIRDHOUSE_EXTRA_CONF_DIRS`` variable in your local

* component location: ``optional-components/scheduler-job-deploy_raven_testdata``

* Automatically remove old files

* Removes files generated by other components that may accumulate over time and are not manage automatically by those components.

* Currently supports removing WPS output files from the ``finch``, ``raven``, and ``hummingbird`` components as well as log files
from the ``thredds`` component.

* component location: ``optional-components/scheduler-job-clean_old_files``
Comment on lines +62 to +69
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the optional-components' README.

Should also include mentions of the specific per-service configuration variables, just to provide insights that there are options to enable/disable them individually, and further options to control their cron/cleanup/fs-meta strategies.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the optional scheduler jobs are documented here and in env.local.example (for configuration options).

If we want to improve the documentation for all these jobs that's fine but we should do so for all, not just one, and maybe in a different PR.

Copy link
Member

@fmigneault fmigneault Sep 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is an optional component, it should be described in birdhouse/optional-components/README.rst, not birdhouse/components/README.rst.

The documentation does not need to be explicit about all combinations of component/variables, but should at least indicate some guidance about configuring available options. Notably cron config with atime/mtime conditions on a per-component basis is not that obvious for users that didn't participate in this PR's discussions. Reading only the default.env does not provide all details.

For example:

export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE=atime
export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY="5 4 * * 0" # weekly on Sunday at 4:05
export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED=false
export __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && echo "config/finch/config.yml" )'

(from the perspective of a new user):

  • How does ..._OLDER_THAN_DAYS impact/interact with ..._FREQUENCY if set to a value?
  • Can ..._OLDER_THAN_DAYS be another lower value (e.g.: every hour)?
  • What are the possible values for _TIME_MODE? (atime, mtime, ctime, others?)
    Is there any that would break the functionality if employed?
  • How do I find out which files are affected for each component?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(from the perspective of a new user): ...

All these questions are covered in the documentation in env.local.example and the README file

# For all options below:
# - variables that end with SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED will enable the a clean_old_files job for the relevant component if set to 'true'
# - variables that end with DELETE_FILES_OLDER_THAN_DAYS set a number of days. Files older than this number of days will be deleted every time
# the scheduler-job-clean_old_files scheduler job runs.
# - variables that end with DELETE_FILES_TIME_MODE is used by the find command to calculate the age of a file:
# - atime: delete files that haven't been accessed in X days
# - mtime: delete files that haven't been modified in X days
# - ctime: delete files that were created more than X days ago
# - variables that end with SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY set the frequency at which the relevant clean old files job should be run. This
# value is a string that conforms to the cron schedule format.

* Currently supports removing WPS output files from the ``finch``, ``raven``, and ``hummingbird`` components as well as log files
from the ``thredds`` component.

I can either come back to this PR at a later date when I have time to write more extensive documentation or we can include this now and I'll write more documentation in general for the scheduler jobs as a separate PR. I don't really have a preference so if you'd like to wait to merge this PR until more documentation is in place I'll leave it for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A copy-paste is sufficient with quotes for the code elements.
Can be updated later with more details if that becomes necessary.

I prefer actual docs because it is easier to navigate and read by users. If I point a user to this env file, they will nope-out immediately.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reads those files and generates a default configuration file for the user on demand.

Yes this clearly solve the duplicate docs problem but might render the tracking of what config var changed (added, renamed, delete) harder for the simple user. Example with JupyterHub, I had to jupyterhub --generate-config to have this one single file so I can diff against my current config. Given I am not familiar with their code, I can not see the difference easily simply when browsing the code.

If we take the simple road of simply referencing the config var doc to the proper default.env file, we also fix the duplicate doc problem and we provide a one single file for new users to look up for the possible list of "public" config vars.

This one single file can also be diffed to find out when a new config var has been added, renamed or deleted.

But with only the reference to the default.env, user will have to make an additional lookup to the matching default.env to read the docs, if the var name is unclear.

Both (one single file for easy browsing vs generated docs) have pros and cons. I do not have a strong opinion on either of those approaches.

Copy link
Collaborator

@tlvu tlvu Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about this duplicate doc problem, how about

  • in env.local.example we only put example usage but no documentation wording, referring to the matching default.env
  • be very diligent about var name and example to make it self explanatory

Honesly, when reading docs I can infer a lot but just looking at the example usage. I will actually read the full documentation only when my inference from the example do not work as intended.

And I think when updating docs, we tend to update the wording to clarify explanation much more frequently than changing the example. Because when changing the example, we basically change the format of the expected value, that means code change and variable meaning change, which happen much less frequently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we really have super diligent and lengthy documentation in most cases in env files. Most places where there are more details is typically because of specific gotchas that are important to highlight contextually in that place, or that support examples, both of which are more relevant with the variable (or others around involved with it).

If we add more verbose documentation or explanations to further guide users, then I think those should be only in the README. The few example/comments should be self-explanatory as much as possible in env files. There can also be "see https://..." comments pointing at README to facilitate navigation/information retrieval.

About components / optional-components location, my main concern is that if I specifically add a given optional-components feature, I would look up in the corresponding README to find its details. Not having a direct reference just makes it harder to find things. The fact that this optional feature uses the components scheduler does not matter. The scheduler should only list cross-refs to all compatible job components (wherever they reside), and their specific entries should provide more details as needed. This is already the case for many other optional-components that point at their component requirement/compatibility references this way. Scheduler jobs are the ones doing things differently that all others.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok...

There are a lot of improvements that can be made to the documentation. I think we all agree on that.

However I think that the question of how to organize the README files is separate from how to organize default.env and env.local.example.

For the README files, yes we should keep the info from components and optional components in their respective READMEs. And I am happy to start that by putting the documentation relevant to this PR in the optional components.

A separate PR(s) will have to be made to update the documentation for the other components. For example, the auto deploy info is still in components even though it is nowadays scheduler job in optional components.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Only focus on the current components edited/added within this PR.


* Automatically backup files

* automatically backs up application data, user data, representative data, and logs to a restic repository (default) or a docker volume
Expand Down
37 changes: 37 additions & 0 deletions birdhouse/env.local.example
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,43 @@ export GEOSERVER_ADMIN_PASSWORD="${__DEFAULT__GEOSERVER_ADMIN_PASSWORD}"
# (note: if using 'BIRDHOUSE_DATA_PERSIST_ROOT', it must be defined earlier, either in this file or from 'default.env')
#export BIRDHOUSE_LOGROTATE_DATA_DIR='${BIRDHOUSE_DATA_PERSIST_ROOT}/logrotate'

# These variables configure the scheduler-job-clean_old_files component
#
# For all options below:
# - variables that end with SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED will enable the a clean_old_files job for the relevant component if set to 'true'
# - variables that end with DELETE_FILES_OLDER_THAN_DAYS set a number of days. Files older than this number of days will be deleted every time
# the scheduler-job-clean_old_files scheduler job runs.
# - variables that end with DELETE_FILES_TIME_MODE is used by the find command to calculate the age of a file:
# - atime: delete files that haven't been accessed in X days
# - mtime: delete files that haven't been modified in X days
# - ctime: delete files that were created more than X days ago
# - variables that end with SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY set the frequency at which the relevant clean old files job should be run. This
# value is a string that conforms to the cron schedule format.
#
# Delete old WPS output files generated by the finch WPS component
#export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
#export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE=atime
#export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY="5 4 * * 0" # weekly on Sunday at 4:05
#export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED=false
#
# Delete old WPS output files generated by the hummingbird WPS component
#export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
#export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE=atime
#export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY="10 4 * * 0" # weekly on Sunday at 4:10
#export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED=false
#
# Delete old WPS output files generated by the raven WPS component
#export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
#export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE=atime
#export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY="10 4 * * 0" # weekly on Sunday at 4:10
#export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED=false
#
# Delete old log files generated by the thredds component
#export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
#export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE=mtime
#export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY="20 4 * * 0" # weekly on Sunday at 4:20
#export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED=false

# These variables configure the scheduler-job-backup component

# Schedule for when to run the job to automatically back up data
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
config.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# this file intentionally contains no content and is mounted to the scheduler directory if a clean_old_files job is not enabled.
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#!/bin/sh

################################################################
# Example call to delete all files in /tmp last modified longer
# than 20 days ago
#
# $ sh clean-old-files.sh 20 mtime /tmp
##################################################################

AGE="$1"
MODE="$2"
LOCATION="$3"

ACCEPTABLE_MODES='|mtime|ctime|atime|'

if ! echo "$AGE" | grep -q '^[0-9][0-9]*$'; then
>&2 echo "AGE argument set to '${AGE}'. It must be an unsigned integer"
exit 1
fi

if [ "${ACCEPTABLE_MODES#*"|${MODE}|"}" = "${ACCEPTABLE_MODES}" ]; then
>&2 echo "MODE argument set to '${MODE}'. It must be one of 'mtime', 'ctime', or 'atime'"
exit 1
fi

if [ -z "${LOCATION}" ]; then
>&2 echo "LOCATION argument is blank or unset. It must refer to a path on disk."
exit 1
fi

echo "Removing files in ${LOCATION} that have a ${MODE} value greater than ${AGE} days"
find "${LOCATION}" -type f "-${MODE}" +"${AGE}" -print -delete
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
- name: clean_old_files_finch
comment: clean old WPS output files generated by Finch
schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY}'
command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}" /wps_outputs/finch'
dockerargs: >-
--rm --name scheduler-job-clean_old_files_finch
--volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro
--volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw"
image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given it's an unattended cron job, where is this logging to so we can debug if something goes wrong? Can log to BIRDHOUSE_LOG_DIR with already built-in logrotation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have a question about that....

Now that we are making more use of the scheduler to run additional jobs, why are we writing additional logs to BIRDHOUSE_LOG_DIR for the scheduler jobs instead of the usual pattern of logging to the stdout/stderr of the scheduler container? Then docker would manage the logs for us (and the logs can easily be backed up using the backup scheduler job).

Copy link
Collaborator

@tlvu tlvu Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logging to the stdout/stderr of the scheduler container?

That is fine for regular service container (1 service, 1 container) but here if all the jobs all log to the same scheduler container we can have interleaving log if jobs runtime happen to overlap. Even without overlap, it will make searching for the desired log harder because all the jobs logs to the same place. With each separate job logging to separate file, we avoid the search for the desired job log problem and the possible overlap problem. The more jobs we have, the more likely the problems will happen.

So I much prefer each job to its own separate log file.

Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
services:
scheduler:
volumes:
- ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_finch.yml:ro}
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
- name: clean_old_files_hummingbird
comment: clean old WPS output files generated by Hummingbird
schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY}'
command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE}" /wps_outputs/hummingbird'
dockerargs: >-
--rm --name scheduler-job-clean_old_files_hummingbird
--volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro
--volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw"
image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}'
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
services:
scheduler:
volumes:
- ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_hummingbird.yml:ro}
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
- name: clean_old_files_raven
comment: clean old WPS output files generated by Raven
schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY}'
command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE}" /wps_outputs/raven'
dockerargs: >-
--rm --name scheduler-job-clean_old_files_raven
--volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro
--volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw"
image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}'
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
services:
scheduler:
volumes:
- ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_raven.yml:ro}
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
- name: clean_old_files_thredds
comment: clean old log files generated by Thredds
schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY}'
command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE}" /thredds'
dockerargs: >-
--rm --name scheduler-job-clean_old_files_thredds
--volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro
--volume "thredds_persistence:/thredds:rw"
image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}'
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
services:
scheduler:
volumes:
- ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_thredds.yml:ro}
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
export SCHEDULER_JOB_CLEAN_OLD_FILES_DOCKER=alpine # alpine contains find with -ctime -mtime and -atime options (busybox based containers do not)
export SCHEDULER_JOB_CLEAN_OLD_FILES_VERSION=3.21
export SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE='${SCHEDULER_JOB_CLEAN_OLD_FILES_DOCKER}:${SCHEDULER_JOB_CLEAN_OLD_FILES_VERSION}'

export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE=atime
export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY="5 4 * * 0" # weekly on Sunday at 4:05
export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED=false
export __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && echo "config/finch/config.yml" )'

export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE=atime
export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY="10 4 * * 0" # weekly on Sunday at 4:10
export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED=false
export __SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED}" = "true" ] && echo "config/hummingbird/config.yml" )'

export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE=atime
export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY="15 4 * * 0" # weekly on Sunday at 4:15
export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED=false
export __SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED}" = "true" ] && echo "config/raven/config.yml" )'

export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer
export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE=mtime
export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY="20 4 * * 0" # weekly on Sunday at 4:20
export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED=false
export __SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED}" = "true" ] && echo "config/thredds/config.yml" )'

export DELAYED_EVAL="
$DELAYED_EVAL
SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE
__SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC
__SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC
__SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC
__SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC
"

OPTIONAL_VARS="
$OPTIONAL_VARS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE
\$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY
\$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE
\$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY
\$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE
\$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY
\$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE
\$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY
"

VARS="
$VARS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE
"
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
if ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'scheduler[[:space:]]*$'; then
log WARN 'The scheduler-job-clean_old_files component is enabled but the scheduler component is not. This WILL cause problems. Please disable the scheduler-job-clean_old_files component.'
fi

_acceptable_modes='|mtime|ctime|atime|'

if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'finch[[:space:]]*$'; then
echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the finch file cleaning job. This job will not run properly!"
[ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}'. Please set this variable to a valid option or disable the finch file cleaning job. This job will not run properly!"
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be refactored to a function like check-clean-old-files-jobs-vars and called like check-clean-old-files-jobs-vars $SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED finch $SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS $SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE for each of the 4 jobs?


if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'hummingbird[[:space:]]*$'; then
echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the hummingbird file cleaning job. This job will not run properly!"
[ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE}'. Please set this variable to a valid option or disable the hummingbird file cleaning job. This job will not run properly!"
fi

if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'raven[[:space:]]*$'; then
echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the raven file cleaning job. This job will not run properly!"
[ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE}'. Please set this variable to a valid option or disable the raven file cleaning job. This job will not run properly!"
fi

if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'thredds[[:space:]]*$'; then
echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the thredds file cleaning job. This job will not run properly!"
[ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE}'. Please set this variable to a valid option or disable the thredds file cleaning job. This job will not run properly!"
fi