22 commits
d14c99e
delete old files scheduler job
mishaschwartz Mar 18, 2025
8a590a5
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 20, 2025
354e74c
make every job a separate file
mishaschwartz Mar 20, 2025
2d198e8
update docs, fix alpine version, fix command executable
mishaschwartz Mar 20, 2025
84b0ef6
cleanup old code and comments
mishaschwartz Mar 20, 2025
796766e
update CHANGES
mishaschwartz Mar 20, 2025
d39494c
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 25, 2025
1059ffe
reconfigure warnings if dependent components are not enabled
mishaschwartz Mar 26, 2025
6b5483e
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 26, 2025
fa3e1c2
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 27, 2025
f0192f4
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Mar 31, 2025
fe202df
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz Apr 4, 2025
4eff78d
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz May 1, 2025
26e0595
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz May 2, 2025
7585d55
Merge branch configurable-crontab' into delete-old-files
mishaschwartz May 8, 2025
03d5963
Merge branch 'configurable-crontab' into delete-old-files
mishaschwartz May 13, 2025
90bd59f
Merge branch 'master' into delete-old-files
mishaschwartz May 27, 2025
279b841
Merge branch 'master' into delete-old-files
mishaschwartz Aug 27, 2025
f36262f
review comments
mishaschwartz Sep 12, 2025
af633e4
Merge branch 'master' into delete-old-files
mishaschwartz Sep 12, 2025
2e4dd81
Merge branch 'master' into delete-old-files
mishaschwartz Sep 26, 2025
e61d833
Merge branch 'master' into delete-old-files
mishaschwartz Jan 27, 2026
7 changes: 7 additions & 0 deletions CHANGES.md
@@ -60,6 +60,13 @@
- Answer: This is a hack that would work based on the specific way that the docker-crontab image sets schedules.
However, this is not obvious to the user and is unreliable since it is not documented.

- Introduce a scheduler job to delete old files that may accumulate over time

Creates the `optional-components/scheduler-job-clean_old_files` component, which deletes old THREDDS log files and WPS output files.
To set the maximum age (in days) of the files kept for each of these options, set the `THREDDS_DELETE_FILES_OLDER_THAN_DAYS`
and/or the `WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS` variables in the local environment file (see
`env.local.example` or the `scheduler` documentation for details).

[2.10.1](https://github.com/bird-house/birdhouse-deploy/tree/2.10.1) (2025-03-10)
------------------------------------------------------------------------------------------------------------------

15 changes: 15 additions & 0 deletions birdhouse/components/README.rst
@@ -59,6 +59,21 @@ component directory to the ``BIRDHOUSE_EXTRA_CONF_DIRS`` variable in your local

* component location: ``optional-components/scheduler-job-deploy_raven_testdata``

* Automatically remove old files

* Removes THREDDS log files and WPS output files older than a configurable number of days

* To remove THREDDS log files, the ``thredds`` component must be enabled.

* Set the ``THREDDS_DELETE_FILES_OLDER_THAN_DAYS`` variable in the local environment file to an integer specifying
the number of days after which an unmodified THREDDS log file is deleted.

* To remove WPS output files, at least one of the WPS components must be enabled.

* Set the ``WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS`` variable in the local environment file to an integer specifying
the number of days after which an unmodified WPS output file is deleted.


For additional configuration options for all these jobs see the ``env.local.example`` file
as well as the individual ``default.env`` files in each of the component directories.

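A minimal local environment file excerpt for turning this job on might look like the sketch below. It assumes the component directory is `./optional-components/scheduler-job-clean_old_files` (the path used by the job's `config.yml`) and that it is enabled through `BIRDHOUSE_EXTRA_CONF_DIRS` like the other optional components listed here; the retention values are only examples, and the `thredds` component and/or a WPS component must also be enabled for the corresponding entry to do anything.

```shell
# Hypothetical env.local excerpt: enable the clean-old-files scheduler job.
# Assumption: optional components are enabled by listing their directory here.
export BIRDHOUSE_EXTRA_CONF_DIRS="
  ${BIRDHOUSE_EXTRA_CONF_DIRS}
  ./optional-components/scheduler-job-clean_old_files
"

# Example retention periods: THREDDS logs kept 20 days, WPS outputs kept 90 days.
export THREDDS_DELETE_FILES_OLDER_THAN_DAYS=20
export WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS=90
```

Leaving either `*_DELETE_FILES_OLDER_THAN_DAYS` variable empty skips that group of files entirely.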
6 changes: 6 additions & 0 deletions birdhouse/components/thredds/default.env
@@ -38,6 +38,12 @@ export THREDDS_DATASET_DATASETSCAN_BODY='
</filter>
'

export THREDDS_DELETE_FILES_OLDER_THAN_DAYS=
export SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS="
${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}
\$([ -n \"\${THREDDS_DELETE_FILES_OLDER_THAN_DAYS}\" ] && echo \"thredds_persistence:/thredds|/thredds/logs/threddsServlet.*.log|\${THREDDS_DELETE_FILES_OLDER_THAN_DAYS}\")
Collaborator

Should we avoid appending to a variable, to sidestep escaping problems and potential duplicates?

I much prefer the approach of dropping a config file. It is also easier to debug than tracking the content of a variable.

How about having clean-old-files.sh read /some-dir/*.conf and iterate over them?

Sample .conf format:

LOCATION=/path/to/dir
AGE=num

... and we can easily add more config variables later while avoiding parsing problems.

Each component just drops a .conf file in the location where clean-old-files.sh expects it.
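For comparison, the drop-a-config-file idea could be sketched roughly as follows. This is a hypothetical illustration of the proposal, not what the PR implements: the function name `clean_from_conf_dir` is made up, the `LOCATION`/`AGE` keys come from the sample format in the comment, and each `.conf` file is sourced in a subshell, which assumes the files are trusted.

```shell
#!/bin/sh
# Hypothetical sketch: every component drops a <name>.conf file containing
# LOCATION=... and AGE=... into one directory; this function walks them.
clean_from_conf_dir() {
    conf_dir="$1"
    for conf in "${conf_dir}"/*.conf; do
        [ -f "${conf}" ] || continue   # no match: the glob stays literal, skip it
        (
            # Subshell so one file's settings cannot leak into the next one.
            LOCATION='' AGE=''
            . "${conf}"                # assumes .conf files are trusted shell
            case "${AGE}" in
                ''|*[!0-9]*) echo "Skipping ${conf}: invalid AGE" >&2; exit 0 ;;
            esac
            [ -n "${LOCATION}" ] || { echo "Skipping ${conf}: no LOCATION" >&2; exit 0; }
            echo "Removing files in ${LOCATION} not modified in ${AGE} days"
            find "${LOCATION}" -type f -mtime +"${AGE}" -print -delete
        )
    done
}

# Example: clean_from_conf_dir /some-dir
```

As the author's reply points out, the job container would still need each component's config directory and data volume mounted, so this shifts the bookkeeping rather than removing it.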

Collaborator Author

The problem is that this job's container is not one that birdhouse-deploy starts at startup, so we can't simply mount config files into it. The scheduler component runs the container, so all the additional configuration needs to be specified in the scheduler-job-clean_old_files/config.yml file.

No matter what we do, we need to generate that file dynamically using our template mechanism. To do that, we need to pass the additional information through an environment variable.

We could create configuration files and mount them into the container, but we would still be appending to a variable to tell the scheduler which configuration files to mount. Going through a configuration file just adds extra code/overhead and doesn't buy us much.

If you want to debug this easily, I'd recommend:

bin/birdhouse configs -c 'echo $SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS'

Collaborator Author

Just thinking about this again ...

I guess we could have a separate scheduler job for each cleanup option... That would get a little strange because of the way component dependencies work, but maybe I can introduce something here that would make resolving those dependencies a bit easier. Let me think about this a bit.

Collaborator Author

Ok I've managed an implementation where individual cleanup jobs are their own scheduler job. @tlvu please check it out when you have a minute.

"

# add any new variables not already in 'VARS' or 'OPTIONAL_VARS' that must be replaced in templates here
VARS="
$VARS
6 changes: 6 additions & 0 deletions birdhouse/components/wps_outputs-volume/default.env
@@ -3,6 +3,12 @@ OPTIONAL_VARS="
\$SECURE_DATA_PROXY_AUTH_INCLUDE
"

export WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS=
export SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS="
${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}
\$([ -n \"\${WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}\" ] && echo \"\${COMPOSE_PROJECT_NAME:-birdhouse}_wps_outputs:/wps_outputs|/wps_outputs|\${WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}\")
"

# add any new variables not already in 'VARS' or 'OPTIONAL_VARS' that must be replaced in templates here
# single quotes are important in below list to keep variable names intact until 'birdhouse-compose' parses them
EXTRA_VARS='
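The `\$(...)` entries appended in the THREDDS and WPS `default.env` files are stored as text and only expanded later, once the local environment file has been read (the job's `default.env` lists `SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS` in `DELAYED_EVAL`). The sketch below imitates that second pass with a plain `eval`; treating delayed evaluation as a single `eval` is an assumption about birdhouse-deploy's internals, and the leading `${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}` and newlines are dropped for brevity.

```shell
#!/bin/sh
# Hypothetical reproduction of how the THREDDS and WPS option templates expand.
# Assumption: delayed evaluation amounts to one more `eval` pass over the text.

THREDDS_DELETE_FILES_OLDER_THAN_DAYS=20    # set: a THREDDS entry is produced
WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS=  # empty: no WPS entry is produced

# Text held by the variable after default.env is sourced
# (the backslash escapes have already been consumed at that point).
thredds_tpl='$([ -n "${THREDDS_DELETE_FILES_OLDER_THAN_DAYS}" ] && echo "thredds_persistence:/thredds|/thredds/logs/threddsServlet.*.log|${THREDDS_DELETE_FILES_OLDER_THAN_DAYS}")'
wps_tpl='$([ -n "${WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}" ] && echo "${COMPOSE_PROJECT_NAME:-birdhouse}_wps_outputs:/wps_outputs|/wps_outputs|${WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}")'

# Second expansion pass: each $(...) prints an entry only if its variable is non-empty.
eval "options=\"${thredds_tpl} ${wps_tpl}\""
echo "${options}"
```

Setting `WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS=90` instead would also append `birdhouse_wps_outputs:/wps_outputs|/wps_outputs|90` when `COMPOSE_PROJECT_NAME` is unset.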
16 changes: 16 additions & 0 deletions birdhouse/env.local.example
@@ -204,6 +204,22 @@ export GEOSERVER_ADMIN_PASSWORD="${__DEFAULT__GEOSERVER_ADMIN_PASSWORD}"
# (note: if using 'BIRDHOUSE_DATA_PERSIST_ROOT', it must be defined earlier, either in this file or from 'default.env')
#export BIRDHOUSE_LOGROTATE_DATA_DIR='${BIRDHOUSE_DATA_PERSIST_ROOT}/logrotate'

# These variables configure the scheduler-job-clean_old_files component
#
# Delete THREDDS log files older than X days (e.g. X=20):
#export THREDDS_DELETE_FILES_OLDER_THAN_DAYS=20
#
# Delete WPS output files older than X days (e.g. X=90):
#export WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS=90
#
# Set cron schedule for the clean old files job (how often the job runs).
# By default it runs weekly on Sunday at 2:05 am:
#export SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="5 2 * * 0"

#############################################################
# Proxy variables
#############################################################

# Content of "location /" in file config/proxy/conf.d/all-services.include.template
# Useful to have a custom homepage.
# Default:
@@ -0,0 +1 @@
config.yml
@@ -0,0 +1,33 @@
#!/bin/sh

################################################################
# Deletes old files as determined by the CLEAN_OLD_FILES_OPTIONS
# environment variable.
#
# This variable contains whitespace-delimited entries, each
# representing a group of files to be deleted.
# Each entry has the following format:
#
# <docker-volume-mounts>|<find-location>|<age-in-days>
#
# - docker-volume-mounts is not used by this script
# - find-location is passed to `find` as its starting point. It
#   may be a directory (searched recursively) or a glob pattern,
#   which the shell expands to the matching paths
# - age-in-days is an integer number of days; every regular file
#   found by `find` that was last modified more than this many
#   days ago (`find -mtime +<age-in-days>`) is deleted
#
# Example call to delete all files in /tmp older than 20 days and
# all files in /var/log older than 90 days:
#
# $ export CLEAN_OLD_FILES_OPTIONS='xxx|/tmp|20 yyy|/var/log|90'
# $ sh clean-old-files.sh
##################################################################


for opt in ${CLEAN_OLD_FILES_OPTIONS}; do
    loc="$(printf '%s\n' "${opt}" | cut -d'|' -f 2)"
    age="$(printf '%s\n' "${opt}" | cut -d'|' -f 3)"
    # Skip malformed entries instead of running find with bad arguments.
    case "${age}" in
        ''|*[!0-9]*) echo "Skipping invalid entry: ${opt}" >&2; continue ;;
    esac
    if [ -z "${loc}" ]; then
        echo "Skipping invalid entry: ${opt}" >&2
        continue
    fi
    echo "Removing files in ${loc} that have not been modified in ${age} days"
    # ${loc} is left unquoted on purpose so glob patterns expand to file paths.
    find ${loc} -type f -mtime +"${age}" -print -delete
done
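The deletion rule can be checked in isolation before enabling the job. The sketch below is a hypothetical test harness, not part of the PR: it builds a throwaway directory, backdates one file with `touch -t`, and runs the same entry parsing and `find ... -mtime +N -print -delete` expression that the script uses.

```shell
#!/bin/sh
# Hypothetical harness: check which files `find -mtime +AGE -delete` removes.
set -eu

tmp="$(mktemp -d)"
touch -t 200001011200 "${tmp}/old.log"   # last modified 2000-01-01: well past 20 days
touch "${tmp}/new.log"                   # modified just now: must be kept

# Same entry format as CLEAN_OLD_FILES_OPTIONS; the first field is unused here.
opt="unused|${tmp}|20"
loc="$(printf '%s\n' "${opt}" | cut -d'|' -f 2)"
age="$(printf '%s\n' "${opt}" | cut -d'|' -f 3)"
find ${loc} -type f -mtime +"${age}" -print -delete

ls "${tmp}"   # only new.log is left
```

Note that `-mtime +20` matches files whose age, rounded down to whole days, is greater than 20, so a file must be at least 21 days old to be removed.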
@@ -0,0 +1,9 @@
- name: clean_old_files
  comment: clean old files generated by the stack
  schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY}'
  command: 'bash /clean-old-files.sh'
  dockerargs: >-
    --rm --name scheduler-job-clean_old_files
    --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro
    --env CLEAN_OLD_FILES_OPTIONS="${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}" ${SCHEDULER_JOB_CLEAN_OLD_FILES_VOLUMES}
  image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}'
@@ -0,0 +1,4 @@
services:
  scheduler:
    volumes:
      - ./optional-components/scheduler-job-clean_old_files/config.yml:/scheduler-job-configs/clean_old_files.yml:ro
@@ -0,0 +1,24 @@
export SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE='${BASH_IMAGE}'

export SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="5 2 * * 0" # weekly on Sunday at 2:05

export SCHEDULER_JOB_CLEAN_OLD_FILES_VOLUMES='$(for opt in ${SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS}; do printf " --volume %s:rw " "$(echo $opt | cut -d\| -f 1)"; done)'
Collaborator

Oh! Additional volume mounts as well! I have the same problem adapting the deploy-data job. I am testing my change; once it works, I'll send a PR.


export DELAYED_EVAL="
$DELAYED_EVAL
SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE
SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS
SCHEDULER_JOB_CLEAN_OLD_FILES_VOLUMES
"

OPTIONAL_VARS="
$OPTIONAL_VARS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_VOLUMES
"

VARS="
$VARS
\$SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE
\$SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY
"
@@ -0,0 +1,3 @@
if [ -z "$SCHEDULER_JOB_CLEAN_OLD_FILES_OPTIONS" ]; then
    log WARN 'The scheduler-job-clean_old_files component is enabled but no files are scheduled to be deleted. Please reconfigure this component or disable it.'
fi