Introduce a scheduler job to delete old files that may accumulate over time #510
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| config.yml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # this file intentionally contains no content and is mounted to the scheduler directory if a clean_old_files job is not enabled. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| #!/bin/sh | ||
|
|
||
| ################################################################ | ||
| # Example call to delete all files in /tmp that were last modified
| # more than 20 days ago
| # | ||
| # $ sh clean-old-files.sh 20 mtime /tmp | ||
| ################################################################## | ||
|
|
||
| AGE="$1" | ||
| MODE="$2" | ||
| LOCATION="$3" | ||
|
|
||
| ACCEPTABLE_MODES='|mtime|ctime|atime|' | ||
|
|
||
| if ! echo "$AGE" | grep -q '^[0-9][0-9]*$'; then | ||
| >&2 echo "AGE argument set to '${AGE}'. It must be an unsigned integer" | ||
| exit 1 | ||
| fi | ||
|
|
||
| if [ "${ACCEPTABLE_MODES#*"|${MODE}|"}" = "${ACCEPTABLE_MODES}" ]; then | ||
| >&2 echo "MODE argument set to '${MODE}'. It must be one of 'mtime', 'ctime', or 'atime'" | ||
| exit 1 | ||
| fi | ||
|
|
||
| if [ -z "${LOCATION}" ]; then | ||
| >&2 echo "LOCATION argument is blank or unset. It must refer to a path on disk." | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "Removing files in ${LOCATION} that have a ${MODE} value greater than ${AGE} days" | ||
| find "${LOCATION}" -type f "-${MODE}" +"${AGE}" -print -delete |
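The `MODE` check in the script above relies on POSIX prefix stripping: `${ACCEPTABLE_MODES#*"|${MODE}|"}` removes the shortest prefix ending in `|MODE|`, so if the result equals the original string, nothing matched and the mode is not in the list. A minimal sketch of that idea follows; the helper name `is_acceptable_mode` is hypothetical and not part of this PR.

```sh
#!/bin/sh
# Hypothetical helper illustrating the membership test; not part of the PR.
ACCEPTABLE_MODES='|mtime|ctime|atime|'

is_acceptable_mode() {
    # Strip the shortest prefix ending in "|<mode>|". If the result still
    # equals the original string, nothing matched: the mode is not listed.
    [ "${ACCEPTABLE_MODES#*"|$1|"}" != "${ACCEPTABLE_MODES}" ]
}

for mode in mtime atime foo ''; do
    if is_acceptable_mode "$mode"; then
        echo "'$mode' accepted"
    else
        echo "'$mode' rejected"
    fi
done
```

One caveat of this technique: a value that spans two adjacent entries, such as `mtime|ctime`, also matches as a substring, so a stricter check may be worth considering if the value can come from untrusted input.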
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_finch | ||
| comment: clean old WPS output files generated by Finch | ||
| schedule: '${FINCH_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${FINCH_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}" "${FINCH_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}" /wps_outputs/finch' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_finch | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' | ||
|
Collaborator
Given it's an unattended cron job, where is this logging to so we can debug if something goes wrong? Can log to
Collaborator
Author
I actually have a question about that... Now that we are making more use of the scheduler to run additional jobs, why are we writing additional logs to
Collaborator
That is fine for a regular service container (1 service, 1 container), but here, if all the jobs log to the same scheduler container, we can get interleaved logs when job runtimes happen to overlap. Even without overlap, it makes searching for the desired log harder because all the jobs log to the same place. With each job logging to a separate file, we avoid both the search problem and the possible overlap problem. The more jobs we have, the more likely these problems become. So I much prefer that each job write to its own separate log file.
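The preference for one log file per job could be sketched roughly as below. This is only an illustration: the wrapper name `run_with_job_log`, the `LOG_DIR` variable, and the log file naming are assumptions and do not come from this PR or from the scheduler itself.

```sh
#!/bin/sh
# Hypothetical wrapper: append a job's stdout and stderr to its own log file.
# LOG_DIR and the "<job name>.log" naming scheme are assumptions.
LOG_DIR="${LOG_DIR:-/tmp/scheduler-job-logs}"

run_with_job_log() {
    job_name="$1"
    shift
    mkdir -p "${LOG_DIR}"
    log_file="${LOG_DIR}/${job_name}.log"
    # The brace group runs in the current shell, so "status" survives it.
    {
        echo "==== ${job_name} started $(date -u '+%Y-%m-%dT%H:%M:%SZ')"
        "$@"
        status=$?
        echo "==== ${job_name} finished with exit status ${status}"
    } >>"${log_file}" 2>&1
    return "${status}"
}

# Each job writes to a separate file, so overlapping runs cannot interleave.
run_with_job_log clean_old_files_finch echo "Removing files in /wps_outputs/finch"
run_with_job_log clean_old_files_raven echo "Removing files in /wps_outputs/raven"
```

Because the file name is derived from the job name, finding the output of a given job is a matter of opening one file rather than filtering a shared container log.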
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_finch.yml:ro |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_hummingbird | ||
| comment: clean old WPS output files generated by Hummingbird | ||
| schedule: '${HUMMINGBIRD_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}" "${HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}" /wps_outputs/hummingbird' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_hummingbird | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_hummingbird.yml:ro |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_raven | ||
| comment: clean old WPS output files generated by Raven | ||
| schedule: '${RAVEN_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${RAVEN_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}" "${RAVEN_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}" /wps_outputs/raven' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_raven | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_raven.yml:ro |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_thredds | ||
| comment: clean old log files generated by Thredds | ||
| schedule: '${THREDDS_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${THREDDS_LOGS_DELETE_FILES_OLDER_THAN_DAYS}" "${THREDDS_LOGS_DELETE_FILES_TIME_MODE}" /thredds' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_thredds | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "thredds_persistence:/thredds:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_thredds.yml:ro |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_DOCKER=alpine # alpine contains find with -ctime -mtime and -atime options (busybox based containers do not) | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_VERSION=3.21 | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE='${SCHEDULER_JOB_CLEAN_OLD_FILES_DOCKER}:${SCHEDULER_JOB_CLEAN_OLD_FILES_VERSION}' | ||
|
|
||
| if echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -qv 'scheduler[[:space:]]*$'; then | ||
| export FINCH_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS= # unset by default; if this job is enabled, this must be set to an integer | ||
| export FINCH_WPS_OUTPUTS_DELETE_FILES_TIME_MODE=atime | ||
| export FINCH_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="5 4 * * 0" # weekly on Sunday at 4:05 | ||
| export FINCH_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC='$( [ "${FINCH_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "config/finch/config.yml" )' | ||
|
|
||
| export HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS= # unset by default; if this job is enabled, this must be set to an integer | ||
| export HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_TIME_MODE=atime | ||
| export HUMMINGBIRD_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="10 4 * * 0" # weekly on Sunday at 4:10 | ||
| export HUMMINGBIRD_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC='$( [ "${HUMMINGBIRD_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "config/hummingbird/config.yml" )' | ||
|
|
||
| export RAVEN_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS= # unset by default; if this job is enabled, this must be set to an integer | ||
| export RAVEN_WPS_OUTPUTS_DELETE_FILES_TIME_MODE=atime | ||
| export RAVEN_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="15 4 * * 0" # weekly on Sunday at 4:15 | ||
| export RAVEN_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC='$( [ "${RAVEN_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "config/raven/config.yml" )' | ||
|
|
||
| export THREDDS_LOGS_DELETE_FILES_OLDER_THAN_DAYS= # unset by default; if this job is enabled, this must be set to an integer | ||
| export THREDDS_LOGS_DELETE_FILES_TIME_MODE=mtime | ||
| export THREDDS_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY="20 4 * * 0" # weekly on Sunday at 4:20 | ||
| export THREDDS_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC='$( [ "${THREDDS_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "config/thredds/config.yml" )' | ||
| fi | ||
|
|
||
| export DELAYED_EVAL=" | ||
| $DELAYED_EVAL | ||
| SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC | ||
| " | ||
|
|
||
| OPTIONAL_VARS=" | ||
| $OPTIONAL_VARS | ||
| \$FINCH_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS | ||
| \$FINCH_WPS_OUTPUTS_DELETE_FILES_TIME_MODE | ||
| \$FINCH_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY | ||
| \$HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS | ||
| \$HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_TIME_MODE | ||
| \$HUMMINGBIRD_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY | ||
| \$RAVEN_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS | ||
| \$RAVEN_WPS_OUTPUTS_DELETE_FILES_TIME_MODE | ||
| \$RAVEN_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY | ||
| \$THREDDS_LOGS_DELETE_FILES_OLDER_THAN_DAYS | ||
| \$THREDDS_LOGS_DELETE_FILES_TIME_MODE | ||
| \$THREDDS_SCHEDULER_JOB_CLEAN_OLD_FILES_FREQUENCY | ||
| " | ||
|
|
||
| VARS=" | ||
| $VARS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE | ||
| " |
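The `__SCHEDULER_JOB_CLEAN_OLD_FILES_*_CONFIG_LOC` variables above expand to a config path only when the matching `*_ENABLED` flag is `true`; otherwise they are empty and the `:-blank.config.yml` default in the compose volume entry takes over. The sketch below imitates both steps in plain shell. The function name `resolve_config_loc` is hypothetical; in the PR the first step is a delayed-eval expression and the fallback is performed when the compose file is interpolated.

```sh
#!/bin/sh
# Hypothetical helper combining the two steps for illustration only.
resolve_config_loc() {
    # $1: value of the *_ENABLED flag, $2: component name.
    # "|| true" keeps the substitution's exit status zero when the flag is
    # not "true", so the sketch also runs under "set -e".
    loc="$( [ "$1" = "true" ] && echo "config/$2/config.yml" || true )"
    # Same fallback as the ":-blank.config.yml" default in the volume entry.
    echo "${loc:-blank.config.yml}"
}

resolve_config_loc true finch    # prints config/finch/config.yml
resolve_config_loc false finch   # prints blank.config.yml
```

With this arrangement the scheduler always has a file to mount at the job config path; a disabled job simply mounts the intentionally empty `blank.config.yml`.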
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,23 @@ | ||
| if echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -qv 'scheduler[[:space:]]*$'; then | ||
| _acceptable_modes='|mtime|ctime|atime|' | ||
|
|
||
| if [ "${FINCH_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -qv 'finch[[:space:]]*$'; then | ||
| echo "$FINCH_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "FINCH_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS variable must be an integer not '${FINCH_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the finch file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"|${FINCH_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}|"}" = "${_acceptable_modes}" ] && log WARN "FINCH_WPS_OUTPUTS_DELETE_FILES_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${FINCH_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}'. Please set this variable to a valid option or disable the finch file cleaning job. This job will not run properly!" | ||
| fi | ||
|
|
||
| if [ "${HUMMINGBIRD_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -qv 'hummingbird[[:space:]]*$'; then | ||
| echo "$HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS variable must be an integer not '${HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the hummingbird file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"|${HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}|"}" = "${_acceptable_modes}" ] && log WARN "HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${HUMMINGBIRD_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}'. Please set this variable to a valid option or disable the hummingbird file cleaning job. This job will not run properly!" | ||
| fi | ||
|
|
||
| if [ "${RAVEN_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -qv 'raven[[:space:]]*$'; then | ||
| echo "$RAVEN_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "RAVEN_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS variable must be an integer not '${RAVEN_WPS_OUTPUTS_DELETE_FILES_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the raven file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"|${RAVEN_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}|"}" = "${_acceptable_modes}" ] && log WARN "RAVEN_WPS_OUTPUTS_DELETE_FILES_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${RAVEN_WPS_OUTPUTS_DELETE_FILES_TIME_MODE}'. Please set this variable to a valid option or disable the raven file cleaning job. This job will not run properly!" | ||
| fi | ||
|
|
||
| if [ "${THREDDS_SCHEDULER_JOB_CLEAN_OLD_FILES_ENABLED}" = "true" ] && echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -qv 'thredds[[:space:]]*$'; then | ||
| echo "$THREDDS_LOGS_DELETE_FILES_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "THREDDS_LOGS_DELETE_FILES_OLDER_THAN_DAYS variable must be an integer not '${THREDDS_LOGS_DELETE_FILES_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the thredds file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"|${THREDDS_LOGS_DELETE_FILES_TIME_MODE}|"}" = "${_acceptable_modes}" ] && log WARN "THREDDS_LOGS_DELETE_FILES_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${THREDDS_LOGS_DELETE_FILES_TIME_MODE}'. Please set this variable to a valid option or disable the thredds file cleaning job. This job will not run properly!" | ||
| fi | ||
| fi |
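The four validation blocks above apply the same two checks to different variables. As a rough sketch of how the shared logic behaves, the checks could be expressed as one helper. The name `check_clean_old_files_vars` and its shortened messages are hypothetical, and the `log` function here is only a stand-in for the project's own `log` so that the example runs by itself.

```sh
#!/bin/sh
# Stand-in for the project's log function (assumption), printing to stderr.
log() {
    level="$1"
    shift
    echo "${level}: $*" >&2
}

_acceptable_modes='|mtime|ctime|atime|'

# Hypothetical helper: $1 = component name, $2 = age in days, $3 = time mode.
# Returns 0 when both values are valid, 1 otherwise.
check_clean_old_files_vars() {
    _result=0
    if ! echo "$2" | grep -q '^[0-9][0-9]*$'; then
        log WARN "$1: age must be an unsigned integer, not '$2'"
        _result=1
    fi
    if [ "${_acceptable_modes#*"|$3|"}" = "${_acceptable_modes}" ]; then
        log WARN "$1: time mode must be one of 'mtime', 'atime', or 'ctime', not '$3'"
        _result=1
    fi
    return "${_result}"
}

check_clean_old_files_vars finch 30 atime && echo "finch settings look valid"
check_clean_old_files_vars thredds '' mtime || echo "thredds settings need attention"
```

The second call fails the integer check because the age is empty, which mirrors the default state of the `*_OLDER_THAN_DAYS` variables when a job is enabled without further configuration.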