-
Notifications
You must be signed in to change notification settings - Fork 7
Introduce a scheduler job to delete old files that may accumulate over time #510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
d14c99e
8a590a5
354e74c
2d198e8
84b0ef6
796766e
d39494c
1059ffe
6b5483e
fa3e1c2
f0192f4
fe202df
4eff78d
26e0595
7585d55
03d5963
90bd59f
279b841
f36262f
af633e4
2e4dd81
e61d833
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
mishaschwartz marked this conversation as resolved.
Show resolved
Hide resolved
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| config.yml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # this file intentionally contains no content and is mounted to the scheduler directory if a clean_old_files job is not enabled. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| #!/bin/sh | ||
|
|
||
| ################################################################ | ||
| # Example call to delete all files in /tmp last modified longer | ||
| # than 20 days ago | ||
| # | ||
| # $ sh clean-old-files.sh 20 mtime /tmp | ||
| ################################################################## | ||
|
|
||
| AGE="$1" | ||
| MODE="$2" | ||
| LOCATION="$3" | ||
|
|
||
| ACCEPTABLE_MODES='|mtime|ctime|atime|' | ||
|
|
||
| if ! echo "$AGE" | grep -q '^[0-9][0-9]*$'; then | ||
| >&2 echo "AGE argument set to '${AGE}'. It must be an unsigned integer" | ||
| exit 1 | ||
| fi | ||
|
|
||
| if [ "${ACCEPTABLE_MODES#*"|${MODE}|"}" = "${ACCEPTABLE_MODES}" ]; then | ||
| >&2 echo "MODE argument set to '${MODE}'. It must be one of 'mtime', 'ctime', or 'atime'" | ||
| exit 1 | ||
| fi | ||
|
|
||
| if [ -z "${LOCATION}" ]; then | ||
| >&2 echo "LOCATION argument is blank or unset. It must refer to a path on disk." | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "Removing files in ${LOCATION} that have a ${MODE} value greater than ${AGE} days" | ||
| find "${LOCATION}" -type f "-${MODE}" +"${AGE}" -print -delete |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_finch | ||
| comment: clean old WPS output files generated by Finch | ||
| schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}" /wps_outputs/finch' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_finch | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Given it's an unattended cron job, where is this logging to so we can debug if something goes wrong? Can log to
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually have a question about that.... Now that we are making more use of the scheduler to run additional jobs, why are we writing additional logs to
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That is fine for regular service container (1 service, 1 container) but here if all the jobs all log to the same scheduler container we can have interleaving log if jobs runtime happen to overlap. Even without overlap, it will make searching for the desired log harder because all the jobs logs to the same place. With each separate job logging to separate file, we avoid the search for the desired job log problem and the possible overlap problem. The more jobs we have, the more likely the problems will happen. So I much prefer each job to its own separate log file. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_finch.yml:ro} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_hummingbird | ||
| comment: clean old WPS output files generated by Hummingbird | ||
| schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE}" /wps_outputs/hummingbird' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_hummingbird | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_hummingbird.yml:ro} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_raven | ||
| comment: clean old WPS output files generated by Raven | ||
| schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE}" /wps_outputs/raven' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_raven | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "${COMPOSE_PROJECT_NAME}_wps_outputs:/wps_outputs:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_raven.yml:ro} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| - name: clean_old_files_thredds | ||
| comment: clean old log files generated by Thredds | ||
| schedule: '${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY}' | ||
| command: 'sh /clean-old-files.sh "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS}" "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE}" /thredds' | ||
| dockerargs: >- | ||
| --rm --name scheduler-job-clean_old_files_thredds | ||
| --volume ${COMPOSE_DIR}/optional-components/scheduler-job-clean_old_files/clean-old-files.sh:/clean-old-files.sh:ro | ||
| --volume "thredds_persistence:/thredds:rw" | ||
| image: '${SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE}' |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,4 @@ | ||
| services: | ||
| scheduler: | ||
| volumes: | ||
| - ./optional-components/scheduler-job-clean_old_files/${__SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC:-blank.config.yml}:/scheduler-job-configs/clean_old_files_thredds.yml:ro} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_DOCKER=alpine # alpine contains find with -ctime -mtime and -atime options (busybox based containers do not) | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_VERSION=3.21 | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE='${SCHEDULER_JOB_CLEAN_OLD_FILES_DOCKER}:${SCHEDULER_JOB_CLEAN_OLD_FILES_VERSION}' | ||
|
|
||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE=atime | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY="5 4 * * 0" # weekly on Sunday at 4:05 | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && echo "config/finch/config.yml" )' | ||
|
|
||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE=atime | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY="10 4 * * 0" # weekly on Sunday at 4:10 | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED}" = "true" ] && echo "config/hummingbird/config.yml" )' | ||
|
|
||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE=atime | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY="15 4 * * 0" # weekly on Sunday at 4:15 | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED}" = "true" ] && echo "config/raven/config.yml" )' | ||
|
|
||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS= # unset by default if this job is enabled this must be set to an integer | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE=mtime | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY="20 4 * * 0" # weekly on Sunday at 4:20 | ||
| export SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED=false | ||
| export __SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC='$( [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED}" = "true" ] && echo "config/thredds/config.yml" )' | ||
|
|
||
| export DELAYED_EVAL=" | ||
| $DELAYED_EVAL | ||
| SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_CONFIG_LOC | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_CONFIG_LOC | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_CONFIG_LOC | ||
| __SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_CONFIG_LOC | ||
| " | ||
|
|
||
| OPTIONAL_VARS=" | ||
| $OPTIONAL_VARS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_FREQUENCY | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_FREQUENCY | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_FREQUENCY | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_FREQUENCY | ||
| " | ||
|
|
||
| VARS=" | ||
| $VARS | ||
| \$SCHEDULER_JOB_CLEAN_OLD_FILES_IMAGE | ||
| " |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,25 @@ | ||
| if ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'scheduler[[:space:]]*$'; then | ||
| log WARN 'The scheduler-job-clean_old_files component is enabled but the scheduler component is not. This WILL cause problems. Please disable the scheduler-job-clean_old_files component.' | ||
| fi | ||
|
|
||
| _acceptable_modes='|mtime|ctime|atime|' | ||
|
|
||
| if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'finch[[:space:]]*$'; then | ||
| echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the finch file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_FINCH_TIME_MODE}'. Please set this variable to a valid option or disable the finch file cleaning job. This job will not run properly!" | ||
| fi | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can this be refactored to a function like |
||
|
|
||
| if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'hummingbird[[:space:]]*$'; then | ||
| echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the hummingbird file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_HUMMINGBIRD_TIME_MODE}'. Please set this variable to a valid option or disable the hummingbird file cleaning job. This job will not run properly!" | ||
| fi | ||
|
|
||
| if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'raven[[:space:]]*$'; then | ||
| echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the raven file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_RAVEN_TIME_MODE}'. Please set this variable to a valid option or disable the raven file cleaning job. This job will not run properly!" | ||
| fi | ||
|
|
||
| if [ "${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_ENABLED}" = "true" ] && ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'thredds[[:space:]]*$'; then | ||
| echo "$SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS" | grep -q '^[0-9][0-9]*$' || log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS variable must be an integer not '${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_OLDER_THAN_DAYS}'. Please set this variable to an integer or disable the thredds file cleaning job. This job will not run properly!" | ||
| [ "${_acceptable_modes#*"${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE}"}" = "${_acceptable_modes}" ] && log WARN "SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE variable must be one of 'mtime', 'atime', or 'ctime' not '${SCHEDULER_JOB_CLEAN_OLD_FILES_THREDDS_TIME_MODE}'. Please set this variable to a valid option or disable the thredds file cleaning job. This job will not run properly!" | ||
| fi | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in the optional-components' README.
Should also include mentions of the specific per-service configuration variables, just to provide insights that there are options to enable/disable them individually, and further options to control their cron/cleanup/fs-meta strategies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of the optional scheduler jobs are documented here and in env.local.example (for configuration options).
If we want to improve the documentation for all these jobs that's fine but we should do so for all, not just one, and maybe in a different PR.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it is an optional component, it should be described in
birdhouse/optional-components/README.rst, notbirdhouse/components/README.rst.The documentation does not need to be explicit about all combinations of component/variables, but should at least indicate some guidance about configuring available options. Notably cron config with atime/mtime conditions on a per-component basis is not that obvious for users that didn't participate in this PR's discussions. Reading only the
default.envdoes not provide all details.For example:
birdhouse-deploy/birdhouse/optional-components/scheduler-job-clean_old_files/default.env
Lines 5 to 9 in af633e4
(from the perspective of a new user):
..._OLDER_THAN_DAYSimpact/interact with..._FREQUENCYif set to a value?..._OLDER_THAN_DAYSbe another lower value (e.g.: every hour)?_TIME_MODE? (atime,mtime,ctime, others?)Is there any that would break the functionality if employed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these questions are covered in the documentation in env.local.example and the README file
birdhouse-deploy/birdhouse/env.local.example
Lines 209 to 218 in af633e4
birdhouse-deploy/birdhouse/components/README.rst
Lines 66 to 67 in af633e4
I can either come back to this PR at a later date when I have time to write more extensive documentation or we can include this now and I'll write more documentation in general for the scheduler jobs as a separate PR. I don't really have a preference so if you'd like to wait to merge this PR until more documentation is in place I'll leave it for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A copy-paste is sufficient with quotes for the code elements.
Can be updated later with more details if that becomes necessary.
I prefer actual docs because it is easier to navigate and read by users. If I point a user to this env file, they will nope-out immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this clearly solve the duplicate docs problem but might render the tracking of what config var changed (added, renamed, delete) harder for the simple user. Example with JupyterHub, I had to
jupyterhub --generate-configto have this one single file so I can diff against my current config. Given I am not familiar with their code, I can not see the difference easily simply when browsing the code.If we take the simple road of simply referencing the config var doc to the proper
default.envfile, we also fix the duplicate doc problem and we provide a one single file for new users to look up for the possible list of "public" config vars.This one single file can also be diffed to find out when a new config var has been added, renamed or deleted.
But with only the reference to the
default.env, user will have to make an additional lookup to the matchingdefault.envto read the docs, if the var name is unclear.Both (one single file for easy browsing vs generated docs) have pros and cons. I do not have a strong opinion on either of those approaches.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about this duplicate doc problem, how about
env.local.examplewe only put example usage but no documentation wording, referring to the matchingdefault.envHonesly, when reading docs I can infer a lot but just looking at the example usage. I will actually read the full documentation only when my inference from the example do not work as intended.
And I think when updating docs, we tend to update the wording to clarify explanation much more frequently than changing the example. Because when changing the example, we basically change the format of the expected value, that means code change and variable meaning change, which happen much less frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we really have super diligent and lengthy documentation in most cases in env files. Most places where there are more details is typically because of specific gotchas that are important to highlight contextually in that place, or that support examples, both of which are more relevant with the variable (or others around involved with it).
If we add more verbose documentation or explanations to further guide users, then I think those should be only in the README. The few example/comments should be self-explanatory as much as possible in env files. There can also be "see https://..." comments pointing at README to facilitate navigation/information retrieval.
About
components/optional-componentslocation, my main concern is that if I specifically add a givenoptional-componentsfeature, I would look up in the corresponding README to find its details. Not having a direct reference just makes it harder to find things. The fact that this optional feature uses thecomponentsscheduler does not matter. The scheduler should only list cross-refs to all compatible job components (wherever they reside), and their specific entries should provide more details as needed. This is already the case for many otheroptional-componentsthat point at theircomponentrequirement/compatibility references this way. Scheduler jobs are the ones doing things differently that all others.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok...
There are a lot of improvements that can be made to the documentation. I think we all agree on that.
However I think that the question of how to organize the README files is separate from how to organize default.env and env.local.example.
For the README files, yes we should keep the info from components and optional components in their respective READMEs. And I am happy to start that by putting the documentation relevant to this PR in the optional components.
A separate PR(s) will have to be made to update the documentation for the other components. For example, the auto deploy info is still in components even though it is nowadays scheduler job in optional components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Only focus on the current components edited/added within this PR.