25 changes: 24 additions & 1 deletion CHANGES.md
@@ -15,7 +15,30 @@
[Unreleased](https://github.com/bird-house/birdhouse-deploy/tree/master) (latest)
------------------------------------------------------------------------------------------------------------------

[//]: # (list changes here, using '-' for each new entry, remove this when items are added)
## Changes

- Create template component for data deploy jobs

New data deploy scheduler jobs no longer need to copy/paste lots of boilerplate code to create a new job.
Instead, they can simply define a few specific environment variables and the
`optional-components/scheduler-job-deploy_data` component will automatically generate a new data deploy job.

For example, if `XXXX` is added to the `SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS` variable and the following
variables are defined:

- `SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_NAME`
- `SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_COMMENT`
- `SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_CHECKOUT_CACHE`
- `SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_LOG_FILENAME`
- `SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_SCHEDULE`
- `SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_CONFIG_FILE`
Collaborator:
Very nice interface!


a deploy data job will be automatically created.
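For instance, a minimal `env.local` fragment for a hypothetical job id `MYDATA` might look like the following (the job id, name, schedule, and file paths here are illustrative assumptions, not shipped defaults):

```shell
# Hypothetical job id "MYDATA"; all values below are made-up examples.
export SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS="MYDATA"
export SCHEDULER_JOB_MYDATA_DEPLOY_DATA_JOB_NAME='deploy_mydata_to_thredds'
export SCHEDULER_JOB_MYDATA_DEPLOY_DATA_JOB_COMMENT='Auto-deploy my data.'
export SCHEDULER_JOB_MYDATA_DEPLOY_DATA_JOB_SCHEDULE='*/30 * * * *'  # UTC
export SCHEDULER_JOB_MYDATA_DEPLOY_DATA_JOB_CONFIG_FILE="${COMPOSE_DIR}/deployment/deploy-mydata.yml"
```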

See `optional-components/scheduler-job-deploy_raven_testdata/default.env` and
`optional-components/scheduler-job-deploy_xclim_testdata/default.env` for examples.

See `birdhouse/deployment/deploy-data` for details on how the deploy data job works.

[2.15.0](https://github.com/bird-house/birdhouse-deploy/tree/2.15.0) (2025-05-27)
------------------------------------------------------------------------------------------------------------------
@@ -0,0 +1,4 @@
services:
scheduler:
volumes:
- ./optional-components/scheduler-job-deploy_data/config.yml:/scheduler-job-configs/deploy_data.yml:ro
@@ -0,0 +1,11 @@
export SCHEDULER_JOB_DEPLOY_DATA_JOB_DOCKER='docker'
export SCHEDULER_JOB_DEPLOY_DATA_JOB_VERSION='19.03.6-git'
export SCHEDULER_JOB_DEPLOY_DATA_JOB_IMAGE='${SCHEDULER_JOB_DEPLOY_DATA_JOB_DOCKER}:${SCHEDULER_JOB_DEPLOY_DATA_JOB_VERSION}'

export SCHEDULER_JOB_DEPLOY_EXTRA_DOCKER_ARGS='$([ -n "$DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE" ] && echo "--volume ${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE}:${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE}:ro --env DEPLOY_DATA_GIT_SSH_IDENTITY_FILE=${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE} ")'
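The single quotes above keep the command substitution unevaluated until the variable is run through the `DELAYED_EVAL` mechanism below. A minimal sketch of that deferred-expansion idea, with illustrative variable names (`EXTRA_ARGS_TEMPLATE` and `KEY_FILE` are not real birdhouse variables):

```shell
# Illustrative sketch: the value is stored literally (single quotes),
# then expanded later with eval, once the referenced variables are known.
EXTRA_ARGS_TEMPLATE='$([ -n "$KEY_FILE" ] && echo "--volume $KEY_FILE:$KEY_FILE:ro")'
KEY_FILE="/tmp/id_rsa"
eval "RESOLVED=\"$EXTRA_ARGS_TEMPLATE\""
# RESOLVED now holds "--volume /tmp/id_rsa:/tmp/id_rsa:ro"
```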
Collaborator:
This JOB_GIT_SSH_IDENTITY_FILE can be set per job as well, because different private repos can potentially have different keys, see

# Location of ssh private key for git clone over ssh, useful for private repos.
#DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE="/path/to/id_rsa"
#DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE=/home/vagrant/.ssh/id_rsa_git_ssh_read_only

This var is reset at the end, allowing a different key per job, see

# Reset all config vars to prevent cross-contamination between successive invocations.
DEPLOY_DATA_JOB_SCHEDULE=""
DEPLOY_DATA_JOB_JOB_NAME=""
DEPLOY_DATA_JOB_CONFIG=""
DEPLOY_DATA_JOB_CHECKOUT_CACHE=""
DEPLOY_DATA_JOB_LOGFILE=""
DEPLOY_DATA_JOB_JOB_DESCRIPTION=""
DEPLOY_DATA_JOB_DOCKER_IMAGE=""
DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE=""
DEPLOY_DATA_JOB_DOCKER_RUN_EXTRA_OPTS=""

Now looking back at this list, DEPLOY_DATA_JOB_DOCKER_IMAGE also made it possible to have a different image per job, with a default if unset!

Collaborator:

With your new commit ec9e810, I think this SCHEDULER_JOB_DEPLOY_EXTRA_DOCKER_ARGS is unused and can be deleted.


DELAYED_EVAL="
$DELAYED_EVAL
SCHEDULER_JOB_DEPLOY_EXTRA_DOCKER_ARGS
SCHEDULER_JOB_DEPLOY_DATA_JOB_IMAGE
"
@@ -0,0 +1,41 @@
generate_scheduler_job_deploy_data_template() {
job_id="$1"
name="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_NAME\"")"
comment="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_COMMENT\"")"
schedule="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_SCHEDULE\"")"
config_file="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_CONFIG_FILE\"")"
checkout_cache="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_CHECKOUT_CACHE\"")"
log_file_name="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_LOG_FILENAME\"")"
extra_args="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_EXTRA_ARGS\"")"

error_msg="No XXX found for deploy data job with id '$job_id'"
[ -z "$name" ] && log ERROR "$(echo "$error_msg" | sed 's/XXX/name/')" && return 1
[ -z "$schedule" ] && log ERROR "$(echo "$error_msg" | sed 's/XXX/schedule/')" && return 1
[ -z "$config_file" ] && log ERROR "$(echo "$error_msg" | sed 's/XXX/config_file/')" && return 1
[ -z "$checkout_cache" ] && log ERROR "$(echo "$error_msg" | sed 's/XXX/checkout_cache/')" && return 1

echo "
- name: '${name}'
comment: '${comment}'
schedule: '${schedule}'
command: '/deploy-data ${config_file}'
dockerargs: >-
--rm --name '${name}'
--volume /var/run/docker.sock:/var/run/docker.sock:ro
--volume ${COMPOSE_DIR}/deployment/deploy-data:/deploy-data:ro
--volume ${config_file}:${config_file}:ro
--volume ${checkout_cache}:${checkout_cache}:rw
--volume ${BIRDHOUSE_LOG_DIR}:${BIRDHOUSE_LOG_DIR}:rw
--env DEPLOY_DATA_CHECKOUT_CACHE=${checkout_cache}
--env DEPLOY_DATA_LOGFILE=${log_file_name:-"deploy-data-${name}.log"} ${SCHEDULER_JOB_DEPLOY_EXTRA_DOCKER_ARGS} ${extra_args}
Collaborator:

Missing support for DEPLOY_DATA_JOB_DOCKER_RUN_EXTRA_OPTS, see https://github.com/bird-house/birdhouse-deploy/blob/b5cd7f6501793ea8e67f76d2208e02dc797ba031/birdhouse/components/scheduler/deploy_data_job.env#L99C81-L99C118

and

# Docker run extra opts.
# 4 spaces in front of --env very important to respect.
#DEPLOY_DATA_JOB_DOCKER_RUN_EXTRA_OPTS="
# --env ENV1=val1
# --env ENV2=val2"

Collaborator Author:

This functionality is supported with the SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_EXTRA_ARGS variable but I'll change the name to SCHEDULER_JOB_XXXX_DEPLOY_DATA_JOB_EXTRA_OPTIONS since you're right that they're options, not arguments.

image: '${SCHEDULER_JOB_DEPLOY_DATA_JOB_IMAGE}'
"
}

SCHEDULER_JOB_DEPLOY_DATA_JOB_CONFIG_FILE="${COMPOSE_DIR}/optional-components/scheduler-job-deploy_data/config.yml"

echo > "$SCHEDULER_JOB_DEPLOY_DATA_JOB_CONFIG_FILE"

for job_id in $SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS; do
generate_scheduler_job_deploy_data_template "$job_id" >> "$SCHEDULER_JOB_DEPLOY_DATA_JOB_CONFIG_FILE"
done
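The generator above relies on `eval` to read job-specific variables whose names are constructed at runtime from the job id. A small self-contained illustration of that indirection (the job id `DEMO` and its value are hypothetical):

```shell
# Hypothetical job id "DEMO": read a variable whose name is built at runtime.
SCHEDULER_JOB_DEMO_DEPLOY_DATA_JOB_NAME='demo_job'
job_id="DEMO"
name="$(eval "echo \"\$SCHEDULER_JOB_${job_id}_DEPLOY_DATA_JOB_NAME\"")"
# name is now "demo_job"
```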

This file was deleted.

This file was deleted.

@@ -1,29 +1,20 @@
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_NAME=deploy_raven_testdata_to_thredds
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_COMMENT="Auto-deploy Raven testdata to Thredds for Raven tutorial notebooks."
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_CHECKOUT_CACHE='${BIRDHOUSE_DATA_PERSIST_ROOT}/deploy_data_cache/deploy_raven_testdata_to_thredds'
Collaborator:

JOB_CHECKOUT_CACHE has a default generated from JOB_NAME if unset, see

# Location for local cache of git clone to save bandwidth and time from always
# re-cloning from scratch.
if [ -z "$DEPLOY_DATA_JOB_CHECKOUT_CACHE" ]; then
DEPLOY_DATA_JOB_CHECKOUT_CACHE="${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/deploy_data_cache/${DEPLOY_DATA_JOB_JOB_NAME}"
fi
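The fallback quoted above can be exercised standalone; a sketch, assuming `BIRDHOUSE_DATA_PERSIST_ROOT` is unset so the `/data` default applies:

```shell
# Sketch of the checkout-cache default derivation shown above.
unset BIRDHOUSE_DATA_PERSIST_ROOT
DEPLOY_DATA_JOB_JOB_NAME="deploy_raven_testdata_to_thredds"
DEPLOY_DATA_JOB_CHECKOUT_CACHE=""
if [ -z "$DEPLOY_DATA_JOB_CHECKOUT_CACHE" ]; then
    DEPLOY_DATA_JOB_CHECKOUT_CACHE="${BIRDHOUSE_DATA_PERSIST_ROOT:-/data}/deploy_data_cache/${DEPLOY_DATA_JOB_JOB_NAME}"
fi
```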

Collaborator:

With your new commit ec9e810, I think this job specific SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_CHECKOUT_CACHE and the one below SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_LOG_FILENAME can be deleted since they will derive from the default.

export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_LOG_FILENAME='deploy_raven_testdata_to_thredds.log'
Collaborator:

JOB_LOG_FILENAME also has a default generated from JOB_NAME, see

# Log file location. Default location under /var/log/birdhouse/ has built-in logrotate.
if [ -z "$DEPLOY_DATA_JOB_LOGFILE" ]; then
DEPLOY_DATA_JOB_LOGFILE="${BIRDHOUSE_LOG_DIR}/${DEPLOY_DATA_JOB_JOB_NAME}.log"
fi

In the previous deploy_raven_testdata_to_thredds.env, only 4 vars needed to be set; the rest had generated defaults. This default.env should be similar. See

# Source this file in env.local before sourcing deploy_data_job.env.
# This will configure deploy_data_job.env.
DEPLOY_DATA_JOB_SCHEDULE="*/30 * * * *" # UTC
DEPLOY_DATA_JOB_JOB_NAME="deploy_raven_testdata_to_thredds"
DEPLOY_DATA_JOB_CONFIG="${COMPOSE_DIR}/deployment/deploy-data-raven-testdata-to-thredds.yml"
DEPLOY_DATA_JOB_JOB_DESCRIPTION="Auto-deploy Raven testdata to Thredds for Raven tutorial notebooks."

Collaborator Author:

Yup this is the same: only name, schedule and config file need to be specified.

export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_SCHEDULE="*/30 * * * *" # UTC
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_SCHEDULE='*/30 * * * *' # UTC
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_CONFIG_FILE="${COMPOSE_DIR}/deployment/deploy-data-raven-testdata-to-thredds.yml"

export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_DOCKER='docker'
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_VERSION='19.03.6-git'
export SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_IMAGE='${SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_DOCKER}:${SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_VERSION}'

export SCHEDULER_JOB_RAVEN_DEPLOY_EXTRA_DOCKER_ARGS='$([ -n "$DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE" ] && echo "--volume ${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE}:${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE}:ro --env DEPLOY_DATA_GIT_SSH_IDENTITY_FILE=${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE} ")'
export SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS="
$SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS
RAVEN
"
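Appending to the whitespace-separated id list lets each component register itself without clobbering ids added by other components; a minimal illustration of the accumulation and the word-split iteration the generator performs:

```shell
# Two components appending their ids to the same whitespace-separated list.
SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS=""
SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS="
$SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS
RAVEN
"
SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS="
$SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS
XCLIM
"
# Unquoted expansion word-splits on the embedded newlines and blanks.
ids=""
for job_id in $SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS; do
    ids="$ids$job_id "
done
# ids is now "RAVEN XCLIM "
```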

DELAYED_EVAL="
$DELAYED_EVAL
SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_CHECKOUT_CACHE
SCHEDULER_JOB_RAVEN_DEPLOY_EXTRA_DOCKER_ARGS
SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_IMAGE
"

VARS="
$VARS
\$SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_CHECKOUT_CACHE
\$SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_SCHEDULE
\$SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_IMAGE
\$SCHEDULER_JOB_RAVEN_DEPLOY_DATA_JOB_LOG_FILENAME
"

OPTIONAL_VARS="
$OPTIONAL_VARS
\$SCHEDULER_JOB_RAVEN_DEPLOY_EXTRA_DOCKER_ARGS
"
COMPONENT_DEPENDENCIES="
./optional-components/scheduler-job-deploy_data
"
@@ -1,7 +1,3 @@
if ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'thredds[[:space:]]*$'; then
log WARN 'The scheduler-job-deploy_raven_testdata component is enabled but the thredds component is not. This WILL cause problems. Please disable the scheduler-job-deploy_raven_testdata component.'
fi

if ! echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'raven[[:space:]]*$'; then
log WARN 'The scheduler-job-deploy_raven_testdata component is enabled but the raven component is not. Are you sure you want to enable the scheduler-job-deploy_raven_testdata component?'
fi
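The `grep -q 'thredds[[:space:]]*$'` pattern only matches when a line of `BIRDHOUSE_EXTRA_CONF_DIRS` ends with the component name (optionally followed by spaces); a quick sketch with an illustrative directory list:

```shell
# The pattern matches only lines ending in "thredds" (trailing spaces allowed),
# so "./components/thredds-extra" would NOT match, but "./components/thredds" does.
BIRDHOUSE_EXTRA_CONF_DIRS="
./components/thredds
./components/raven
"
if echo "${BIRDHOUSE_EXTRA_CONF_DIRS}" | grep -q 'thredds[[:space:]]*$'; then
    thredds_enabled="yes"
else
    thredds_enabled="no"
fi
```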

This file was deleted.

This file was deleted.

This file was deleted.

@@ -1,29 +1,20 @@
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_NAME=deploy_xclim_testdata_to_thredds
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_COMMENT='Auto-deploy Xclim testdata to Thredds for Finch and Xclim tutorial notebooks.'
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_CHECKOUT_CACHE='${BIRDHOUSE_DATA_PERSIST_ROOT}/deploy_data_cache/deploy_xclim_testdata_to_thredds'
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_LOG_FILENAME='deploy_xclim_testdata_to_thredds.log'
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_SCHEDULE="7,37 * * * *" # UTC
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_SCHEDULE='7,37 * * * *' # UTC
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_CONFIG_FILE="${COMPOSE_DIR}/deployment/deploy-data-xclim-testdata-to-thredds.yml"

export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_DOCKER='docker'
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_VERSION='19.03.6-git'
export SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_IMAGE='${SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_DOCKER}:${SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_VERSION}'

export SCHEDULER_JOB_XCLIM_DEPLOY_EXTRA_DOCKER_ARGS='$([ -n "$DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE" ] && echo "--volume ${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE}:${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE}:ro --env DEPLOY_DATA_GIT_SSH_IDENTITY_FILE=${DEPLOY_DATA_JOB_GIT_SSH_IDENTITY_FILE} ")'
export SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS="
$SCHEDULER_JOB_DEPLOY_DATA_JOB_IDS
XCLIM
"

DELAYED_EVAL="
$DELAYED_EVAL
SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_CHECKOUT_CACHE
SCHEDULER_JOB_XCLIM_DEPLOY_EXTRA_DOCKER_ARGS
SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_IMAGE
"

VARS="
$VARS
\$SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_CHECKOUT_CACHE
\$SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_SCHEDULE
\$SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_LOG_FILENAME
\$SCHEDULER_JOB_XCLIM_DEPLOY_DATA_JOB_IMAGE
"

OPTIONAL_VARS="
$OPTIONAL_VARS
\$SCHEDULER_JOB_XCLIM_DEPLOY_EXTRA_DOCKER_ARGS
"
COMPONENT_DEPENDENCIES="
./optional-components/scheduler-job-deploy_data
"