Symptom
In most containerized pipeline runs, when a user halts or suspends a branched target, all branched targets fail with the identical error message `unknown or uninitialised column: worker`. This message suggests that the specified worker (resource) is not defined in the controller group, which is not true: `_targets.R` defines the correct worker names used in the pipeline in the `crew` controller group.
Findings
- The failure is observed only in branched targets, regardless of which specific resource they are set to use. All non-branched targets with a single object completed as expected.
- The error message is reproduced both in SLURM job submissions and in interactive runs on a local workstation, so we can focus on `targets`- or `crew`-level solutions.
- The controller group suddenly loses all its workers after the suspension. Rerunning the pipeline (partially or entirely) with a target that retrieves the controller names stores a `NULL` value:
```r
# In a target:
cntrlrs <- targets::tar_option_get("controller")
mycntrlrs <- names(cntrlrs$private$.controllers)
mycntrlrs
# NULL
```
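One caveat before reading too much into the `NULL` above: `crew` controllers are R6 objects, and the private fields of an R6 object are not reachable through `$private` from outside the object. The expression `cntrlrs$private$.controllers` can therefore evaluate to `NULL` even when the group still holds its controllers. The sketch below shows this with a hypothetical `ToyGroup` class (a stand-in for illustration, not the actual `crew` class). Whether the real controller group exposes a public `controllers` field in the installed `crew` version is an assumption to verify.

```r
library(R6)

# Hypothetical stand-in for a controller group; not the actual crew class.
ToyGroup <- R6Class(
  "ToyGroup",
  public = list(
    initialize = function(...) {
      private$.controllers <- list(...)
    }
  ),
  private = list(.controllers = NULL),
  active = list(
    # Read-only public view of the private field.
    controllers = function() private$.controllers
  )
)

group <- ToyGroup$new(small = "worker_a", large = "worker_b")

# `private` is not a member of the public object, so this lookup yields NULL
# regardless of what the group actually holds.
via_private <- names(group$private$.controllers)

# The public accessor sees both entries.
via_public <- names(group$controllers)

print(via_private)
print(via_public)
```

If the same pattern applies to the real controller group, checking `names(cntrlrs$controllers)` (assuming that field exists) would help separate a genuinely empty group from an inaccessible private field.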
What do we need to do?
First, I will check whether all `targets` settings are lost after the first suspension or failure in branched targets. After ruling out other potential causes of this error, we might consider reporting the issue to the `crew` and `targets` developers.
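A minimal sketch of how that check could look, assuming the `targets` package is installed. The helper name `snapshot_options()` and the selection of option names are illustrative choices, not part of the pipeline. It records which options currently resolve to `NULL`, so that a snapshot taken before a suspension can be compared with one taken after it.

```r
# Hypothetical helper: report which targets options are currently NULL.
# The option names below are valid arguments of targets::tar_option_set();
# the selection itself is only an example.
snapshot_options <- function(
  opts = c("controller", "resources", "error", "memory", "storage", "retrieval")
) {
  vapply(
    opts,
    function(o) is.null(targets::tar_option_get(o)),
    logical(1)
  )
}

# Named logical vector: TRUE means the option is unset (NULL) right now.
snapshot <- snapshot_options()
print(snapshot)
```

Calling this once in `_targets.R` after `tar_option_set()` and once inside a diagnostic target would show whether only the controller is lost or whether other settings disappear as well.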