exp-workers failing without logs #10673
Labels: A: experiments (related to dvc exp), bug (did we break something?), help wanted, triage (needs to be triaged)
Bug Report
DVC EXP workers dying
Running multiple workers results in failed experiments and no logs
Description
Launching dvc queue start with the -j parameter set to a value greater than 1 causes some experiments to fail that should not fail, and these experiments have no logs. Furthermore, the exp-worker sometimes dies along with the failed experiments.
Reproduce
Example:
params.yaml
dvc.yaml
git init
dvc init
git add *.yaml
git commit -m "initial commit"
dvc queue start -j 20
dvc queue status | grep Failed
dvc queue logs ...
Note that it doesn't always fail, so you may have to repeat the steps starting at step 7.
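Because the failure is intermittent, the start/status/logs steps above can be wrapped in a small script so they are easy to repeat. This is only a rough sketch, not part of the original report: the helper names failed_tasks and repro_once are hypothetical, it assumes experiments have already been queued, it assumes the first column of the dvc queue status output is the task id, and it assumes unfinished tasks are shown as Queued or Running.

```shell
#!/bin/sh
# Sketch only: failed_tasks and repro_once are hypothetical helpers, not DVC commands.

# Read `dvc queue status` output from stdin and print the first column
# (assumed to be the task id) of every line that contains "Failed".
failed_tasks() {
    awk '/Failed/ { print $1 }'
}

# Start the workers, wait for the queue to drain, then try to show the logs
# of each failed task. Assumes experiments have already been queued.
repro_once() {
    dvc queue start -j "${1:-20}"

    # Assumption: unfinished tasks are listed as "Queued" or "Running".
    # The wait is bounded (120 polls, 5 s apart) because a dead worker
    # could leave tasks queued forever.
    tries=0
    while [ "$tries" -lt 120 ] && dvc queue status | grep -Eq 'Queued|Running'; do
        sleep 5
        tries=$((tries + 1))
    done

    dvc queue status | failed_tasks | while read -r task; do
        echo "failed task: $task"
        dvc queue logs "$task" || echo "no logs available for $task"
    done
}
```

After sourcing the file inside the repository, calling repro_once 20 runs one attempt; re-queue the experiments and call it again if nothing failed.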
Output sample
Environment information
Output of dvc doctor: