Fixed race condition that happens when a job runs "too fast"
When a podman process finishes before execution even reaches the
monitoring method, a deadlock happens: nothing updates `process.returncode`,
and the spawned process is left in a zombie state (so no signal is sent).

This fix adds a `process.poll()` call, which gives the process object a
chance to fill in `process.returncode`.
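
The behaviour described above can be reproduced outside cwltool. The following is a minimal sketch, assuming a plain `subprocess.Popen` object and a trivial Python child as a stand-in for the fast podman job; the function name `returncode_before_and_after_poll` is hypothetical and not part of cwltool.

```python
import subprocess
import sys
import time


def returncode_before_and_after_poll():
    """Show that Popen.returncode stays None until poll() (or wait()) runs."""
    # Stand-in for a job that finishes almost immediately (assumption:
    # any short-lived child behaves like the fast podman process here).
    process = subprocess.Popen([sys.executable, "-c", "pass"])

    # Give the child time to exit, as the monitoring loop's sleep would.
    time.sleep(0.5)

    # Nothing has asked the OS about the child yet, so the attribute
    # is still None even if the child has already exited.
    before = process.returncode

    # poll() checks the child's status and fills in returncode once
    # the child has terminated.
    while process.poll() is None:
        time.sleep(0.01)
    after = process.returncode

    return before, after


if __name__ == "__main__":
    print(returncode_before_and_after_poll())
```

Reading `process.returncode` on its own never queries the operating system, which is why a loop that only inspects the attribute can wait forever.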
jmfernandez committed Aug 21, 2023
1 parent 5a645df commit 71333c9
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions cwltool/job.py
@@ -857,6 +857,10 @@ def docker_monitor(
         cid: Optional[str] = None
         while cid is None:
             time.sleep(1)
+            # This is needed to avoid a race condition where the job
+            # was so fast that it already finished when it arrives here
+            if process.returncode is None:
+                process.poll()
             if process.returncode is not None:
                 if cleanup_cidfile:
                     try:
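
For readers without the full file at hand, the loop touched by this diff can be approximated as a standalone sketch. This is not the cwltool implementation: `wait_for_cid`, its parameters, and the simplified cidfile handling are assumptions made for illustration; only the `poll()` guard mirrors the change above.

```python
import os
import subprocess
import sys
import tempfile
import time
from typing import Optional


def wait_for_cid(
    process: subprocess.Popen, cidfile: str, interval: float = 0.1
) -> Optional[str]:
    """Return the container id written to cidfile, or None if the
    process exits before any id shows up (hypothetical helper)."""
    while True:
        time.sleep(interval)
        # Same guard as in the diff: refresh returncode so that a job
        # which already finished is noticed instead of looping forever.
        if process.returncode is None:
            process.poll()
        if process.returncode is not None:
            return None
        if os.path.exists(cidfile):
            with open(cidfile) as handle:
                cid = handle.read().strip()
            if cid:
                return cid


if __name__ == "__main__":
    # A child that exits at once and never writes a cidfile.
    fast_job = subprocess.Popen([sys.executable, "-c", "pass"])
    missing = os.path.join(tempfile.mkdtemp(), "container.cid")
    print(wait_for_cid(fast_job, missing))
```

Without the `poll()` call, the second `if` in this sketch would never become true for a job that exits before writing its cidfile.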
