Description
At the moment, rabbit jobs require two exceptions to expedite cleanup. The first exception will move the job to the CLEANUP state, triggering the dws-epilog epilog action and the usual rabbit end-of-job routine: unmount file systems, move data, destroy file systems. If an exception is raised during that epilog action though, any data movement is abandoned: the workflow is moved directly to Teardown. However, the file systems must still be unmounted and destroyed before the epilog is removed.
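As a rough illustration of the current behavior, the sketch below lists which end-of-job steps still run once a job has reached CLEANUP. The function name and step strings are hypothetical and only mirror the description above; this is not code from the plugin.

```python
def remaining_steps(exception_during_epilog: bool) -> list[str]:
    """Return the end-of-job steps still to run once the job is in CLEANUP.

    Hypothetical model of the behavior described above, not plugin code.
    """
    # File systems are always unmounted, even when data movement is abandoned.
    steps = ["unmount file systems"]
    # A second exception, raised while the epilog is running, skips data movement.
    if not exception_during_epilog:
        steps.append("move data")
    # File systems are always destroyed before the epilog is removed.
    steps.append("destroy file systems")
    return steps
```

With no second exception, all three steps run; with one, only the unmount and destroy steps remain.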
In discussion, @behlendorf and @ofaaland found it somewhat counterintuitive that it takes two exceptions to move a workflow to Teardown. They thought a cancel should be a full cancel and indicated that the job should be abandoned completely. However, one reason to keep the current behavior is that a job expiration is also a severity-zero exception, and it doesn't seem right for a job timeout to cause a job to abandon data movement, since some valuable data may already have been produced.
The dws-jobtap plugin could always check the type of each exception that comes through and handle timeouts differently from cancellations.
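A minimal sketch of what type-based handling might look like, written in Python for brevity rather than as plugin code. The `JobException` class, the `skips_data_movement` function, and the type strings "cancel" and "timeout" are assumptions made for illustration; the policy shown (a cancel abandons data movement immediately, a timeout does not) is only one possible reading of the discussion above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobException:
    """Hypothetical stand-in for an exception raised on a job."""
    type: str      # assumed values: "cancel" or "timeout"
    severity: int  # both cancellations and timeouts are severity zero


def skips_data_movement(exc: JobException, epilog_running: bool) -> bool:
    """Decide whether data movement is abandoned (illustrative policy only)."""
    # Existing behavior: any exception raised during the epilog abandons data movement.
    if epilog_running:
        return True
    # Severity cannot separate the two cases, so the decision is made on the type:
    # a cancel is treated as a full cancel, a timeout keeps data movement.
    return exc.type == "cancel"
```

Under this policy a single cancel would be enough to abandon data movement, while a timeout would still run the usual end-of-job routine unless a further exception arrived during the epilog.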
@behlendorf indicated he thought the current way of doing things is fine as long as it is documented.