
dws-jobtap.so exception handling #342

@jameshcorbett


At the moment, rabbit jobs require two exceptions to expedite cleanup. The first exception moves the job to the CLEANUP state, which triggers the dws-epilog epilog action and the usual rabbit end-of-job routine: unmount file systems, move data, destroy file systems. If a second exception is raised during that epilog action, any data movement is abandoned and the workflow is moved directly to Teardown. Even then, the file systems must still be unmounted and destroyed before the epilog is removed.
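
As a rough illustration of the flow described above, here is a minimal Python model of the two-exception behavior. It is only a sketch: the `Phase` enum and `on_exception` function are hypothetical names made up for this comment, not anything in dws-jobtap.so.

```python
from enum import Enum, auto


class Phase(Enum):
    """Hypothetical phases, named only for this illustration."""
    RUNNING = auto()
    EPILOG = auto()    # job in CLEANUP, dws-epilog active, data movement still planned
    TEARDOWN = auto()  # data movement abandoned; unmount and destroy still happen


def on_exception(phase: Phase) -> Phase:
    """Return the phase a rabbit job moves to when an exception is raised."""
    if phase is Phase.RUNNING:
        # First exception: job moves to CLEANUP and the epilog starts.
        return Phase.EPILOG
    if phase is Phase.EPILOG:
        # Second exception: skip data movement, go straight to Teardown.
        return Phase.TEARDOWN
    # Already in Teardown: nothing further to expedite.
    return phase
```

Under this model, a single exception never skips data movement; it takes a second one, raised while the epilog is active, to reach Teardown directly.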

In discussion, @behlendorf and @ofaaland found it somewhat counterintuitive that it takes two exceptions to move a workflow to Teardown. They thought a cancel should be a full cancel and should indicate that the job is to be abandoned completely. However, one reason to keep the current behavior is that a job expiration is a severity-zero exception, and it doesn't seem right for a job timeout to cause a job to abandon data movement, since some valuable data may already have been produced.

The dws-jobtap plugin could always check the type of the exceptions that come through and have different handling for timeouts vs cancellations.
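
One possible shape for that type-based handling, sketched in Python rather than the plugin's own code. The function name `skip_data_movement` is hypothetical, and the exception type strings `"cancel"` and `"timeout"` are assumptions about what the plugin would see; the policy itself is just one option, not a settled design.

```python
def skip_data_movement(exc_type: str, in_epilog: bool) -> bool:
    """Decide whether the workflow should go straight to Teardown.

    exc_type:  exception type string (assumed to be "cancel", "timeout", ...)
    in_epilog: True if the dws-epilog action is already active
    """
    if exc_type == "cancel":
        # Treat a cancel as a full cancel: abandon data movement immediately.
        return True
    # Any other exception, including a timeout, keeps today's behavior:
    # data movement is abandoned only if the epilog is already running.
    return in_epilog
```

With this policy a timeout on a running job still gets its data moved, while a single cancel is enough to abandon it.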

@behlendorf indicated he thought the current way of doing things is fine as long as it is documented.
