Based on the job's failureState, either retry the process from the beginning, or fetch the results from the queue and retry saving them. If the job fails twice, we could record it in a SystemIssue table (or similar) to keep track of failed jobs, so that we can debug the issue later if needed.
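
A rough sketch of how this could look, in Python for illustration only. Everything here is an assumption rather than existing code: the two `FailureState` values, the `Job` fields (`failure_state` stands in for failureState), the callback names, and the use of a plain list in place of a real SystemIssue table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class FailureState(Enum):
    # Hypothetical values; the real failureState may differ.
    PROCESSING = "processing"  # failed before results were produced
    SAVING = "saving"          # results are in the queue, saving failed


@dataclass
class Job:
    id: str
    failure_state: Optional[FailureState] = None
    failure_count: int = 0  # how many times the job has failed so far


MAX_FAILURES = 2  # "if the job fails twice"


def handle_failed_job(
    job: Job,
    run_from_start: Callable[[Job], None],
    fetch_results: Callable[[Job], Any],
    save_results: Callable[[Job, Any], None],
    system_issues: list,
) -> bool:
    """Retry a failed job; return True if the retry succeeded.

    `system_issues` is a stand-in for a SystemIssue table.
    """
    try:
        if job.failure_state is FailureState.SAVING:
            # Results already exist: only redo the save step.
            save_results(job, fetch_results(job))
        else:
            # No usable results: rerun the whole process.
            run_from_start(job)
    except Exception as exc:
        job.failure_count += 1
        if job.failure_count >= MAX_FAILURES:
            # Record the failure for later debugging instead of retrying again.
            system_issues.append(
                {
                    "job_id": job.id,
                    "failure_state": job.failure_state.value if job.failure_state else None,
                    "error": repr(exc),
                }
            )
        return False

    job.failure_state = None
    return True
```

The caller is expected to have set `failure_count` to 1 and `failure_state` when the job first failed, so a failed retry is the second failure and produces a SystemIssue entry.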
From SyncLinear.com | DO-281