Description
Originally, check_job_status declared a job "incomplete" if it had started 30 minutes ago and had not yet finished, and then kicked off an attempt to finish it. For long-running jobs, this meant a new task would start and try to insert rows into the database that were already there, triggering an avalanche of IntegrityErrors that could last for hours and bog down the whole system.
As an expedient, we changed the threshold to allow 4 hours before a job is declared incomplete. This works because we currently only support sends of 25k rows, and we know those go out in under 2 hours.
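For reference, the time-based check amounts to something like the sketch below. The function name, field names, and threshold constant are illustrative assumptions, not the actual implementation:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical names: the real check lives in check_job_status and the
# actual Job fields may differ. Threshold was 30 minutes, now 4 hours.
INCOMPLETE_THRESHOLD = timedelta(hours=4)

def is_incomplete(started_at, finished_at, now=None):
    """A job counts as 'incomplete' if it started more than
    INCOMPLETE_THRESHOLD ago and still has no finish time."""
    now = now or datetime.now(timezone.utc)
    return finished_at is None and (now - started_at) > INCOMPLETE_THRESHOLD
```

The fragility is visible here: any job that legitimately runs longer than the threshold gets flagged and restarted, which is exactly what caused the IntegrityError avalanche.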
Ideally, though, we should be relying on job status rather than time ranges. What would cause a job to simply stop running? Is that even possible? Perhaps a random reboot in the middle of a task? And if the check has to stay time-based, what should the threshold actually be? Someone needs to take a look and rethink this.