Description
Checklist
- Which Faktory package and version?
Faktory v1.9.0
- Which Faktory worker package and version?
faktory_workers_go v1.9.0
- Please include any relevant worker configuration
Workers = 3
Concurrency = 2
- Please include any relevant error messages or stacktraces
Client
--
Unable to report JID Qtjl6r_Ifxh32kUQ result to Faktory: read tcp 172.16.114.109:56062->10.100.183.42:7419: i/o timeout
Server
--
Unable to process timed job: cannot retry reservation: Job not found Qtjl6r_Ifxh32kUQ
No such job to acknowledge Qtjl6r_Ifxh32kUQ
Are you using an old version?
No
Have you checked the changelogs to see if your issue has been fixed in a later version?
Yes
Context
We're running a bulk process with Faktory which triggers millions of individual Jobs wrapped in Batches to split the work into manageable chunks.
Problem
Sometimes the Batches UI page shows 1 pending Job that is neither running nor waiting in the queue, which leaves the Batch stuck so that it never completes. The success/complete callbacks on the Batch aren't called either.
Searching the logs turns up very little. The most I've managed to find are networking errors like the following:
- Worker logs:
Unable to report JID Qtjl6r_Ifxh32kUQ result to Faktory: read tcp 172.16.114.109:56062->10.100.183.42:7419: i/o timeout
- Server logs:
Unable to process timed job: cannot retry reservation: Job not found Qtjl6r_Ifxh32kUQ
No such job to acknowledge Qtjl6r_Ifxh32kUQ
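
For anyone trying to line up the two logs, here is a rough sketch of how the worker and server lines could be cross-referenced by JID. It uses only the Go standard library and is not part of Faktory or faktory_workers_go. The regular expression is an assumption based on the single worker message above, and `failedJIDs` and `serverLinesFor` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

// workerRe captures the JID from the worker-side error quoted above.
// Assumption: every such line has the shape "Unable to report JID <jid> result to Faktory".
var workerRe = regexp.MustCompile(`Unable to report JID (\S+) result to Faktory`)

// failedJIDs returns the distinct JIDs whose result could not be reported,
// in order of first appearance in the worker log.
func failedJIDs(workerLog string) []string {
	seen := map[string]bool{}
	var jids []string
	sc := bufio.NewScanner(strings.NewReader(workerLog))
	for sc.Scan() {
		m := workerRe.FindStringSubmatch(sc.Text())
		if m == nil || seen[m[1]] {
			continue
		}
		seen[m[1]] = true
		jids = append(jids, m[1])
	}
	return jids
}

// serverLinesFor returns every server log line that mentions the given JID.
func serverLinesFor(serverLog, jid string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(serverLog))
	for sc.Scan() {
		if strings.Contains(sc.Text(), jid) {
			out = append(out, sc.Text())
		}
	}
	return out
}

func main() {
	// Sample input copied from the log lines in this report.
	worker := "Unable to report JID Qtjl6r_Ifxh32kUQ result to Faktory: read tcp 172.16.114.109:56062->10.100.183.42:7419: i/o timeout\n"
	server := "Unable to process timed job: cannot retry reservation: Job not found Qtjl6r_Ifxh32kUQ\n" +
		"No such job to acknowledge Qtjl6r_Ifxh32kUQ\n"

	for _, jid := range failedJIDs(worker) {
		fmt.Println(jid)
		for _, line := range serverLinesFor(server, jid) {
			fmt.Println("  " + line)
		}
	}
}
```

Running this over full log files (instead of the inline samples) should list each JID whose result report timed out on the worker, followed by whatever the server logged about that same JID.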