Being able to notify the job/queue to avoid stalling when job is CPU intensive #2857

Open
maxime-dupuis opened this issue Oct 23, 2024 · 2 comments

maxime-dupuis commented Oct 23, 2024

Is your feature request related to a problem? Please describe.

My CPU-intensive jobs do the correct work, but they stall with these errors:

  • Missing lock for job JOB_ID
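  • Error: job stalled more than allowable limit

For example, this job body blocks the event loop for 30 seconds:

async checkStaleCreatives(): Promise<void> {
    // Simulate CPU-bound work: busy-wait for 30 seconds without ever yielding
    const startTime = Date.now();
    while (Date.now() - startTime < 30000) {
        // Spin (nothing else in this process can run during this loop)
    }
}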

Describe the solution you'd like

I'd like a way to tell BullMQ that my job is still working correctly, so that it is not considered stalled.


await job.notifyNotStalled()
or
await queue.notifyNotStalled()

And/or the snippet below (or a better fix) should be added to the documentation about stalled jobs, so people know how to prevent stalls.

Describe alternatives you've considered

Adding this inside the CPU-intensive loop solves my problem:

// Yield to the event loop to avoid stalls
await new Promise((resolve) => setTimeout(resolve));
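
For anyone landing here, this is a minimal sketch of how that one-liner might sit inside a CPU-bound loop. The names `yieldToEventLoop` and `sumSquares` and the `chunkSize` parameter are made up for illustration and are not part of BullMQ. As far as I understand, the worker renews the job lock from a timer in the same process, so the timer needs the event loop to be free now and then.

```typescript
// Hypothetical helper (not a BullMQ API): resolves on a later macrotask,
// which gives pending timers and I/O callbacks a chance to run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Hypothetical CPU-bound job body: sums the squares of 1..n and yields
// once every `chunkSize` iterations instead of blocking until the end.
async function sumSquares(n: number, chunkSize: number = 10000): Promise<number> {
  let total = 0;
  for (let i = 1; i <= n; i++) {
    total += i * i;
    if (i % chunkSize === 0) {
      // Natural pause point: let other callbacks run before continuing.
      await yieldToEventLoop();
    }
  }
  return total;
}
```

Yielding every N iterations rather than on every iteration keeps the overhead low, since each `setTimeout` round trip costs about a millisecond in Node.js.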

roggervalf (Collaborator) commented Oct 26, 2024

Hi @maxime-dupuis, sorry for the delay. It looks like we need to improve our docs a bit. The Worker class has two options that you can increase: lockDuration (https://api.docs.bullmq.io/interfaces/v5.WorkerOptions.html#lockDuration) and lockRenewTime (https://api.docs.bullmq.io/interfaces/v5.WorkerOptions.html#lockRenewTime). You can experiment to find which values work for your use case.
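
As a rough sketch, the two options could be set like this. The numbers are placeholders rather than recommendations, and the `LockOptions` interface is a local stand-in so the snippet runs without BullMQ installed; the queue name `creatives` and the `processor` and `connection` variables in the commented usage are hypothetical.

```typescript
// Local stand-in for the two WorkerOptions fields linked above.
interface LockOptions {
  lockDuration: number;  // ms before an unrenewed job lock expires
  lockRenewTime: number; // ms between lock renewal attempts
}

// Illustrative values only: a longer lock tolerates longer blocking stretches,
// at the cost of slower detection of jobs that really did stall.
const lockOptions: LockOptions = {
  lockDuration: 120000,
  lockRenewTime: 30000,
};

// With BullMQ installed, the options would be passed to the worker, e.g.:
//   import { Worker } from 'bullmq';
//   const worker = new Worker('creatives', processor, { connection, ...lockOptions });
```

The renewal interval should stay well below the lock duration, otherwise the lock can expire before the first renewal is even attempted.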

maxime-dupuis (Author) commented Oct 28, 2024

lockDuration

Thanks, I think that could be useful in cases where the expected job duration is known.

In our case, the "yielding" trick works best because:

  • We can't predict the duration of the job (not obvious from my code example, sorry!)
  • We have natural points where it makes sense to yield (every loop iteration)
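
When the loop has a natural pause point on every iteration but the total duration is unknown, one possible variation is to call a yield helper each time and let it decide by elapsed time whether to actually pause. This is only a sketch: `createYielder`, `processItems`, and the 50 ms budget are invented for illustration and are not BullMQ APIs.

```typescript
// Hypothetical helper: returns a function that yields to the event loop
// only if at least `budgetMs` have passed since the previous yield.
function createYielder(budgetMs: number): () => Promise<void> {
  let lastYield = Date.now();
  return async (): Promise<void> => {
    if (Date.now() - lastYield >= budgetMs) {
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      lastYield = Date.now();
    }
  };
}

// Hypothetical job body: the per-item work stands in for the real CPU-heavy step.
async function processItems(items: number[], budgetMs: number = 50): Promise<number> {
  const maybeYield = createYielder(budgetMs);
  let processed = 0;
  for (const item of items) {
    void item; // placeholder for the CPU-heavy work on `item`
    processed += 1;
    await maybeYield(); // cheap when the time budget has not been used up
  }
  return processed;
}
```

The budget has to stay far below the worker's lock duration for this to help; it only bounds how long the loop runs between pauses, not how long a single iteration takes.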
