Open
Description
Version
v5.54.0
Platform
NodeJS
What happened?
Title: Jobs randomly get stuck in the "active" state and stay stuck even after a restart, without being processed or failed.
Description:
Jobs in our BullMQ queues periodically get stuck in the "active" state without being processed or moved to the "failed" state. Only specific queues are affected; the others continue working normally. I have not found a resolution yet.
Environment:
- BullMQ Version: v5.54.0
- Node.js Version: 22
- Redis Version: redis-cli 7.0.15
- Operating System: Ubuntu 24
- Deployment: VM
Reproduction Steps:
- Start a BullMQ worker processing jobs normally
- After several hours/days of operation, observe that:
  - New jobs enter the queue
  - Jobs are picked up and move to "active" state
  - Jobs remain stuck in "active" state indefinitely
  - No progress is made on stuck jobs
  - No error events are emitted
Expected Behavior:
- Jobs should either complete processing and move to "completed" state
- Or jobs should fail and move to "failed" state with appropriate error
- Stalled jobs should be automatically detected and restarted based on configuration
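To check whether stall detection fires at all, it may help to log the worker events involved. The sketch below assumes the `stalled`, `failed`, and `error` events that BullMQ's `Worker` emits; `attachStallLogging` is a hypothetical helper name, not part of the library, and it accepts any event emitter so it can be tried without Redis.

```javascript
// Hypothetical helper: subscribes to the Worker events most relevant to stuck jobs.
// `worker` is expected to be a BullMQ Worker, but any EventEmitter works.
function attachStallLogging(worker, log = console.warn) {
  worker.on('stalled', (jobId, prev) => {
    log(`job ${jobId} stalled (previous state: ${prev})`);
  });
  worker.on('failed', (job, err) => {
    // `job` may be undefined, so guard before reading its id.
    log(`job ${job ? job.id : 'unknown'} failed: ${err.message}`);
  });
  worker.on('error', (err) => {
    log(`worker error: ${err.message}`);
  });
  return worker;
}
```

If none of these handlers ever logs while jobs sit in "active", that would suggest the locks are still being renewed rather than expiring.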
Actual Behavior:
- Jobs remain stuck in "active" state indefinitely
- No automatic stall detection occurs despite the `stalledInterval` configuration
- No error events or logs are generated for stuck jobs
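One way to quantify "stuck indefinitely" is to list active jobs and compare their `processedOn` timestamp with the current time. This is only a sketch: `findLongRunning` is a hypothetical helper, and the 10 minute threshold in the usage comment is an arbitrary assumption.

```javascript
// Hypothetical helper: given job-like objects, return those that have been
// active for longer than maxAgeMs, based on their processedOn timestamp.
function findLongRunning(jobs, maxAgeMs, now = Date.now()) {
  return jobs
    .filter((job) => typeof job.processedOn === 'number' && now - job.processedOn > maxAgeMs)
    .map((job) => ({ id: job.id, name: job.name, activeForMs: now - job.processedOn }));
}

// Usage against a real queue (needs bullmq and a reachable Redis):
//   const active = await queue.getActive();
//   console.table(findLongRunning(active, 10 * 60 * 1000));
```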
Configuration:
const queue = new Queue('affected-queue', {
  connection: redisConfig,
  settings: {
    stalledInterval: 30000,
    maxStalledCount: 2,
  },
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000,
    },
    removeOnComplete: 100,
    removeOnFail: 100,
  },
});
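A possible factor worth ruling out: as far as I can tell, in BullMQ v5 `stalledInterval` and `maxStalledCount` are `Worker` options, and the `settings` block on `Queue` (a layout from the older Bull library) is not where stall detection is configured, so the values above may be silently ignored. The sketch below only shows where the same values would go under that assumption; the `redisConfig` value is a placeholder.

```javascript
// Assumption: placeholder for the connection object used in the report.
const redisConfig = { host: 'localhost', port: 6379 };

// Queue side: connection and default job options only.
const queueOptions = {
  connection: redisConfig,
  defaultJobOptions: {
    attempts: 3,
    backoff: { type: 'exponential', delay: 1000 },
    removeOnComplete: 100,
    removeOnFail: 100,
  },
};

// Worker side: stall detection settings live here (assumed, see lead-in).
const workerOptions = {
  connection: redisConfig,
  stalledInterval: 30000,
  maxStalledCount: 2,
};

// With bullmq installed, these would be passed as:
//   const queue = new Queue('affected-queue', queueOptions);
//   const worker = new Worker('affected-queue', processor, workerOptions);
```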
### How to reproduce.
const { Worker } = require('bullmq');

function initializeWorker(sessions) {
  const worker = new Worker('whatsapp-jobs', async (job) => {
    console.log(`Processing job ${job.id} of type ${job.name}`);

    // 🔴 ADD THIS: Simulate stuck condition (1% chance)
    if (Math.random() < 0.01) {
      console.log(`🚨 SIMULATING STUCK JOB: ${job.id} - This job will hang forever`);
      // This promise never resolves - simulating the exact stuck condition
      await new Promise(() => {});
      // Code never reaches here
    }

    // 🔴 ADD THIS: Simulate event loop blocking (0.5% chance)
    if (Math.random() < 0.005) {
      console.log(`🚨 SIMULATING BLOCKED EVENT LOOP: ${job.id}`);
      // Block the event loop for 5 minutes
      const start = Date.now();
      while (Date.now() - start < 300000) {
        // Busy wait - blocks event loop
      }
    }

    // Your existing code continues...
    try {
      const { sessionName, to, text, question, options, timestamp } = job.data;
      switch (job.name) {
        // ... your existing cases
      }
    } catch (error) {
      console.error(`Error processing job ${job.id}:`, error);
      throw error;
    }
  }, {
    // Your existing configuration
    connection: {
      host: process.env.REDIS_HOST || 'localhost',
      // Explicit radix; falls back to 6379 when REDIS_PORT is unset or invalid
      port: parseInt(process.env.REDIS_PORT, 10) || 6379,
      maxRetriesPerRequest: null,
      enableReadyCheck: false
    },
    concurrency: 10,
    limiter: { max: 200, duration: 10000 },
    stalledInterval: 120000, // Should detect stalls every 2 minutes
    maxStalledCount: 3,
    lockDuration: 180000, // 3 minute lock
    lockRenewTime: 90000, // Renew every 90 seconds
    autorun: true
  });

  // Your existing event handlers...
  return worker;
}
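A note on the first simulated condition: as far as I understand, stall detection is based on the job lock expiring, and a promise that never settles leaves the event loop free, so the worker keeps renewing the lock and the job would not be reported as stalled. One possible mitigation is to bound the processor with a deadline so a hung job rejects and goes through the normal failed/retry path. The sketch below assumes a hypothetical `withTimeout` wrapper; it does not cancel the underlying work and cannot help with the busy-wait case, where no timer can fire.

```javascript
// Hypothetical wrapper: rejects if the wrapped processor takes longer than timeoutMs.
// It only stops waiting; whatever the processor started keeps running.
function withTimeout(processor, timeoutMs) {
  return async (job, token) => {
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`job ${job.id} exceeded ${timeoutMs} ms`)),
        timeoutMs,
      );
    });
    try {
      // Whichever settles first wins: the real work or the deadline.
      return await Promise.race([processor(job, token), deadline]);
    } finally {
      clearTimeout(timer);
    }
  };
}
```

In the worker above, the processor function would be passed as `withTimeout(handler, 60000)` in place of the bare async function, where `handler` and the 60 second limit are placeholders.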
<img width="1680" height="978" alt="Image" src="https://github.com/user-attachments/assets/af3e5819-6f6a-47dc-99cd-69a47a51d122" />
### Relevant log output
```shell
I have already shared everything.
```