
[Bug]: Queues get stuck in Active #3535

@tariqbilal

Version

v5.54.0

Platform

NodeJS

What happened?

Title: Jobs randomly get stuck in the "active" state and do not recover, even after a restart.

Description:
Jobs in our BullMQ queues periodically get stuck in the "active" state: they are neither processed nor moved to the "failed" state. Only specific queues are affected; the others continue working normally. I have not found a resolution yet.

Environment:

  • BullMQ Version: v5.54.0
  • Node.js Version: 22
  • Redis Version: redis-cli 7.0.15
  • Operating System: Ubuntu 24
  • Deployment: VM

Reproduction Steps:

  1. Start a BullMQ worker processing jobs normally
  2. After several hours/days of operation, observe that:
    • New jobs enter the queue
    • Jobs are picked up and move to "active" state
    • Jobs remain stuck in "active" state indefinitely
    • No progress is made on stuck jobs
    • No error events are emitted
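To confirm the last two points, it can help to log every lifecycle event the worker emits. The following is a minimal sketch, assuming the BullMQ `Worker` events `active`, `completed`, `failed`, `stalled` and `error`. `attachDiagnostics` and its `log` parameter are hypothetical helpers, not part of BullMQ.

```javascript
// Hypothetical helper: logs lifecycle events from anything with an `.on()` method,
// for example a BullMQ Worker. Returns the emitter so calls can be chained.
function attachDiagnostics(emitter, log = console.log) {
    emitter.on('active', (job) => log(`active: ${job.id}`));
    emitter.on('completed', (job) => log(`completed: ${job.id}`));
    // `job` may be undefined when the job can no longer be found.
    emitter.on('failed', (job, err) => log(`failed: ${job ? job.id : 'unknown'} (${err.message})`));
    // 'stalled' carries the job id rather than the job object.
    emitter.on('stalled', (jobId) => log(`stalled: ${jobId}`));
    emitter.on('error', (err) => log(`worker error: ${err.message}`));
    return emitter;
}
```

It could be called as `attachDiagnostics(worker)` right after the worker is created. If a stuck job produces no `stalled` or `failed` line at all, that at least narrows the problem down to the processor never settling.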

Expected Behavior:

  • Jobs should either complete processing and move to "completed" state
  • Or jobs should fail and move to "failed" state with appropriate error
  • Stalled jobs should be automatically detected and restarted based on configuration

Actual Behavior:

  • Jobs remain stuck in "active" state indefinitely
  • No automatic stall detection occurs despite stalledInterval configuration
  • No error events or logs are generated for stuck jobs
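One possible explanation, offered as an assumption rather than a diagnosis: BullMQ's stall check is tied to the job lock. A processor that awaits a promise that never settles, while the event loop stays free, may keep renewing its lock and so never be flagged as stalled. A common workaround is to race the work against a timer so the processor rejects and the job fails. `withTimeout` below is a hypothetical helper, not a BullMQ function.

```javascript
// Rejects if `work` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying work, it only stops waiting for it.
function withTimeout(work, ms, label = 'job') {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; the timer is always cleared afterwards.
    return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Inside a processor this could look like `await withTimeout(handleJob(job), 60000, job.id)`, where `handleJob` stands in for the existing per-job logic. The rejection would then go through the usual `attempts` and `backoff` handling.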

Configuration:

const queue = new Queue('affected-queue', {
  connection: redisConfig,
  settings: {
    stalledInterval: 30000,
    maxStalledCount: 2,
  },
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000,
    },
    removeOnComplete: 100,
    removeOnFail: 100,
  },
});
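One thing that may be worth checking: in BullMQ, `stalledInterval` and `maxStalledCount` are listed among the `Worker` options. Values placed under the queue's `settings` may therefore have no effect on stall detection. This is an assumption to verify against the v5.54.0 option types. The sketch below splits the same values between queue and worker options; `redisConfig` is a placeholder.

```javascript
const redisConfig = { host: 'localhost', port: 6379 }; // placeholder connection

// Intended for `new Queue('affected-queue', queueOptions)`.
const queueOptions = {
    connection: redisConfig,
    defaultJobOptions: {
        attempts: 3,
        backoff: { type: 'exponential', delay: 1000 },
        removeOnComplete: 100,
        removeOnFail: 100,
    },
};

// Intended for `new Worker('affected-queue', processor, workerOptions)`.
const workerOptions = {
    connection: redisConfig,
    stalledInterval: 30000, // how often to check for stalled jobs, in ms
    maxStalledCount: 2,     // stalls tolerated before the job is failed
};
```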

### How to reproduce

function initializeWorker(sessions) {
    const worker = new Worker('whatsapp-jobs', async (job) => {
        console.log(`Processing job ${job.id} of type ${job.name}`);
        
        // 🔴 ADD THIS: Simulate stuck condition (1% chance)
        if (Math.random() < 0.01) {
            console.log(`🚨 SIMULATING STUCK JOB: ${job.id} - This job will hang forever`);
            // This promise never resolves - simulating the exact stuck condition
            await new Promise(() => {});
            // Code never reaches here
        }
        
        // 🔴 ADD THIS: Simulate event loop blocking (0.5% chance)
        if (Math.random() < 0.005) {
            console.log(`🚨 SIMULATING BLOCKED EVENT LOOP: ${job.id}`);
            // Block the event loop for 5 minutes
            const start = Date.now();
            while (Date.now() - start < 300000) {
                // Busy wait - blocks event loop
            }
        }
        
        // Your existing code continues...
        try {
            const { sessionName, to, text, question, options, timestamp } = job.data;
            
            switch (job.name) {
                // ... your existing cases
            }
        } catch (error) {
            console.error(`Error processing job ${job.id}:`, error);
            throw error;
        }
    }, {
        // Your existing configuration
        connection: {
            host: process.env.REDIS_HOST || 'localhost',
            port: parseInt(process.env.REDIS_PORT, 10) || 6379,
            maxRetriesPerRequest: null,
            enableReadyCheck: false
        },
        concurrency: 10,
        limiter: { max: 200, duration: 10000 },
        stalledInterval: 120000, // Should detect stalls every 2 minutes
        maxStalledCount: 3,
        lockDuration: 180000, // 3 minute lock
        lockRenewTime: 90000, // Renew every 90 seconds
        autorun: true
    });

    // Your existing event handlers...
    return worker;
}
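The busy-wait case differs from the hung promise. While the loop spins, no timers can run, so the lock presumably cannot be renewed during the 5 minutes, which is longer than the 3-minute `lockDuration`. For genuinely CPU-heavy work, one option is to process in chunks and yield to the event loop between them. This is a sketch only; `yieldToEventLoop` and `processInChunks` are hypothetical names.

```javascript
// Resolves on a later turn of the event loop so pending timers and I/O can run.
function yieldToEventLoop() {
    return new Promise((resolve) => setImmediate(resolve));
}

// Applies `handle` to each item, yielding after every `chunkSize` items.
async function processInChunks(items, handle, chunkSize = 100) {
    const results = [];
    for (let i = 0; i < items.length; i++) {
        results.push(handle(items[i]));
        if ((i + 1) % chunkSize === 0) {
            await yieldToEventLoop();
        }
    }
    return results;
}
```

BullMQ also documents sandboxed processors, which run the job in a separate process and may suit long blocking work better than manual chunking.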

<img width="1680" height="978" alt="Image" src="https://github.com/user-attachments/assets/af3e5819-6f6a-47dc-99cd-69a47a51d122" />

### Relevant log output

I have already shared everything.

