[Bug]: Duplicate job scheduler tasks #2876

noe-charmet · 2024-10-30T23:15:15Z

Version

5.21.2

Platform

NodeJS

What happened?

We use the BullMQ Job Scheduler to trigger frequent tasks (multiple per second). We rely on the schedulerId to ensure each task is defined only once. However, we have observed that tasks are sometimes not properly deduplicated, and we end up with multiple instances of the repeated task. In our specific case we call upsertJobScheduler at startup and have seen this occur during crash loops that cause many restarts in a short period of time.

How to reproduce.

I have written this JS script, which pushes BullMQ to its limits. It goes beyond what we expect from it, but provides a reliable way to reproduce the bug we are reporting here. Run the script and wait a few dozen seconds; you should see the counter for waiting tasks go up (restarting the script seems to help too).

I've tried diving deep into the BullMQ code and adding debug logging in various places, but I have not yet managed to identify the root cause of this duplication.

When no Worker is involved, the duplication does not happen. I therefore suspect that it is triggered when the upsert happens while the task is being processed by the worker.

const {Worker, Queue} = require('bullmq');
// Promise-based timers: setTimeout resolves after a delay, setInterval is an async iterator.
const {setTimeout, setInterval} = require('timers/promises');

const connection = {
  host: '127.0.0.1',
  password: 'password',
};

async function run() {
  const queue = new Queue('test', {
    connection,
  });

  await queue.waitUntilReady();

  // Each job takes 250 ms, longer than the 100 ms repeat interval used below.
  const worker = new Worker(
    'test',
    async () => {
      await setTimeout(250);
    },
    {connection},
  );

  await worker.waitUntilReady();

  // Re-upsert the same scheduler every 50 ms to simulate many restarts.
  for await (const _ of setInterval(50)) {
    console.log('###');
    await queue.upsertJobScheduler('test', {
      every: 100,
    });
    // Print the job counts; the waiting count goes up when the bug occurs.
    console.log('active', await queue.getActiveCount());
    console.log('waiting', await queue.getWaitingCount());
    console.log('delayed', await queue.getDelayedCount());
  }
}

run().catch((error) => {
  console.error(error);
  process.exit(1);
});
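
To make the reproduction report the problem rather than relying on reading the counters by eye, the waiting and delayed counts can be summed and compared against a threshold. The following is a minimal sketch, not part of the original script: the helper names summarizeCounts and checkPending are hypothetical, and the threshold of one pending job per scheduler is an assumption that has not been confirmed in this thread. A stand-in object replaces the real Queue so the sketch runs without Redis.

```javascript
// Sketch: flag a backlog by summing the waiting and delayed counts.
// Assumption (not confirmed in this thread): a single scheduler should
// leave at most one job pending at any time.
function summarizeCounts(counts, maxPending = 1) {
  const pending = (counts.waiting || 0) + (counts.delayed || 0);
  return {pending, duplicated: pending > maxPending};
}

// Async wrapper that accepts any object exposing getJobCounts(...types),
// such as a BullMQ Queue.
async function checkPending(queue, maxPending = 1) {
  const counts = await queue.getJobCounts('waiting', 'delayed');
  return summarizeCounts(counts, maxPending);
}

// Hypothetical stand-in for a Queue, with made-up counts for illustration.
const fakeQueue = {
  async getJobCounts() {
    return {waiting: 3, delayed: 1};
  },
};

checkPending(fakeQueue).then((summary) => console.log(summary));
```

In the script above, checkPending(queue) could be called inside the loop in place of the three separate count calls.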

Relevant log output

No response


noe-charmet · 2024-10-31T07:32:57Z

I just tested with the deprecated repeatable job interface using a key, and the issue does not occur under these conditions.

const {Worker, Queue} = require('bullmq');
const {setTimeout, setInterval} = require('timers/promises');

const connection = {
  host: '127.0.0.1',
  password: 'password',
};

async function run() {
  const queue = new Queue('test', {
    connection,
  });

  await queue.waitUntilReady();

  const worker = new Worker(
    'test',
    async () => {
      await setTimeout(250);
    },
    {connection},
  );

  await worker.waitUntilReady();

  for await (const _ of setInterval(50)) {
    console.log('###');
    // Deprecated repeatable API: the explicit key identifies the repeatable job.
    await queue.add('test', undefined, {
      repeat: {every: 100, key: 'test'},
    });
    console.log('active', await queue.getActiveCount());
    console.log('waiting', await queue.getWaitingCount());
    console.log('delayed', await queue.getDelayedCount());
  }
}

run().catch((error) => {
  console.error(error);
  process.exit(1);
});

paulsc54 · 2024-11-05T12:43:49Z

Hi,
Just a quick update: we're encountering the same issue after migrating from the repeatable setup. By design, our API can restart regularly, but as @noe-charmet mentioned, if we experience too many restarts in a short timeframe, we end up with multiple delayed jobs for the same repeatable task.

I've attached a screenshot showing a job that frequently duplicates, along with the options for the first two instances below.

{
  "attempts": 5,
  "prevMillis": 1730810589509,
  "removeOnComplete": {
    "age": 600,
    "count": 100
  },
  "removeOnFail": {
    "age": 600,
    "count": 100
  },
  "repeat": {
    "offset": 31500,
    "count": 14,
    "every": 240000
  },
  "jobId": "repeat:rental_partner_confirmation_email:1730810589509",
  "timestamp": 1730810349524,
  "delay": 239985
}

{
  "attempts": 5,
  "prevMillis": 1730810589488,
  "removeOnComplete": {
    "age": 600,
    "count": 100
  },
  "removeOnFail": {
    "age": 600,
    "count": 100
  },
  "repeat": {
    "offset": 36846,
    "count": 114,
    "every": 240000
  },
  "jobId": "repeat:rental_partner_confirmation_email:1730810589488",
  "timestamp": 1730810349510,
  "delay": 239978
}
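
Both job ids above share the same scheduler name and differ only in the trailing timestamp, which in each case equals timestamp + delay (for example 1730810349524 + 239985 = 1730810589509). A minimal sketch of how such duplicates could be spotted automatically follows. It assumes job ids keep the repeat:<schedulerId>:<millis> shape seen in these two examples; the helper names groupBySchedulerId and findDuplicates are hypothetical.

```javascript
// Sketch: group job-like objects by the scheduler id embedded in their id.
// Assumption: ids follow the `repeat:<schedulerId>:<millis>` shape shown above.
function groupBySchedulerId(jobs) {
  const groups = new Map();
  for (const job of jobs) {
    const match = /^repeat:(.+):(\d+)$/.exec(String(job.id));
    if (!match) continue; // id does not look like a scheduler-produced job
    const [, schedulerId, millis] = match;
    if (!groups.has(schedulerId)) groups.set(schedulerId, []);
    groups.get(schedulerId).push(Number(millis));
  }
  return groups;
}

// Return only the schedulers that have more than one job in the input.
function findDuplicates(jobs) {
  const duplicates = [];
  for (const [schedulerId, runs] of groupBySchedulerId(jobs)) {
    if (runs.length < 2) continue;
    runs.sort((a, b) => a - b);
    duplicates.push({
      schedulerId,
      runs,
      // Gap between the earliest and latest scheduled run, in milliseconds.
      spreadMs: runs[runs.length - 1] - runs[0],
    });
  }
  return duplicates;
}

// The two jobs posted above, reduced to their ids.
const sampleJobs = [
  {id: 'repeat:rental_partner_confirmation_email:1730810589509'},
  {id: 'repeat:rental_partner_confirmation_email:1730810589488'},
];

console.log(findDuplicates(sampleJobs));
```

For the two jobs above this reports a single scheduler with two runs scheduled 21 ms apart. Against a live queue, the input could come from await queue.getDelayed(), whose jobs expose an id property.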

Zimtente · 2024-11-07T14:36:31Z

Same issue with our setup.

jordan-loeser · 2024-11-08T02:26:52Z

Encountering the same issue here.

manast · 2024-11-10T11:51:21Z

I have a grasp on why this is happening, particularly when using very tight repetition intervals and many distributed calls to upsertJobScheduler. I am designing a solution for this now.

noe-charmet added the bug label Oct 30, 2024

samthemagicman mentioned this issue Nov 9, 2024 in [Bug]: Duplicate delayed jobs on repeatable #2889 (Open, 1 task)

manast self-assigned this Nov 10, 2024

manast mentioned this issue Nov 10, 2024 in Fix/avoid hazards when upserting job schedulers #2892 (Merged)

manast closed this as completed in #2892 Nov 11, 2024
