[Bug]: Duplicate job scheduler tasks #2876

Closed
noe-charmet opened this issue Oct 30, 2024 · 5 comments · Fixed by #2892

@noe-charmet

noe-charmet commented Oct 30, 2024

Version

5.21.2

Platform

NodeJS

What happened?

We use BullMQ Job Schedulers to trigger frequent tasks (multiple per second). We rely on the schedulerId to ensure that each task is defined only once. However, we have observed that on some occasions tasks are not properly deduplicated, and we end up with multiple instances of the repeated task. In our case we call upsertJobScheduler at startup, and we have seen this occur during crash loops that cause many restarts in a short period of time.

How to reproduce.

I have written this JS script, which pushes BullMQ to its limits. It goes beyond what we expect from it, but it provides a reliable way to reproduce the bug we are reporting here. Run the script and wait a few dozen seconds; you should see the counter for waiting tasks go up (restarting the script seems to help too).

I've tried diving into the BullMQ code and adding debug logging in various places, but I have not yet been able to identify the root cause of this duplication.

When no Worker is involved, the duplication does not happen. I therefore suspect that it is triggered when the upsert happens while the worker is processing the task.

const {Worker, Queue} = require('bullmq');
const {setTimeout, setInterval} = require('timers/promises');

const connection = {
  host: '127.0.0.1',
  password: 'password',
};

async function run() {
  const queue = new Queue('test', {
    connection,
  });

  await queue.waitUntilReady();

  const worker = new Worker(
    'test',
    async () => {
      await setTimeout(250);
    },
    {connection},
  );

  await worker.waitUntilReady();

  for await (const _ of setInterval(50)) {
    console.log('###');
    await queue.upsertJobScheduler('test', {
      every: 100,
    });
    console.log('active', await queue.getActiveCount());
    console.log('waiting', await queue.getWaitingCount());
    console.log('delayed', await queue.getDelayedCount());
  }
}

run().catch((error) => {
  console.error(error);
  process.exit(1);
});

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
noe-charmet added the bug (Something isn't working) label Oct 30, 2024
@noe-charmet
Author

I just tested with the deprecated Repeatable job interface using a key, and the issue does not happen under these conditions.

const {Worker, Queue} = require('bullmq');
const {setTimeout, setInterval} = require('timers/promises');

const connection = {
  host: '127.0.0.1',
  password: 'password',
};

async function run() {
  const queue = new Queue('test', {
    connection,
  });

  await queue.waitUntilReady();

  const worker = new Worker(
    'test',
    async () => {
      await setTimeout(250);
    },
    {connection},
  );

  await worker.waitUntilReady();

  for await (const _ of setInterval(50)) {
    console.log('###');
    await queue.add('test', undefined, {
      repeat: {every: 100, key: 'test'},
    });
    console.log('active', await queue.getActiveCount());
    console.log('waiting', await queue.getWaitingCount());
    console.log('delayed', await queue.getDelayedCount());
  }
}

run().catch((error) => {
  console.error(error);
  process.exit(1);
});

@paulsc54

paulsc54 commented Nov 5, 2024

Hi,
Just a quick update: we're encountering the same issue after migrating from the repeatable setup. By design, our API can restart regularly, but as @noe-charmet mentioned, if we experience too many restarts in a short timeframe, we end up with multiple delayed jobs for the same repeatable task.

I've attached a screenshot showing a job that frequently duplicates, along with the options for the first two instances below.
[Image: Screenshot 2024-11-05 at 13 40 33]

{
  "attempts": 5,
  "prevMillis": 1730810589509,
  "removeOnComplete": {
    "age": 600,
    "count": 100
  },
  "removeOnFail": {
    "age": 600,
    "count": 100
  },
  "repeat": {
    "offset": 31500,
    "count": 14,
    "every": 240000
  },
  "jobId": "repeat:rental_partner_confirmation_email:1730810589509",
  "timestamp": 1730810349524,
  "delay": 239985
}

{
  "attempts": 5,
  "prevMillis": 1730810589488,
  "removeOnComplete": {
    "age": 600,
    "count": 100
  },
  "removeOnFail": {
    "age": 600,
    "count": 100
  },
  "repeat": {
    "offset": 36846,
    "count": 114,
    "every": 240000
  },
  "jobId": "repeat:rental_partner_confirmation_email:1730810589488",
  "timestamp": 1730810349510,
  "delay": 239978
}
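
The two option dumps above can be compared mechanically. The following is a minimal sketch, not part of BullMQ: it assumes job ids have the form "repeat:<schedulerId>:<millis>" as seen in these samples (the format may differ in other versions), and the helper names parseRepeatJobId and findDuplicates are hypothetical. It groups jobs by scheduler id, reports ids with more than one delayed job, and checks whether timestamp + delay matches prevMillis.

```javascript
// Fields copied from the two delayed jobs above (only the ones used here).
const jobs = [
  {
    jobId: 'repeat:rental_partner_confirmation_email:1730810589509',
    prevMillis: 1730810589509,
    timestamp: 1730810349524,
    delay: 239985,
  },
  {
    jobId: 'repeat:rental_partner_confirmation_email:1730810589488',
    prevMillis: 1730810589488,
    timestamp: 1730810349510,
    delay: 239978,
  },
];

// Assumption: ids look like "repeat:<schedulerId>:<millis>", as in the samples.
// Returns null for ids that do not match that shape.
function parseRepeatJobId(jobId) {
  const match = /^repeat:(.+):(\d+)$/.exec(jobId);
  if (!match) return null;
  return {schedulerId: match[1], millis: Number(match[2])};
}

// Group jobs by scheduler id and keep only ids that have more than one job.
function findDuplicates(jobList) {
  const groups = new Map();
  for (const job of jobList) {
    const parsed = parseRepeatJobId(job.jobId);
    if (!parsed) continue;
    const group = groups.get(parsed.schedulerId) ?? [];
    group.push({...job, millis: parsed.millis});
    groups.set(parsed.schedulerId, group);
  }
  return [...groups].filter(([, group]) => group.length > 1);
}

const duplicates = findDuplicates(jobs);
for (const [schedulerId, group] of duplicates) {
  const slots = group.map((job) => job.millis);
  console.log(schedulerId, 'has', group.length, 'delayed jobs');
  console.log('spread between slots (ms):', Math.max(...slots) - Math.min(...slots));
  for (const job of group) {
    // In both samples, timestamp + delay equals prevMillis.
    console.log(job.jobId, job.timestamp + job.delay === job.prevMillis);
  }
}
```

For these two jobs the slots are 21 ms apart, far less than the 240000 ms interval, so they look like two copies of the same iteration and not two consecutive iterations.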

@Zimtente

Zimtente commented Nov 7, 2024

Same issue with our setup.

@jordan-loeser

Encountering the same issue here.

@manast
Contributor

manast commented Nov 10, 2024

I have a grasp of why this is happening, particularly when using very tight repetitions and many distributed calls to upsertJobScheduler. I am designing a solution for this now.


5 participants