
BullMQ Worker Not Processing Jobs  #21

@frendhisaido

Description


I'm facing an issue with a BullMQ setup across Cloud Run and GKE, and I'm hoping someone here might have encountered a similar scenario.
My Setup:

  • BullMQ Proxy: running as a GCP Cloud Run service from the latest bullmq-proxy Docker image.
  • BullMQ Worker: deployed as a Kubernetes workload in GKE (Dockerized FROM oven/bun:1.0.25, bun install, then bun worker.js).
  • Redis: running on a GCE instance, shared between the proxy and the worker.
  • Bun 1.0.5
  • redis 7.2.1
  • bullmq 5.7.15
  • bullmq-proxy v1.3.0

The Problem:

While the proxy successfully accepts incoming jobs (confirmed via redis-cli and by watching QueueEvents), the BullMQ worker never picks them up. This happens both when the worker runs as a GKE workload and when I run it manually via bun worker.js.
Has anyone successfully deployed a similar setup with BullMQ-proxy on Cloud Run and a BullMQ worker in GKE? Any insights or potential pitfalls to watch out for would be greatly appreciated!
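Since the jobs are visible from redis-cli, one way to narrow this down is to watch the queue's event stream and its wait list from the worker's network. This is a minimal sketch, assuming bullmq and ioredis are installed and REDIS_URL points at the same Redis DB the proxy uses; `waitingListKey` is a hypothetical helper that just reconstructs BullMQ's default key layout:

```javascript
// BullMQ keeps each queue's state in Redis under "<prefix>:<queue>:*";
// jobs that no worker has claimed sit in the "<prefix>:<queue>:wait" list.
// The default prefix is "bull" (matching the qualifiedName bull:groupchat_msg_fcm).
function waitingListKey(queue, prefix = 'bull') {
    return `${prefix}:${queue}:wait`;
}

// Live check, only when REDIS_URL is set and the packages are installed.
if (process.env.REDIS_URL) {
    (async () => {
        const { QueueEvents } = await import('bullmq');
        const { default: IORedis } = await import('ioredis');
        const connection = new IORedis(process.env.REDIS_URL, { maxRetriesPerRequest: null });

        // If 'waiting' fires but 'active' never does, the proxy is enqueuing
        // fine and the worker simply isn't claiming jobs.
        const events = new QueueEvents('groupchat_msg_fcm', { connection });
        events.on('waiting', ({ jobId }) => console.log('waiting:', jobId));
        events.on('active', ({ jobId }) => console.log('active:', jobId));

        // How many unclaimed jobs are sitting in the wait list right now.
        console.log('queued:', await connection.llen(waitingListKey('groupchat_msg_fcm')));
    })();
}
```

If `queued` stays above zero and no `active` events ever arrive, the problem is on the worker's side of the connection rather than the proxy's.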

The worker code

import { Worker } from 'bullmq';
import admin from 'firebase-admin';
import serviceAccount from './secrets/ggl-app-prod.json';
import redis from 'redis';
import IORedis from 'ioredis';

const REDIS_URL = process.env.REDIS_URL || 'redis://172.20.0.1:6789/2';
// BullMQ requires maxRetriesPerRequest: null on worker connections:
// https://docs.bullmq.io/guide/going-to-production#maxretriesperrequest
const connection = new IORedis(REDIS_URL, {
    maxRetriesPerRequest: null
});
const limiter = {
    max: Number(process.env.MAX_JOB) || 20, // env vars are strings; the limiter expects a number
    duration: 1000,
};

admin.initializeApp({
    credential: admin.credential.cert(serviceAccount)
});

let redisCacheClient;
const CACHE_EXPIRY_TIME = 60 * 60 * 12; // 12 hours, in seconds

(async () => {
    redisCacheClient = redis.createClient({
        url: REDIS_URL
    });
    redisCacheClient.on("error", (error) => console.error(`Error: ${error}`));
    // Note: the worker below may receive a job before this connect() resolves,
    // in which case redisCacheClient.get() would fail.
    await redisCacheClient.connect();
})();

console.log('FCM Worker Started', Date());

const worker = new Worker('groupchat_msg_fcm', async job => {
    console.log(job); // <-- never logs; the processor is never invoked
    const token = job.data.token;
    const isTokenInvalid = await redisCacheClient.get("invalid" + token);
    // console.log(isTokenInvalid, job.data);
    if (job.name === 'msg' && !isTokenInvalid) {
        await sendFCMNotification(token, job.data.data, job.data.data.xfrom);
    }
}, { connection, limiter });
// console.log(REDIS_URL, worker.qualifiedName); // <- the REDIS_URL, bull:groupchat_msg_fcm

async function sendFCMNotification(token, data, senderTinodeUserId) {
// sending fcm...
}
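One pitfall worth ruling out in a split deployment like this: the proxy and the worker must agree on the Redis logical DB (and key prefix). The fallback URL above selects DB 2 (the `/2` suffix); if the Cloud Run proxy is configured without that suffix, it writes to DB 0 and the worker waits forever on an empty database. A small sketch to check, where `parseRedisDb` is a hypothetical helper and the live scan assumes ioredis is installed:

```javascript
// The path of a redis:// URL selects the logical DB; no path means DB 0.
function parseRedisDb(url) {
    const path = new URL(url).pathname;
    return path && path !== '/' ? Number(path.slice(1)) : 0;
}

if (process.env.REDIS_URL) {
    (async () => {
        const { default: IORedis } = await import('ioredis');
        const conn = new IORedis(process.env.REDIS_URL);
        // If this comes back empty while redis-cli on another DB shows keys,
        // the proxy and the worker are looking at different databases.
        const keys = await conn.keys('bull:groupchat_msg_fcm:*');
        console.log('DB', parseRedisDb(process.env.REDIS_URL), '->', keys.length, 'BullMQ keys');
        await conn.quit();
    })();
}
```

Running this from inside the GKE pod (e.g. via kubectl exec) also doubles as a reachability check for the GCE-hosted Redis from the cluster network.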

The strange thing is that we've successfully run this same setup in easypanel, where the bullmq-worker, bullmq-proxy, and Redis all run on the same host, much like a single docker-compose.yml stack. In that environment, the same worker code consistently picks up jobs. We even scaled up to two worker replicas in easypanel, and everything continued to work smoothly.

We could continue using easypanel, but it's not a viable option for the long run. We're just so curious to understand why we're facing this issue with Cloud Run and GKE. I'm hoping it's just a silly configuration oversight on my part, and we can get this resolved quickly and move on with our lives! 😄
