Upstream report: intermittent "Error: Connection is closed" during shutdown (BullMQ + ioredis)
Target: bullmq repository (https://github.com/taskforcesh/bullmq)
Short summary
We observe an intermittent unhandled rejection "Error: Connection is closed." during shutdown in our test runs. The stack points into ioredis internals but the clients involved are created by BullMQ (Queue/Worker/QueueEvents) using a shared ioredis connection. Tests pass but the test runner (Vitest) exits with non-zero status because of the unhandled rejection.
Why we think BullMQ may be involved
- Our repro creates `Queue`, `Worker` and `QueueEvents` (BullMQ 5.63.0 observed) with a shared ioredis connection, closes them, and then disconnects the connection. The unhandled rejection appears while those resources are being closed (a minimal sketch of this sequence follows this list).
- Debug output shows multiple internal ioredis clients with names such as `bull:<base>` being created and used concurrently.
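For context, a minimal sketch of the sequence the repro exercises (TypeScript shown for readability; the actual script at `apps/api/scripts/repro-ioredis-shutdown.js` is plain JavaScript, and the queue name and job payload below are placeholders):

```ts
import { Queue, QueueEvents, Worker } from 'bullmq';
import IORedis from 'ioredis';

async function main() {
  // Shared connection handed to BullMQ; BullMQ duplicates it for its internal
  // (blocking) clients, which is where we suspect the late writes come from.
  const connection = new IORedis({ maxRetriesPerRequest: null });
  connection.on('error', (err) => console.error('shared connection error:', err.message));

  const queue = new Queue('repro', { connection });
  const queueEvents = new QueueEvents('repro', { connection });
  const worker = new Worker('repro', async () => 'ok', { connection });

  await queueEvents.waitUntilReady();
  await queue.add('job', { n: 1 });
  await new Promise<void>((resolve) => queueEvents.once('completed', () => resolve()));

  // Shutdown order we use: worker -> events -> queue -> shared connection.
  await worker.close();
  await queueEvents.close();
  await queue.close();
  connection.disconnect(); // drop the socket directly instead of sending QUIT
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```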
Repro and logs (attached in our repo)
- Minimal repro script (in our repo): `apps/api/scripts/repro-ioredis-shutdown.js`
- Aggregated repro runs (debug): `apps/api/test-results/repro-ioredis-shutdown.log`
- Vitest debug run where the unhandled rejection reproduced: `apps/api/test-results/vitest-debug.log`
Observed environment (from debug traces)
- BullMQ: 5.63.0
- ioredis: 5.8.2
- Node.js: v22.x
- OS: Linux
What we observe
- Intermittent unhandled rejection with a stack trace inside ioredis `event_handler.js` when closing connections. Example stack snippet:
    Unhandled Rejection
    Error: Connection is closed.
    ❯ close .../node_modules/ioredis/built/redis/event_handler.js:214:25
    ❯ Socket.<anonymous> .../node_modules/ioredis/built/redis/event_handler.js:181:20
What we've tried in our codebase
- Attach defensive `client.on('error')` handlers to the explicit connection and to discovered internal clients created by BullMQ (see the sketch after this list).
- Track and remove those handlers on shutdown.
- Prefer `connection.disconnect()` during shutdown (avoid QUIT writes), and call it after closing worker/events/queue.
- Add a test-only Vitest `unhandledRejection` swallow as a temporary mitigation while investigating.
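A condensed sketch of the first three items above (the helper names `trackClient` and `shutdownAll` are illustrative, not our actual module API, and how the internal clients get discovered is elided here; see the "Latest attempts" section below):

```ts
import type { Queue, QueueEvents, Worker } from 'bullmq';
import type { Redis } from 'ioredis';

// Defensive error handlers attached to the explicit connection and to any
// internal clients we manage to discover, kept in a map so they can be removed.
const errorHandlers = new Map<Redis, (err: Error) => void>();

function trackClient(client: Redis, label: string): void {
  if (errorHandlers.has(client)) return;
  const onError = (err: Error) => {
    // Log everything except the late "Connection is closed." noise seen at teardown.
    if (!/Connection is closed/.test(err.message)) console.error(`[${label}]`, err);
  };
  client.on('error', onError);
  errorHandlers.set(client, onError);
}

async function shutdownAll(worker: Worker, events: QueueEvents, queue: Queue, connection: Redis) {
  // Close BullMQ resources first, then the shared connection.
  await worker.close();
  await events.close();
  await queue.close();
  // Remove our listeners, then drop the socket without issuing QUIT.
  for (const [client, onError] of errorHandlers) client.removeListener('error', onError);
  errorHandlers.clear();
  connection.disconnect();
}
```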
Notes and hypotheses
- The failure looks timing-sensitive: our minimal repro run 20 times did not reproduce the error, but a full Vitest run with DEBUG logs reproduced it once.
- Possible root causes:
  - BullMQ may be issuing Redis commands (via internal clients) while the shared connection is being closed.
  - ioredis may be emitting an error from a low-level handler that isn't routed to the attached `error` listeners in time.
Request / suggested next steps for maintainers
- Review the attached repro script and logs and try running the repro under a test runner to exercise the same timing (Vitest/Jest) — it may be intermittent and time-sensitive.
- Look at internal client lifecycle in BullMQ: consider whether internal clients can do late writes while the shared connection is being disconnected, or whether a more explicit shutdown order is needed.
- If helpful, we can try a small patch in our repo to force explicit `disconnect()` calls on internal clients before `connection.disconnect()` and report back.
If maintainers prefer the issue to be opened on ioredis instead, we can move the repro there — please advise which repo is the right owner for this race.
-- repo: sergioaafreitas/octaanalysis
-- repro script: apps/api/scripts/repro-ioredis-shutdown.js
-- logs: apps/api/test-results/repro-ioredis-shutdown.log and apps/api/test-results/vitest-debug.log
Latest attempts (local mitigation added in repo)
- We added an aggressive local mitigation in `apps/api/src/queue/queue.service.ts` (a hedged sketch follows this list) that:
  - discovers internal ioredis clients created by BullMQ and attaches error listeners, and
  - on shutdown explicitly calls `disconnect()` on those discovered internal clients and removes the listeners before disconnecting the shared connection.
- After applying this mitigation we ran a single Vitest run with debug enabled and saved the output to `apps/api/test-results/vitest-after-mitigation.log`. The unhandled rejection still appeared in that run and Vitest exited with a non-zero status.
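For reference, a hedged sketch of the shape of that mitigation. The discovery mechanism shown (wrapping `connection.duplicate()`) is illustrative only: it assumes BullMQ obtains its internal clients by duplicating the provided connection, and any client created some other way would not be tracked.

```ts
import type { Redis } from 'ioredis';

const internalClients = new Set<Redis>();
const internalErrorHandlers = new Map<Redis, (err: Error) => void>();

// Record clients BullMQ creates by duplicating the shared connection, and give
// each one an error listener so late errors have somewhere to go.
function trackDuplicates(connection: Redis): void {
  const originalDuplicate = connection.duplicate.bind(connection);
  connection.duplicate = ((...args: Parameters<Redis['duplicate']>) => {
    const dup = originalDuplicate(...args);
    internalClients.add(dup);
    const onError = (err: Error) => {
      if (!/Connection is closed/.test(err.message)) console.error('internal client error:', err);
    };
    dup.on('error', onError);
    internalErrorHandlers.set(dup, onError);
    return dup;
  }) as Redis['duplicate'];
}

// On shutdown: disconnect the discovered internal clients and remove our
// listeners before disconnecting the shared connection itself.
function shutdownInternalClients(connection: Redis): void {
  for (const client of internalClients) {
    client.disconnect();
    const onError = internalErrorHandlers.get(client);
    if (onError) client.removeListener('error', onError);
  }
  internalClients.clear();
  internalErrorHandlers.clear();
  connection.disconnect();
}
```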
Attachments in the repo now include:
- `apps/api/test-results/vitest-after-mitigation.log` (Vitest run after local mitigation)
We can open the issue now and attach the three logs (repro loop, vitest-debug, vitest-after-mitigation) to help triage.