
Intermittent 'Error: Connection is closed' during shutdown (BullMQ + ioredis) #3546

@sergioaafreitas

Description

Upstream report: intermittent "Error: Connection is closed" during shutdown (BullMQ + ioredis)

Target: bullmq repository (https://github.com/taskforcesh/bullmq)

Short summary

We observe an intermittent unhandled rejection "Error: Connection is closed." during shutdown in our test runs. The stack points into ioredis internals, but the clients involved are created by BullMQ (Queue/Worker/QueueEvents) using a shared ioredis connection. Tests pass, but the test runner (Vitest) exits with a non-zero status because of the unhandled rejection.

Why we think BullMQ may be involved

  • Our repro creates Queue, Worker and QueueEvents (BullMQ 5.63.0 observed) with a shared ioredis connection, closes them, and then disconnects the connection. The unhandled rejection appears while those resources are being closed (a sketch of the setup follows this list).
  • Debug output shows multiple internal ioredis clients with names such as bull:<base> being created and used concurrently.
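For reference, this is roughly the shape of the repro, a minimal sketch rather than the exact script (the queue name, payload and job handler are placeholders):

```ts
import IORedis from 'ioredis';
import { Queue, QueueEvents, Worker } from 'bullmq';

// Shared connection; maxRetriesPerRequest: null is what BullMQ expects for its blocking clients.
const connection = new IORedis({ maxRetriesPerRequest: null });

const queue = new Queue('repro', { connection });
const queueEvents = new QueueEvents('repro', { connection });
const worker = new Worker('repro', async () => 'ok', { connection });

async function main() {
  await queueEvents.waitUntilReady();
  await queue.add('ping', { hello: 'world' });

  // Shutdown order we use in the app: worker, events, queue, then the raw connection.
  await worker.close();
  await queueEvents.close();
  await queue.close();
  connection.disconnect(); // preferred over quit() to avoid late QUIT writes
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

As far as we understand, BullMQ duplicates a shared connection for its blocking clients (Worker, QueueEvents), which would explain the multiple bull:<base> clients in the debug output.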

Repro and logs (attached in our repo)

  • Minimal repro script (in our repo): apps/api/scripts/repro-ioredis-shutdown.js
  • Aggregated repro runs (debug): apps/api/test-results/repro-ioredis-shutdown.log
  • Vitest debug run where the unhandled rejection reproduced: apps/api/test-results/vitest-debug.log

Observed environment (from debug traces)

  • BullMQ: 5.63.0
  • ioredis: 5.8.2
  • Node.js: v22.x
  • OS: Linux

What we observe

  • Intermittent unhandled rejection with stack trace inside ioredis event_handler.js when closing connections. Example stack snippet:
Unhandled Rejection
Error: Connection is closed.
 ❯ close .../node_modules/ioredis/built/redis/event_handler.js:214:25
 ❯ Socket.<anonymous> .../node_modules/ioredis/built/redis/event_handler.js:181:20

What we've tried in our codebase

  • Attach defensive client.on('error') handlers to the explicit connection and to discovered internal clients created by BullMQ.
  • Track and remove those handlers on shutdown.
  • Prefer connection.disconnect() during shutdown (avoid QUIT writes), and call it after closing worker/events/queue.
  • Add a test-only Vitest unhandledRejection swallow as a temporary mitigation while investigating (see the sketch after this list).
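A condensed sketch of that defensive wiring; the helper names are illustrative, the real code lives in apps/api/src/queue/queue.service.ts and our Vitest setup file:

```ts
import type { Redis } from 'ioredis';

// Swallow only the shutdown race we are chasing; surface everything else.
const onClientError = (err: Error) => {
  if (!/Connection is closed/i.test(err.message)) {
    console.error('redis client error during shutdown', err);
  }
};

// Attach a defensive handler and return a cleanup callback we invoke on shutdown.
export function attachDefensiveHandler(client: Redis): () => void {
  client.on('error', onClientError);
  return () => client.removeListener('error', onClientError);
}

// Test-only mitigation (registered from the Vitest setup file) while the root cause is investigated.
if (process.env.VITEST) {
  process.on('unhandledRejection', (reason) => {
    if (reason instanceof Error && /Connection is closed/i.test(reason.message)) {
      return; // ignore the known shutdown race only
    }
    throw reason;
  });
}
```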

Notes and hypotheses

  • The failure looks timing-sensitive: our minimal repro, run 20 times, did not reproduce the error, but a full Vitest run with DEBUG logs reproduced it once.
  • Possible root causes:
    • BullMQ may be issuing Redis commands (internal clients) after the shared connection is being closed.
    • ioredis may be emitting an error from a low-level handler that isn't being routed to the attached error listeners in time.

Request / suggested next steps for maintainers

  1. Review the attached repro script and logs and try running the repro under a test runner (Vitest/Jest) to exercise the same timing; the failure is intermittent and timing-sensitive.
  2. Look at internal client lifecycle in BullMQ: consider whether internal clients can do late writes while the shared connection is being disconnected, or whether a more explicit shutdown order is needed.
  3. If helpful, we can try a small patch in our repo to force explicit disconnect() calls on internal clients before connection.disconnect() and report back.

If maintainers prefer the issue to be opened on ioredis instead, we can move the repro there — please advise which repo is the right owner for this race.

-- repo: sergioaafreitas/octaanalysis
-- repro script: apps/api/scripts/repro-ioredis-shutdown.js
-- logs: apps/api/test-results/repro-ioredis-shutdown.log and apps/api/test-results/vitest-debug.log

Latest attempts (local mitigation added in repo)

  • We added an aggressive local mitigation in apps/api/src/queue/queue.service.ts (sketched below) that:
    • discovers the internal ioredis clients created by BullMQ and attaches error listeners,
    • and on shutdown explicitly calls disconnect() on those discovered internal clients and removes the listeners before disconnecting the shared connection.
  • After applying this mitigation we ran a single Vitest run with debug enabled and saved the output to apps/api/test-results/vitest-after-mitigation.log. The unhandled rejection still appeared in that run and Vitest exited with a non-zero status.
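A simplified sketch of that mitigation; the class shape and method names are illustrative, and reaching the internal clients through BullMQ's client getters is our assumption about the cleanest way to discover them:

```ts
import type { Redis } from 'ioredis';
import { Queue, QueueEvents, Worker } from 'bullmq';

export class QueueService {
  private cleanups: Array<() => void> = [];
  private internalClients: Redis[] = [];

  constructor(
    private readonly connection: Redis,
    private readonly queue: Queue,
    private readonly worker: Worker,
    private readonly queueEvents: QueueEvents,
  ) {}

  // Resolve the clients BullMQ actually uses so we can watch them and later close them ourselves.
  async trackInternalClients(): Promise<void> {
    const clients = (await Promise.all([
      this.queue.client,
      this.worker.client,
      this.queueEvents.client,
    ])) as Redis[];
    for (const client of clients) {
      const handler = (err: Error) => console.error('internal redis client error', err);
      client.on('error', handler);
      this.internalClients.push(client);
      this.cleanups.push(() => client.removeListener('error', handler));
    }
  }

  async shutdown(): Promise<void> {
    await this.worker.close();
    await this.queueEvents.close();
    await this.queue.close();

    // Force-disconnect any internal clients that are still live, drop our listeners,
    // and only then disconnect the shared connection.
    for (const client of this.internalClients) {
      if (client.status !== 'end') client.disconnect();
    }
    for (const cleanup of this.cleanups) cleanup();
    this.connection.disconnect();
  }
}
```

Even with this in place the rejection reproduced once (see vitest-after-mitigation.log), which is why we suspect a late write from a client or in-flight command that this approach does not reach.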

Attachments in the repo now include:

  • apps/api/test-results/vitest-after-mitigation.log (Vitest run after local mitigation)

The three logs (repro loop, vitest-debug, vitest-after-mitigation) are available in the repo to help triage.
