
Conversation

@adrien2p
Member

@adrien2p adrien2p commented Jan 15, 2026

Summary

This PR consolidates the workflow engine into a unified provider-based architecture, aligning it with the pattern used by other Medusa modules (payments, notifications, etc.). The new architecture provides a single @medusajs/workflows module with pluggable storage providers, replacing the previous separate workflow-engine-inmemory and workflow-engine-redis modules.

Motivation

  • Consistency: Align workflow engine with Medusa's standard module/provider pattern
  • Flexibility: Allow custom storage provider implementations
  • Maintainability: Reduce code duplication between in-memory and Redis implementations
  • Scalability: Add distributed notification support for multi-instance deployments

Changes

New Packages

Package                    | Description
@medusajs/workflows        | Unified workflow engine module with local (in-memory) storage by default
@medusajs/workflows-redis  | Redis storage provider for distributed workflow execution

Architecture

Before:
├── @medusajs/workflow-engine-inmemory (standalone module)
└── @medusajs/workflow-engine-redis (standalone module)

After:
├── @medusajs/workflows (unified module)
│   ├── WorkflowsModuleService (implements IWorkflowEngineService)
│   ├── WorkflowOrchestratorService (workflow execution)
│   └── LocalWorkflowsStorage (default in-memory provider)
└── @medusajs/workflows-redis (optional provider)
    └── RedisWorkflowsStorage (distributed storage)

Key Features

  1. Provider-based Storage

    • Default: LocalWorkflowsStorage (in-memory, single-instance)
    • Optional: RedisWorkflowsStorage (distributed, multi-instance)
  2. Distributed Notifications (new)

    • Cross-instance event propagation for async workflows
    • Instance-based deduplication to prevent duplicate processing (a rough sketch of this idea follows the feature list)
    • Redis pub/sub for real-time notifications
  3. Shared Utilities

    • saveToDb(), deleteFromDb(), clearExpiredExecutions()
    • preventRaceConditionExecutionIfNecessary() for concurrent execution handling
  4. Enhanced Type Definitions

    • New IWorkflowModuleOrchestratorService interface
    • DistributedNotificationSubscriber for cross-instance communication
    • DistributedStorageHooks for lifecycle management
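
The instance-based deduplication called out under Distributed Notifications is not spelled out in this description; conceptually it can be as simple as tagging each published notification with the emitting instance's id and ignoring messages that originated locally. A rough sketch of that idea (all names here are hypothetical, not taken from the PR):

import { randomUUID } from "node:crypto"

// Hypothetical payload shape for a cross-instance notification.
type DistributedNotification = {
  instanceId: string
  workflowId: string
  transactionId: string
  eventType: string
}

class NotificationDeduplicator {
  // Unique per process, regenerated on every boot.
  readonly instanceId = randomUUID()

  // Tag outgoing payloads with the sender's instance id.
  tag(payload: Omit<DistributedNotification, "instanceId">): DistributedNotification {
    return { instanceId: this.instanceId, ...payload }
  }

  // Drop notifications this instance published itself, since its local
  // subscribers were already invoked directly.
  shouldProcess(notification: DistributedNotification): boolean {
    return notification.instanceId !== this.instanceId
  }
}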

Configuration

Using Default (In-Memory) Storage

// medusa-config.ts
import { defineConfig } from "@medusajs/framework/utils"

export default defineConfig({
  // No configuration needed - uses in-memory storage by default
})

Using Redis Storage

// medusa-config.ts
import { defineConfig, Modules } from "@medusajs/framework/utils"

export default defineConfig({
  modules: [
    {
      resolve: "@medusajs/medusa/workflows",
      options: {
        providers: [
          {
            resolve: "@medusajs/medusa/workflows-redis",
            id: "workflows-redis",
            options: {
              redisUrl: process.env.REDIS_URL,
            },
          },
        ],
      },
    },
  ],
})

Note

Engine consolidation to provider architecture

  • Introduces unified @medusajs/workflows module and new @medusajs/workflows-redis provider (RedisWorkflowsStorage, loaders, config options) with pub/sub notifications and lifecycle hooks
  • Extends orchestration/types: new IWorkflowModuleOrchestratorService, storage hooks, notification subscriber, and scheduler/timeout/retry support
  • Medusa exports/back-compat: add @medusajs/medusa/workflows and workflows-redis, deprecate old workflow-engine-*, and map legacy package names
  • Improves error/log output (include stack in loader errors; consistent logger messages) and fixes filter bug using filters_
  • Updates test scripts and adds extensive integration tests for workflows and Redis provider (timeouts, retries with intervals, race conditions, scheduling, admin endpoints)

Written by Cursor Bugbot for commit 236b6d5. This will update automatically on new commits. Configure here.

@changeset-bot

changeset-bot bot commented Jan 15, 2026

⚠️ No Changeset found

Latest commit: 236b6d5

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel

vercel bot commented Jan 15, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

8 Skipped Deployments
Project Deployment Review Updated (UTC)
api-reference Ignored Ignored Jan 21, 2026 2:07pm
api-reference-v2 Ignored Ignored Preview Jan 21, 2026 2:07pm
cloud-docs Ignored Ignored Preview Jan 21, 2026 2:07pm
docs-ui Ignored Ignored Preview Jan 21, 2026 2:07pm
docs-v2 Ignored Ignored Preview Jan 21, 2026 2:07pm
medusa-docs Ignored Ignored Preview Jan 21, 2026 2:07pm
resources-docs Ignored Ignored Preview Jan 21, 2026 2:07pm
user-guide Ignored Ignored Preview Jan 21, 2026 2:07pm


@adrien2p
Member Author

@claude review this, be thorough please

@claude

claude bot commented Jan 21, 2026

PR Review: Workflow Engine Provider Architecture Consolidation (#14547)

Overview

This PR consolidates the workflow engine from separate in-memory and Redis modules into a unified provider architecture. This is a significant architectural improvement that modernizes the workflow engine to follow Medusa's provider pattern.

Key Changes:

  • New @medusajs/workflows module with pluggable storage providers
  • New @medusajs/workflows-redis provider package
  • Deprecated workflow-engine-inmemory and workflow-engine-redis modules with migration paths
  • 10,381 additions across 99 files

Code Quality Assessment

✅ Strengths

  1. Well-Structured Provider Architecture

    • Clean separation between core workflow logic and storage implementations
    • Provider interface abstraction allows swapping backends without code changes
    • Follows Medusa's established patterns for provider-based modules
  2. Backward Compatibility

    • Deprecated modules now proxy to new implementation with console warnings
    • Clear migration instructions in deprecation comments (workflow-engine-redis.ts:4-22)
    • Module resolution backward compatibility maintained (definition.ts:94-97)
  3. Comprehensive Test Coverage

    • ~78 test cases covering retry mechanisms, race conditions, timeouts, compensation
    • Strong integration test suite for both local and Redis providers
    • Good edge case coverage for concurrent operations
  4. Lifecycle Management

    • Proper hooks: onApplicationStart, onApplicationPrepareShutdown, onApplicationShutdown
    • Two-phase graceful shutdown in Redis implementation
    • Resource cleanup for timers, workers, and connections
  5. Adherence to Medusa Patterns

    • Proper use of decorators (@InjectManager, @MedusaContext)
    • Service inheritance from MedusaService
    • Follows naming conventions and file structure

⚠️ Issues & Concerns

1. Critical: Deep Cloning with JSON.stringify Loses Data

Location: Multiple files (e.g., workflows-module.ts:147, 221, 232)

const options_ = JSON.parse(JSON.stringify(options ?? {}))

Problem:

  • Loses function references (breaks event handlers if passed in options)
  • Converts Date objects to strings
  • Removes undefined values
  • Fails on circular references
  • Strips class prototypes

Impact: High - Could cause subtle bugs if workflow options contain complex objects

Recommendation: Use a proper deep clone utility like structuredClone() or lodash's cloneDeep(), or better yet, use a shallow clone with explicit field filtering:

const { manager, transactionManager, ...options_ } = options ?? {}

2. Critical: Silent Promise Rejection in Redis Operations

Location: workflow-orchestrator.ts:675, 754

void this.redisSubscriber.subscribe(this.getChannelName(workflowId))
void this.redisSubscriber.unsubscribe(this.getChannelName(workflowId))

Problem:

  • Subscription failures are silently ignored
  • Could lead to memory leaks if subscribe fails but subscriber is added to Map
  • Unsubscribe failures could cause unnecessary Redis memory usage

Impact: High - Can cause production issues that are hard to diagnose

Recommendation: At minimum, add error logging:

this.redisSubscriber.subscribe(this.getChannelName(workflowId)).catch(err => {
  this.#logger.error(`Failed to subscribe to workflow channel: ${err}`)
})

3. Medium: Inconsistent Error Types

Pattern: Mix of MedusaError and generic Error usage

Examples:

  • Entry points use MedusaError: workflow-orchestrator.ts:196-199, 210-213
  • Internal methods use generic Error: workflow-orchestrator.ts:289, 294, 326
  • Lock failures use generic Error: redis-workflows-storage.ts:543

Impact: Medium - Breaks error handling conventions, makes error classification inconsistent

Recommendation: Standardize on MedusaError throughout for consistency:

// Instead of:
throw new Error("Transaction storage is locked")

// Use:
throw new MedusaError(
  MedusaError.Types.CONFLICT,
  "Transaction storage is locked"
)

4. Medium: Static Subscriber Storage Memory Leak Risk

Location: workflow-orchestrator.ts:103 (in-memory), workflow-orchestrator.ts:118 (Redis)

private static subscribers: Subscribers = new Map()

Problem:

  • Workflow-level subscribers (using AnySubscriber key) are never automatically cleaned
  • Only transaction-specific subscribers are deleted on workflow finish
  • Unbounded memory growth if workflows are dynamically registered

Impact: Medium - Memory leak in long-running applications with many workflows

Recommendation: Add a cleanup mechanism for stale subscribers or document that callers must unsubscribe:

// Add to documentation
/**
 * Note: Subscribers must be explicitly unsubscribed to prevent memory leaks.
 * Transaction-scoped subscribers are automatically cleaned on workflow completion.
 */

5. Medium: Weak Distributed Lock Implementation

Location: redis-workflows-storage.ts:833-853

async #acquireLock(key: string, ttlSeconds: number = 2): Promise<boolean> {
  const lockKey = this.#getLockKey(key)
  const result = await this.redisClient.set(lockKey, 1, "EX", ttlSeconds, "NX")
  return result === "OK"
}

async #releaseLock(key: string): Promise<void> {
  await this.redisClient.del(this.#getLockKey(key))
}

Problems:

  • No lock ownership tracking (any instance can release any lock)
  • Hard-coded 2-second TTL may be insufficient for slow operations
  • No lock extension for long-running saves
  • Delete without checking if lock is owned by current instance

Impact: Medium - Could cause race conditions in distributed deployments under high load

Recommendation: Use a proper distributed lock pattern with unique identifiers:

async #acquireLock(key: string, ttlSeconds: number = 2): Promise<string | null> {
  const lockKey = this.#getLockKey(key)
  const lockValue = ulid() // Unique per acquisition
  const result = await this.redisClient.set(lockKey, lockValue, "EX", ttlSeconds, "NX")
  return result === "OK" ? lockValue : null
}

async #releaseLock(key: string, lockValue: string): Promise<void> {
  // Only delete if we own the lock (Lua script for atomicity)
  await this.redisClient.eval(
    'if redis.call("get", KEYS[1]) == ARGV[1] then return redis.call("del", KEYS[1]) else return 0 end',
    1,
    this.#getLockKey(key),
    lockValue
  )
}

6. Low: Empty Catch Block Swallows Errors

Location: workflow-orchestrator-storage.ts:133 (in-memory)

this.clearTimeout_ = setInterval(async () => {
  try {
    await this.clearExpiredExecutions()
  } catch {}
}, THIRTY_MINUTES_IN_MS)

Problem: Silent failures make debugging difficult

Recommendation: Log errors:

catch (error) {
  this.#logger.error(`Failed to clear expired executions: ${error}`)
}

7. Low: TypeScript @ts-ignore Usage

Location: workflows-module.ts:58

// @ts-ignore
super(...arguments)

Problem: Bypasses type safety

Recommendation: Use proper typing:

super(arguments[0])

8. Low: Missing Error Handling in Connection Factory

Location: redis-loader.ts:144-155

async function getConnection(url, redisOptions) {
  const connection = new Redis(url, {
    lazyConnect: true,
    ...(redisOptions ?? {}),
  })

  await new Promise(async (resolve) => {
    await connection.connect(resolve)
  })

  return connection
}

Problems:

  • No error handling for connection failures
  • Promise doesn't have reject handler
  • async is unnecessary in Promise constructor

Recommendation:

async function getConnection(url, redisOptions) {
  const connection = new Redis(url, {
    lazyConnect: true,
    ...(redisOptions ?? {}),
  })

  await connection.connect()
  return connection
}

Performance Considerations

✅ Good Practices

  1. Selective Database Persistence - Only saves checkpoints at key points, reducing DB writes
  2. Parallel Operations - Uses promiseAll for concurrent DB/Redis operations (redis-workflows-storage.ts:619-627)
  3. Pipeline Usage - Redis pipeline for atomic multi-operations (redis-workflows-storage.ts:580)
  4. Background Execution - Uses setImmediate for non-blocking notifications

⚠️ Potential Issues

  1. Deep Cloning Performance - JSON.parse(JSON.stringify()) is expensive for large objects
  2. No Connection Pooling - Creates multiple Redis connections (main, worker, pub/sub) without apparent pooling
  3. Unbounded Subscriber Growth - Static Map can grow indefinitely

Security Concerns

Medium Priority

  1. No Input Validation on Subscriber Functions

    • Arbitrary functions can be registered as subscribers
    • If subscriber registration is exposed via API, could enable code injection
    • Recommendation: Document that subscriber registration must be internal-only
  2. Predictable Lock Keys

    • Lock keys are just ${transactionId}:lock
    • Could enable lock manipulation if transaction IDs are guessable
    • Recommendation: Use cryptographically random lock tokens
  3. Context Serialization

    • Workflow context stored in DB may contain sensitive data
    • No field-level encryption
    • Recommendation: Document sensitive data handling requirements

Low Priority

  1. Redis TLS Not Enforced - Accepts any RedisOptions without validation
  2. No Pub/Sub Channel Authentication - Channels aren't namespaced by tenant

Test Coverage Assessment

✅ Well Covered

  • Retry mechanisms (manual, auto, interval-based)
  • Race condition prevention
  • Timeout handling (step and transaction)
  • Compensation flows
  • Idempotency and retention
  • Concurrent execution prevention

⚠️ Coverage Gaps

  1. Infrastructure Failures

    • No tests for Redis connection loss during workflow execution
    • No tests for database transaction rollback
    • No tests for BullMQ worker crashes
  2. Resource Limits

    • No tests for maximum parallel step limits
    • No tests for workflow execution queue saturation
    • No tests for large payload handling (100KB+ step responses)
  3. API Error Handling

    • HTTP tests don't cover 400/500 error responses
    • No tests for malformed payloads
    • Missing tests for unauthorized access
  4. Performance Boundaries

    • No tests for workflows with 100+ steps
    • No tests for 1000+ concurrent executions
    • No load tests for subscription callback latency

Migration Path Quality

✅ Excellent Migration Support

  1. Clear Deprecation Warnings - Console warnings with migration instructions
  2. Backward Compatible Imports - Old imports still work
  3. Configuration Migration - define-config.ts:274-285 shows new provider setup
  4. Documentation in Code - Deprecation comments include full migration examples

Suggestion

Consider adding a migration guide document that covers:

  • Breaking changes (if any)
  • Configuration migration examples
  • Performance implications of each provider
  • Troubleshooting common issues

Recommendations Summary

Must Fix (Before Merge)

  1. ✅ Replace JSON.parse(JSON.stringify()) with proper cloning or field filtering
  2. ✅ Add error handling for Redis subscribe/unsubscribe operations
  3. ✅ Standardize error types to use MedusaError consistently

Should Fix (Before Merge)

  1. ✅ Improve distributed lock implementation with ownership tracking
  2. ✅ Fix empty catch block in cleanup interval
  3. ✅ Remove @ts-ignore with proper typing

Consider for Future

  1. Document subscriber memory management requirements
  2. Add infrastructure failure test coverage
  3. Implement metrics/observability
  4. Add migration guide documentation

Conclusion

This is a well-architected refactoring that successfully consolidates the workflow engine into a modern provider pattern. The code follows Medusa conventions, has good test coverage, and includes thoughtful backward compatibility.

The main concerns are around error handling consistency, resource cleanup documentation, and distributed locking robustness. These are fixable issues that don't undermine the overall solid architecture.

Recommendation: Approve with requested changes for the critical items (1-3) before merging.


Additional Notes

  • The deprecation strategy is excellent and user-friendly
  • Test coverage is comprehensive for core functionality
  • Performance optimizations (selective persistence, pipelines) show thoughtful design
  • The two-phase shutdown in Redis is a nice touch for graceful degradation

Great work on this consolidation! 🎉

@adrien2p
Member Author

@claude review this with the same approach as the previous review please

@claude

claude bot commented Jan 21, 2026

Code Review: Workflow Engine Consolidation to Provider Architecture

Overview

This is a significant architectural refactoring that consolidates the workflow engine into a unified module with pluggable storage providers. The PR successfully aligns the workflow engine with Medusa's standard module/provider pattern used by payments, notifications, and other modules.


✅ Strengths

1. Architecture & Design

  • Excellent provider pattern implementation: The separation between @medusajs/workflows (core module) and @medusajs/workflows-redis (provider) is clean and follows Medusa conventions
  • Strong abstraction: The IDistributedTransactionStorage and IDistributedSchedulerStorage interfaces provide clear contracts for storage providers
  • Distributed notification system: The new pub/sub notification system (DistributedNotificationSubscriber) elegantly handles cross-instance communication for async workflows

2. Code Quality

  • Well-structured services:
    • WorkflowsModuleService properly extends MedusaService and uses appropriate decorators (@InjectManager, @InjectSharedContext)
    • WorkflowOrchestratorService cleanly separates orchestration logic from storage concerns
  • Shared utilities: The workflow-storage-utils.ts consolidates common logic (saveToDb, deleteFromDb, race condition prevention) - excellent DRY principle
  • Lifecycle hooks: Proper implementation of __hooks for application lifecycle management (startup, shutdown)

3. Error Handling

  • Appropriate error types: Consistent use of MedusaError with proper error types (NOT_FOUND, INVALID_DATA, CONFLICT)
  • Race condition handling: The preventRaceConditionExecutionIfNecessary function with skip errors (SkipExecutionError, SkipCancelledExecutionError) is well-designed
  • Distributed lock implementation: Redis provider's lock acquisition/release using ULID and Lua scripts is correct

4. Test Coverage

  • Comprehensive integration tests: ~4000 lines of tests covering:
    • Race conditions
    • Retry intervals
    • Subscribe/notify patterns
    • Transaction timeouts
    • Scheduled workflows
  • Tests exist for both providers: Local and Redis implementations are both tested

5. Backward Compatibility

  • Deprecation warnings: Old modules (workflow-engine-inmemory, workflow-engine-redis) properly deprecated with console warnings
  • Migration path: Clear upgrade path from old modules to new unified architecture
  • Default behavior: Falls back to in-memory storage when no provider configured (lines 58-68 in providers.ts)

🔍 Issues & Recommendations

Critical Issues

1. Memory Leak Risk in Subscriber Management (workflow-orchestrator.ts:673-733)

Location: packages/modules/workflows/src/services/workflow-orchestrator.ts:673-733

The subscribe method has a warning comment about memory leaks, but the implementation could be more defensive:

subscribe({ workflowId, transactionId, subscriber, subscriberId }: SubscribeOptions) {
  // Subscribe to distributed notifications when first subscriber is added
  if (!WorkflowOrchestratorService.subscribers.has(workflowId)) {
    this.storage_.notificationSubscriber?.subscribe(
      workflowId,
      this.handleDistributedNotification.bind(this)
    )
  }

Issue: The unsubscribe path at line 782 does remove the distributed subscription when the last local subscriber for a workflow is removed, but there is no automatic cleanup for workflow-scoped subscribers, so they can accumulate over time.

Recommendation: Consider adding an automatic cleanup mechanism or making the memory leak warning more prominent in the API documentation.
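
One possible shape for such an automatic cleanup mechanism is reference counting per workflow, dropping the distributed subscription once the last local subscriber goes away. A sketch under assumed names (not taken from the module):

// Illustrative sketch: reference-count workflow-scoped subscribers and
// release the distributed channel when the count reaches zero.
type Subscriber = (data: unknown) => void

class SubscriberRegistry {
  private subscribers = new Map<string, Set<Subscriber>>()

  subscribe(workflowId: string, subscriber: Subscriber, onFirst: () => void) {
    let set = this.subscribers.get(workflowId)
    if (!set) {
      set = new Set()
      this.subscribers.set(workflowId, set)
      onFirst() // e.g. subscribe to the distributed notification channel
    }
    set.add(subscriber)
  }

  unsubscribe(workflowId: string, subscriber: Subscriber, onLast: () => void) {
    const set = this.subscribers.get(workflowId)
    if (!set) return
    set.delete(subscriber)
    if (set.size === 0) {
      this.subscribers.delete(workflowId)
      onLast() // e.g. unsubscribe from the distributed notification channel
    }
  }
}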

2. Redis Connection Error Handling (redis-workflows-storage.ts:373-429)

Location: packages/modules/providers/workflows-redis/src/services/redis-workflows-storage.ts:373-429

private async ensureRedisConnection(): Promise<void> {
  // ... connection checking logic
  if (reconnectTasks.length > 0) {
    await promiseAll(reconnectTasks)
  }
}

Issue: This method is called in onApplicationStart but if Redis is down at startup, the worker will fail to start. There's no retry mechanism or graceful degradation.

Recommendation: Consider adding:

  • Retry logic with exponential backoff for initial connection (sketched after this list)
  • Health check endpoint to monitor Redis connection status
  • Circuit breaker pattern for Redis operations
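
For example, the exponential-backoff suggestion above could be a small generic helper wrapped around the initial connection attempt. A sketch only; the usage comment refers to the excerpt above and the names are assumptions, not the module's actual API:

// Sketch: retry an async operation with exponential backoff instead of
// failing hard when Redis is briefly unavailable at startup.
async function withBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation()
    } catch (error) {
      lastError = error
      if (attempt < maxAttempts) {
        // 500ms, 1s, 2s, 4s ... before the next attempt
        const delay = baseDelayMs * 2 ** (attempt - 1)
        await new Promise((resolve) => setTimeout(resolve, delay))
      }
    }
  }
  throw lastError
}

// Hypothetical usage inside ensureRedisConnection:
// await withBackoff(() => promiseAll(reconnectTasks))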

3. Lock Timeout Configuration (redis-workflows-storage.ts:879-894)

Location: packages/modules/providers/workflows-redis/src/services/redis-workflows-storage.ts:879-894

async #acquireLock(key: string, ttlSeconds: number = 5): Promise<string | null>

Issue: The hardcoded 5-second lock TTL might be insufficient for long-running DB operations, especially under heavy load.

Recommendation: Make lock TTL configurable via provider options:

lockTtl?: number // in seconds, default: 5

High Priority Issues

4. Race Condition in Timer Cleanup (local-workflows-storage.ts:180-198)

Location: packages/modules/workflows/src/providers/local-workflows-storage.ts:180-198

private createManagedTimer(callback: () => void | Promise<void>, delay: number): NodeJS.Timeout {
  const timer = setTimeout(async () => {
    this.pendingTimers.delete(timer)
    // ...
  }, delay)
  this.pendingTimers.add(timer)
  return timer
}

Issue: There's a tiny race window where the timer could fire before being added to pendingTimers if delay is 0.

Recommendation: Add the timer to the set before creating the setTimeout:

const placeholder = {} as NodeJS.Timeout
this.pendingTimers.add(placeholder)
const timer = setTimeout(async () => { /* ... */ }, delay)
this.pendingTimers.delete(placeholder)
this.pendingTimers.add(timer)

Or simply document that delay should never be 0.

5. Missing Validation in Provider Loader (providers.ts:71-77)

Location: packages/modules/workflows/src/loaders/providers.ts:71-77

if (options.providers?.length > 1) {
  throw new Error(`Workflows module: Multiple providers configured: ${options.providers.map((p) => p.id).join(", ")}`)
}

Issue: The error message is good, but the code doesn't validate that providers actually implement the required interfaces.

Recommendation: Add interface validation before loading.
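
A lightweight duck-typing check at load time would be enough to fail fast here. The method list below is an assumption for illustration, not the module's actual storage contract:

// Sketch: verify the resolved provider exposes the methods the orchestrator
// relies on before registering it. The required method names are illustrative.
const REQUIRED_STORAGE_METHODS = ["get", "save", "delete", "scheduleRetry"] as const

function assertValidStorageProvider(provider: unknown, providerId: string): void {
  const missing = REQUIRED_STORAGE_METHODS.filter(
    (method) => typeof (provider as Record<string, unknown>)?.[method] !== "function"
  )

  if (missing.length) {
    throw new Error(
      `Workflows module: provider "${providerId}" does not implement required methods: ${missing.join(", ")}`
    )
  }
}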

Medium Priority Issues

6. Inconsistent Error Logging (workflow-orchestrator.ts:812-815)

Location: packages/modules/workflows/src/services/workflow-orchestrator.ts:812-815

Some errors are logged but not thrown, while others are thrown. This inconsistency could make debugging difficult.

Recommendation: Establish consistent error handling patterns:

  • Critical errors: log + throw
  • Recoverable errors: log + return default value
  • Expected errors: handle silently or log at debug level

7. Potential Data Loss on Parallel DB/Redis Operations (redis-workflows-storage.ts:652-661)

Location: packages/modules/providers/workflows-redis/src/services/redis-workflows-storage.ts:652-661

if (hasFinished && !retentionTime) {
  if (!data.flow.metadata?.parentStepIdempotencyKey) {
    await promiseAll([this.deleteFromDb(data), execPipeline()])
  }
}

Issue: If deleteFromDb succeeds but execPipeline (Redis) fails, the transaction state is inconsistent.

Recommendation: Consider using a saga pattern or reversing the order (Redis first, then DB).
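
A sketch of the reversed ordering, where a Redis failure keeps the DB checkpoint intact (function and parameter names are illustrative; only deleteFromDb/execPipeline mirror the excerpt above):

// Sketch: make the Redis write the primary operation and remove the DB
// checkpoint only once Redis reflects the final state.
async function finalizeFinishedExecution(
  execPipeline: () => Promise<void>,
  deleteFromDb: () => Promise<void>,
  logger: { error: (msg: string) => void }
): Promise<void> {
  try {
    await execPipeline()
  } catch (error) {
    // Redis failed: keep the DB checkpoint so the execution can be recovered.
    logger.error(`Failed to flush Redis pipeline, keeping DB checkpoint: ${error}`)
    throw error
  }

  // Redis succeeded; it is now safe to drop the persistent checkpoint.
  await deleteFromDb()
}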

8. Missing Type Safety in Notification Handler (redis-workflows-storage.ts:174-189)

Location: packages/modules/providers/workflows-redis/src/services/redis-workflows-storage.ts:174-189

const data = JSON.parse(message)
handler(workflowId, data)

Issue: No validation that data matches the expected DistributedNotifyOptions type.

Recommendation: Add runtime type validation using a schema validator (zod, joi, etc.).
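
For instance, a zod schema could guard the handler. The field names below are assumptions, not the actual DistributedNotifyOptions shape:

import { z } from "zod"

// Sketch: validate the parsed pub/sub payload before invoking the handler.
const distributedNotifySchema = z.object({
  workflowId: z.string(),
  transactionId: z.string(),
  eventType: z.string(),
  instanceId: z.string().optional(),
})

function parseNotification(message: string) {
  try {
    const result = distributedNotifySchema.safeParse(JSON.parse(message))
    // Malformed payloads are dropped instead of crashing the subscriber.
    return result.success ? result.data : null
  } catch {
    // Invalid JSON: drop the message.
    return null
  }
}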

Low Priority / Suggestions

9. Performance Optimization Opportunity

Location: packages/modules/workflows/src/utils/workflow-storage-utils.ts:40-99

The shouldSaveToDb function iterates through all steps. For workflows with many steps, this could be optimized with early returns or caching.
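
As an illustration, an early-return pass could stop at the first step that forces a checkpoint (the per-step predicate here is a placeholder, not the real shouldSaveToDb logic):

// Sketch: bail out on the first step that requires persistence instead of
// scanning every step. "requiresCheckpoint" stands in for whatever per-step
// condition the actual implementation checks.
type Step = { id: string; isAsync?: boolean; hasRetryPolicy?: boolean }

function shouldSaveToDbSketch(steps: Step[]): boolean {
  for (const step of steps) {
    const requiresCheckpoint = step.isAsync || step.hasRetryPolicy
    if (requiresCheckpoint) {
      return true // early return: no need to inspect remaining steps
    }
  }
  return false
}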

10. Documentation Improvements

  • The distributed notification system is complex but lacks inline documentation explaining the flow
  • Provider configuration examples in the PR description are good, but should also be in code comments
  • The instance deduplication mechanism (using instanceId) could use more explanation

11. Type Safety Improvements

  • Several any types in service constructors could be more specific
  • The @ts-expect-error directives in workflows-module.ts (lines 64, 122, 134) should have explanatory comments

🔒 Security Review

Passed ✅

  • SQL Injection: Uses parameterized queries via ORM
  • Command Injection: No shell command execution
  • Distributed Lock: Proper implementation with owner verification
  • Redis Pub/Sub: Channels are properly scoped with orchestrator: prefix

Notes

  • Redis Security: Consider adding TLS/SSL configuration options for production Redis connections
  • Lock TTL: The 5-second default might need tuning for production workloads

🚀 Performance Considerations

Optimizations ✅

  • Parallel operations: Good use of promiseAll for concurrent DB/Redis operations
  • Selective DB saves: shouldSaveToDb prevents unnecessary DB writes
  • Timer cleanup: Proper use of unref() on scheduled job timers (lines 538 and 591 in local-workflows-storage.ts)

Concerns ⚠️

  • Redis connection pooling: Each operation creates new promises; consider connection pooling optimization for high-throughput scenarios
  • Large workflow state: No size limits on transaction checkpoints stored in Redis/DB

📝 Code Style & Conventions

Compliance ✅

  • Naming conventions: Follows Medusa patterns (camelCase for variables, PascalCase for classes)
  • Decorators: Proper use of @InjectManager, @InjectSharedContext, etc.
  • Error handling: Consistent MedusaError usage
  • No semicolons: Follows Prettier config

Minor Issues

  • Some files have long lines (>120 chars) - consider breaking them up
  • The #isWorkerMode private field could use a comment explaining its purpose

📊 Test Coverage Assessment

Coverage ✅

  • Integration tests: Excellent coverage with 4000+ lines
  • Race conditions: Well tested
  • Edge cases: Timeout scenarios, retries, scheduled jobs all covered

Gaps

  • Unit tests: Limited unit test coverage for utility functions
  • Failure scenarios: More tests for Redis connection failures, DB errors
  • Performance tests: No load testing for distributed scenarios

🎯 Summary

This is a well-architected and well-executed refactoring that significantly improves the workflow engine's maintainability and scalability. The provider pattern is cleanly implemented, backward compatibility is maintained, and the test coverage is strong.

Required Actions (Before Merge)

  1. ✅ Review and address the memory leak warning visibility
  2. ⚠️ Add retry logic for Redis connection failures
  3. ⚠️ Make Redis lock TTL configurable

Recommended Actions (Can be follow-up PRs)

  1. Add runtime type validation for distributed notifications
  2. Implement health checks for Redis connections
  3. Add unit tests for utility functions
  4. Document the distributed notification flow
  5. Add performance benchmarks

Approval Recommendation

✅ Approve with minor changes - The critical issues are manageable and don't block the architectural improvement this PR provides. The code quality is high, and the approach is sound.


Great work on this significant refactoring! The unified provider architecture will make it much easier to maintain and extend the workflow engine going forward. 🎉

@adrien2p adrien2p marked this pull request as ready for review January 21, 2026 13:01
@adrien2p adrien2p requested review from a team as code owners January 21, 2026 13:01
Contributor

@carlos-r-l-rodrigues carlos-r-l-rodrigues left a comment

Amazing. 🔥

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
