🔧 Infrastructure: Implement Distributed Locking System for Madara Orchestrator #736

@0xvasanth

Description

Implement a Redis-like distributed locking system as infrastructure for the Madara Orchestrator that provides atomic operations, TTL-based cleanup, and high-performance coordination primitives for job processing and worker coordination.

This provides distributed coordination infrastructure for orchestrator workers, with lock operations and automatic cleanup.


🎯 Problem Statement

Currently, the Madara Orchestrator lacks a centralized distributed locking mechanism, leading to:

  • Ad-hoc concurrency control implementations in job processing
  • Potential race conditions between multiple orchestrator workers, which have so far only been worked around with a hacky solution
  • Risk of orphaned jobs when workers crash (current LockedForProcessing vulnerability)
  • No standardized way to coordinate between orchestrator instances
  • Manual intervention required for recovery scenarios

💡 Proposed Solution

Build a cache-based distributed locking system specifically for orchestrator coordination with the following characteristics:

Core Features

  • Redis-like Operations: SETNX, EXPIRE, GET, DEL for atomic operations
  • TTL-based Cleanup: Automatic lock expiration to prevent orphaned locks
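The intended semantics of these operations can be illustrated with a minimal in-memory sketch. This is not the proposed MongoDB backend: `InMemoryCache` and its method signatures are hypothetical, and the current time is passed in explicitly (`now`) only so that expiry can be exercised deterministically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for the cache backend (illustration only).
#[derive(Default)]
pub struct InMemoryCache {
    // key -> (value, optional expiry deadline)
    entries: HashMap<String, (String, Option<Instant>)>,
}

impl InMemoryCache {
    /// Drop `key` if its TTL has elapsed at time `now`.
    fn evict_if_expired(&mut self, key: &str, now: Instant) {
        let expired =
            matches!(self.entries.get(key), Some((_, Some(deadline))) if *deadline <= now);
        if expired {
            self.entries.remove(key);
        }
    }

    /// SETNX: store `value` only if `key` is absent or expired.
    /// Returns true if the value was stored.
    pub fn setnx(&mut self, key: &str, value: &str, ttl: Option<Duration>, now: Instant) -> bool {
        self.evict_if_expired(key, now);
        if self.entries.contains_key(key) {
            return false;
        }
        let deadline = ttl.map(|t| now + t);
        self.entries.insert(key.to_string(), (value.to_string(), deadline));
        true
    }

    /// EXPIRE: set or reset the TTL of an existing key.
    /// Returns false if the key is missing or already expired.
    pub fn expire(&mut self, key: &str, ttl: Duration, now: Instant) -> bool {
        self.evict_if_expired(key, now);
        match self.entries.get_mut(key) {
            Some(entry) => {
                entry.1 = Some(now + ttl);
                true
            }
            None => false,
        }
    }

    /// GET: return the value if the key exists and has not expired.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        self.evict_if_expired(key, now);
        self.entries.get(key).map(|(value, _)| value.clone())
    }

    /// DEL: remove the key. Returns true if something was removed.
    pub fn del(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }
}
```

In this sketch a second `setnx` on the same key fails until the first entry is deleted or its TTL elapses, which is the property that prevents orphaned locks after a worker crash. A real backend would have to provide the same check-and-insert atomically on the server side.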

Key Components

  1. CacheService Trait: Generic interface for cache operations
  2. Backend Implementation: MongoDB-based with connection pooling considerations
  3. Distributed Locking: High-level functions for job lock acquisition/release
  4. Orchestrator Integration: Seamless integration with existing orchestrator config
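One possible shape for the first three components is sketched below. Everything here is an assumption made for illustration: the trait methods, the `MemoryBackend` test double, the `lock:job:` key prefix, and the `acquire_job_lock` / `release_job_lock` helpers are hypothetical names, and the actual trait would likely be async and backed by MongoDB.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical generic cache interface; names mirror the Redis-like operations.
pub trait CacheService {
    /// Store `value` under `key` only if the key is absent or expired.
    fn setnx(&self, key: &str, value: &str, ttl: Duration) -> bool;
    fn get(&self, key: &str) -> Option<String>;
    fn del(&self, key: &str) -> bool;
}

/// In-memory test double standing in for the MongoDB-based backend.
#[derive(Default)]
pub struct MemoryBackend {
    entries: Mutex<HashMap<String, (String, Instant)>>,
}

impl CacheService for MemoryBackend {
    fn setnx(&self, key: &str, value: &str, ttl: Duration) -> bool {
        let mut map = self.entries.lock().unwrap();
        let now = Instant::now();
        // An entry whose deadline has passed is treated as absent.
        if let Some((_, deadline)) = map.get(key) {
            if *deadline > now {
                return false;
            }
        }
        map.insert(key.to_string(), (value.to_string(), now + ttl));
        true
    }

    fn get(&self, key: &str) -> Option<String> {
        let map = self.entries.lock().unwrap();
        let now = Instant::now();
        let value = map
            .get(key)
            .filter(|(_, deadline)| *deadline > now)
            .map(|(value, _)| value.clone());
        value
    }

    fn del(&self, key: &str) -> bool {
        self.entries.lock().unwrap().remove(key).is_some()
    }
}

/// Hypothetical key layout for job locks.
fn lock_key(job_id: &str) -> String {
    format!("lock:job:{job_id}")
}

/// Try to take the lock for `job_id` on behalf of `owner`.
pub fn acquire_job_lock<C: CacheService>(
    cache: &C,
    job_id: &str,
    owner: &str,
    ttl: Duration,
) -> bool {
    cache.setnx(&lock_key(job_id), owner, ttl)
}

/// Release the lock only if `owner` still holds it.
/// NOTE: get-then-del is not atomic in this sketch; a real backend should
/// perform a single conditional delete that matches on key and owner.
pub fn release_job_lock<C: CacheService>(cache: &C, job_id: &str, owner: &str) -> bool {
    let key = lock_key(job_id);
    match cache.get(&key) {
        Some(current) if current == owner => cache.del(&key),
        _ => false,
    }
}
```

Storing the owner as the lock value lets a worker release only the lock it acquired. If a worker crashes, its lock simply expires after `ttl` and another worker can acquire it without manual intervention.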

🏗️ Technical Requirements

Challenges & Mitigation

MongoDB Connection Limitations

  • Problem: MongoDB default max connections (~1000) can be exceeded in high-scale environments
  • Mitigation: We might need to consider another solution, such as Redis, when we scale as a service
