Open
Labels
Good to Have, orchestrator
Description
Implement a Redis-like distributed locking system as infrastructure for the Madara Orchestrator that provides atomic operations, TTL-based cleanup, and high-performance coordination primitives for job processing and worker coordination.
This provides distributed coordination infrastructure for orchestrator workers, with lock operations and automatic cleanup.
🎯 Problem Statement
Currently, the Madara Orchestrator lacks a centralized distributed locking mechanism, leading to:
- Ad-hoc concurrency control implementations in job processing
- Potential race conditions between multiple orchestrator workers, currently handled only by a hacky workaround
- Risk of orphaned jobs when workers crash (the current `LockedForProcessing` vulnerability)
- No standardized way to coordinate between orchestrator instances
- Manual intervention required for recovery scenarios
💡 Proposed Solution
Build a cache-based distributed locking system specifically for orchestrator coordination with the following characteristics:
Core Features
- Redis-like Operations: SETNX, EXPIRE, GET, DEL for atomic operations
- TTL-based Cleanup: Automatic lock expiration to prevent orphaned locks
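The semantics of these operations can be sketched with a small in-memory model. This is only an illustration of the intended behavior, not the proposed backend: the `InMemoryCache` type and its method signatures are hypothetical, and the current time is passed in explicitly so that expiry is easy to reason about.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical in-memory stand-in for the cache (illustration only).
/// Maps key -> (value, optional expiry deadline).
#[derive(Default)]
pub struct InMemoryCache {
    entries: HashMap<String, (String, Option<Instant>)>,
}

impl InMemoryCache {
    /// Lazily drop the key if its deadline has passed (TTL-based cleanup).
    fn purge_if_expired(&mut self, key: &str, now: Instant) {
        let expired = matches!(
            self.entries.get(key),
            Some((_, Some(deadline))) if *deadline <= now
        );
        if expired {
            self.entries.remove(key);
        }
    }

    /// SETNX: store the value only if the key is absent. Returns true on success.
    pub fn setnx(&mut self, key: &str, value: &str, now: Instant) -> bool {
        self.purge_if_expired(key, now);
        if self.entries.contains_key(key) {
            return false;
        }
        self.entries.insert(key.to_string(), (value.to_string(), None));
        true
    }

    /// EXPIRE: attach a TTL to an existing key. Returns false if the key is missing.
    pub fn expire(&mut self, key: &str, ttl: Duration, now: Instant) -> bool {
        self.purge_if_expired(key, now);
        match self.entries.get_mut(key) {
            Some(entry) => {
                entry.1 = Some(now + ttl);
                true
            }
            None => false,
        }
    }

    /// GET: return the value if the key exists and has not expired.
    pub fn get(&mut self, key: &str, now: Instant) -> Option<String> {
        self.purge_if_expired(key, now);
        self.entries.get(key).map(|(value, _)| value.clone())
    }

    /// DEL: remove the key. Returns true if something was removed.
    pub fn del(&mut self, key: &str) -> bool {
        self.entries.remove(key).is_some()
    }
}
```

In a real backend, SETNX and EXPIRE would likely need to be combined into one atomic step, since a worker that crashes between the two calls would leave a lock with no expiry.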
Key Components
- CacheService Trait: Generic interface for cache operations
- Backend Implementation: MongoDB-based with connection pooling considerations
- Distributed Locking: High-level functions for job lock acquisition/release
- Orchestrator Integration: Seamless integration with existing orchestrator config
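One possible shape for these components is sketched below. Everything here is an assumption for illustration: the `CacheService` method names, the `MemoryBackend` test double, the `lock:job:` key prefix, and the synchronous signatures (the real orchestrator would presumably use async methods and a MongoDB-backed implementation of the same trait).

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical generic cache interface; method names are illustrative only.
pub trait CacheService {
    /// Atomically set `key` with a TTL, only if it is absent or expired.
    fn set_if_absent(&self, key: &str, value: &str, ttl: Duration) -> bool;
    /// Read the current live (non-expired) value, if any.
    fn get(&self, key: &str) -> Option<String>;
    /// Atomically delete `key` only if it still holds `expected`.
    fn delete_if_equals(&self, key: &str, expected: &str) -> bool;
}

/// In-memory backend used only to exercise the trait in this sketch.
#[derive(Default)]
pub struct MemoryBackend {
    entries: Mutex<HashMap<String, (String, Instant)>>,
}

impl CacheService for MemoryBackend {
    fn set_if_absent(&self, key: &str, value: &str, ttl: Duration) -> bool {
        let mut map = self.entries.lock().unwrap();
        let now = Instant::now();
        if let Some((_, deadline)) = map.get(key) {
            if *deadline > now {
                return false; // a live lock already exists
            }
        }
        map.insert(key.to_string(), (value.to_string(), now + ttl));
        true
    }

    fn get(&self, key: &str) -> Option<String> {
        let map = self.entries.lock().unwrap();
        let now = Instant::now();
        let value = match map.get(key) {
            Some((v, deadline)) if *deadline > now => Some(v.clone()),
            _ => None,
        };
        value
    }

    fn delete_if_equals(&self, key: &str, expected: &str) -> bool {
        let mut map = self.entries.lock().unwrap();
        let now = Instant::now();
        let live_match = matches!(
            map.get(key),
            Some((v, deadline)) if v.as_str() == expected && *deadline > now
        );
        if live_match {
            map.remove(key);
        }
        live_match
    }
}

/// Build the cache key for a job lock (the prefix is an assumption).
fn lock_key(job_id: &str) -> String {
    format!("lock:job:{}", job_id)
}

/// Try to take the lock for `job_id` on behalf of `owner`.
pub fn acquire_job_lock<C: CacheService>(cache: &C, job_id: &str, owner: &str, ttl: Duration) -> bool {
    cache.set_if_absent(&lock_key(job_id), owner, ttl)
}

/// Release the lock only if `owner` still holds it.
pub fn release_job_lock<C: CacheService>(cache: &C, job_id: &str, owner: &str) -> bool {
    cache.delete_if_equals(&lock_key(job_id), owner)
}
```

Checking the owner on release is a deliberate choice in this sketch: if a worker's lock expires and another worker takes it over, the first worker cannot delete the second worker's lock by mistake.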
🏗️ Technical Requirements
Challenges & Mitigation
MongoDB Connection Limitations
- Problem: MongoDB default max connections (~1000) can be exceeded in high-scale environments
- Mitigation: We might need to consider another solution, such as Redis, when we scale as a service