[refactor] Rework the config apply rollout to work directly with ClusterMachineConfig #1929

@utkuozdemir

Description

The goal of this refactor is to simplify Omni by unifying the config rollout logic. Currently, the MachineSetStatusController adds unnecessary complexity by trying to control the rollout timing itself, updating ClusterMachineConfigPatches one by one. This approach is brittle and contributes to issues where the system gets stuck in invalid states.

Instead, we should create all config patches immediately and merge them into the ClusterMachineConfig right away. The actual rollout coordination logic should move down to the ClusterMachineConfig controller (or a new controller). This controller will be responsible for deciding when to apply the config based on the machine's state: applying immediately to broken nodes or machines not in a cluster, and coordinating safe updates for healthy nodes.
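To make the proposed decision concrete, here is a minimal sketch of what the per-machine check could look like. Everything in it is an assumption for illustration: `MachineState`, `RolloutDecision`, `Decide`, and the `maxParallel` budget are hypothetical names, not existing Omni types or APIs, and the "at most N healthy machines at a time" rule is just one possible reading of "coordinating safe updates".

```go
package main

// MachineState is a hypothetical summary of what the controller knows
// about a machine; it is not an existing Omni resource.
type MachineState struct {
	InCluster bool // the machine is currently a member of a cluster
	Healthy   bool // the machine is reporting as healthy
}

// RolloutDecision is a hypothetical result of the rollout check.
type RolloutDecision int

const (
	// ApplyImmediately: nothing to protect, so apply the config right away.
	ApplyImmediately RolloutDecision = iota
	// ApplyCoordinated: apply now and count the machine against the budget.
	ApplyCoordinated
	// Wait: the budget is exhausted, retry on a later reconcile.
	Wait
)

// Decide returns what to do with a pending config for one machine.
// inFlight is the number of healthy machines currently being updated and
// maxParallel is an assumed upper bound on that number.
func Decide(m MachineState, inFlight, maxParallel int) RolloutDecision {
	// Broken nodes and machines outside a cluster get the config immediately.
	if !m.InCluster || !m.Healthy {
		return ApplyImmediately
	}

	// Healthy cluster members are updated only while the budget allows it.
	if inFlight < maxParallel {
		return ApplyCoordinated
	}

	return Wait
}
```

With a budget of 1, a healthy cluster member is updated only when no other healthy machine is mid-update, while a broken node is never blocked by the budget. Keeping the check a pure function of the machine's state would also make it easy to unit test separately from the controller.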

Further context if needed: https://docs.google.com/document/d/1zFXi8Vut8-qmBoWVl9ZM98nVDZE4F3dj9uamgEtKE6E/edit?tab=t.zr0wf2vhcg5

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request)

Status: To Do

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
