Race condition in paused-replicas annotation causes ScaledObject to get stuck in inconsistent state #7231

@nusmql

Description

Report

When the autoscaling.keda.sh/paused-replicas annotation is applied to a ScaledObject, a race condition can leave the system permanently inconsistent:

  • KEDA marks the ScaledObject as Paused=True at the requested paused replica count.
  • The underlying Deployment remains at its previous replica count (not scaled).
  • The HPA and scale loop are both absent, so nothing corrects the state.
  • Manual intervention is required to recover.

The failure is intermittent and timing-dependent when the annotation is toggled on and off.

Expected Behavior

The Deployment scales to 0 replicas whenever autoscaling.keda.sh/paused-replicas: "0" is applied, and subsequent toggles reliably pause and resume without getting stuck.
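
For reference, this is the annotation as it would appear on the ScaledObject manifest (the value is the desired paused replica count):

metadata:
  annotations:
    autoscaling.keda.sh/paused-replicas: "0"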

Actual Behavior

Sometimes the Deployment stays at its original replica count after re-applying the pause annotation. KEDA reports Paused=True, but there is no HPA and no running scale loop, so reconciliation does not proceed.
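
One way to confirm the stuck state (<name>, <target>, and <ns> are placeholders; by default KEDA names the HPA keda-hpa-<name>):

# Paused condition on the ScaledObject (True in the stuck state)
kubectl get scaledobject <name> -n <ns> \
  -o jsonpath='{.status.conditions[?(@.type=="Paused")].status}'

# The KEDA-managed HPA is gone even though the target was never scaled down
kubectl get hpa keda-hpa-<name> -n <ns>

# The Deployment still reports its pre-pause replica count
kubectl get deployment <target> -n <ns> -o jsonpath='{.spec.replicas}'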

Steps to Reproduce the Problem

  1. Create a ScaledObject with:

     cooldownPeriod: 300
     pollingInterval: 15

  2. Ensure the target Deployment is running with replicas > 0.
  3. Apply the pause annotation:
     kubectl annotate scaledobject <name> autoscaling.keda.sh/paused-replicas="0" --overwrite
     The first pause works; the Deployment scales down as expected.
  4. Remove the annotation and wait until the Deployment scales back up and is running normally.
  5. Re-apply the same annotation:
     kubectl annotate scaledobject <name> autoscaling.keda.sh/paused-replicas="0" --overwrite
  6. Observe: from the second pause onward, scaling intermittently fails; the Deployment remains at its prior replica count and does not recover automatically. (A small script that automates this toggle loop is sketched below.)
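
A rough automation of steps 3-5, assuming shell access; <name> and <ns> are placeholders, and the sleep durations are arbitrary and should be tuned to the pollingInterval/cooldownPeriod above:

for i in $(seq 1 10); do
  # pause: pin the ScaledObject to 0 replicas
  kubectl annotate scaledobject <name> -n <ns> \
    autoscaling.keda.sh/paused-replicas="0" --overwrite
  sleep 60    # wait for the pause to take effect

  # resume: the trailing "-" removes the annotation
  kubectl annotate scaledobject <name> -n <ns> \
    autoscaling.keda.sh/paused-replicas-
  sleep 120   # wait for the Deployment to scale back up
done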

Logs from KEDA operator

Timeline - Failed Case:

15:30:12.609 - Reconcile #1:
├─ Gets ScaledObject (Paused condition = False)
├─ Enters pause block (scaledToPausedCount = true)
├─ Stops scale loop
├─ Deletes HPA → Triggers Reconcile #2
├─ Sets Paused=True (in memory)
└─ Returns, status write begins (slow)
15:30:12.644 - Reconcile #2 (35ms later):
├─ Gets ScaledObject (Paused condition STILL False!)
├─ Status write from #1 not persisted yet
├─ Enters pause block again (scaledToPausedCount = true)
├─ Tries to stop already-stopped loop
├─ Log: "ScalableObject was not found in controller cache"
├─ Returns early
└─ NO HPA created, NO scale loop started
[Stuck permanently - no more reconciles]

Timeline - Success Case:

15:19:58.114 - Reconcile #1:
├─ Same as above
└─ Status write begins
15:19:58.166 - Reconcile #2 (52ms later):
├─ Gets ScaledObject (Paused condition = True) ✅
├─ Status write completed!
├─ checkIfTargetResourceReachPausedCount() → false
├─ Falls through to normal reconcile
├─ Creates NEW HPA ✅
├─ Starts NEW scale loop ✅
└─ Scale loop scales to 0 ✅

For details, please refer to the attached logs:

SuccessCasse-Logs-2025-11-02 17_32_48.txt
FailedCase-Logs-2025-11-02 17_30_57.txt

KEDA Version

2.18.0

Kubernetes Version

1.31

Platform

Other

Scaler Details

prometheus

Anything else?

The issue is in controllers/keda/scaledobject_controller.go, lines 243-246:

case needsToPause:
    scaledToPausedCount := true  // ← Dangerous default
    if conditions.GetPausedCondition().Status == metav1.ConditionTrue {
        // Only checks deployment state if condition already True
        scaledToPausedCount = r.checkIfTargetResourceReachPausedCount(...)
        if scaledToPausedCount {
            return // Already done
        }
    }
    if scaledToPausedCount {
        // Enters this block in BOTH reconciles during race
        stopScaleLoop()
        deleteHPA()
        conditions.SetPausedCondition(metav1.ConditionTrue, ...)
        return
    }

The problem: when the second reconcile still sees Paused=False (the status write from the first reconcile has not been persisted yet), scaledToPausedCount keeps its default of true, so the stop block runs again: it tries to stop the already-stopped loop, returns early, and never recreates the HPA or the scale loop.
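
One possible direction for a fix, sketched against the simplified snippet above (helper names are taken from that snippet; this is not a vetted patch): drop the dangerous default and decide from the target's live replica count, which is what the success case effectively does anyway:

case needsToPause:
    // Decide from the target's actual replica count instead of
    // defaulting to true when the Paused condition is not yet visible.
    scaledToPausedCount := r.checkIfTargetResourceReachPausedCount(...)
    if !scaledToPausedCount {
        // Target has not reached the paused count yet: exit the switch
        // and fall through to the normal reconcile path, so an HPA and
        // a scale loop exist to drive the scale-down.
        break
    }
    // Target is at the paused count: now it is safe to tear down.
    stopScaleLoop()
    deleteHPA()
    conditions.SetPausedCondition(metav1.ConditionTrue, ...)
    return

An alternative would be to requeue the request until the Paused condition write from the previous reconcile becomes visible, but re-checking the live target state avoids depending on status persistence ordering altogether.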
