[Bug]: Strimzi Operator reconcile loop is indefinitely stuck

### Bug Description

Hi Strimzi experts,
I'm running a Strimzi-managed Kafka Connect cluster in a single Kubernetes namespace. 
Alongside it, I have hundreds of KafkaConnectors in a different namespace, each listening to its own **MongoDB** instance (one MongoDB instance per connector).

The Strimzi operator is deployed via Helm chart version 0.45.0 from oci://quay.io/strimzi-helm/strimzi-kafka-operator.

I'm facing an issue where a single stuck KafkaConnector CR that cannot reach MongoDB blocks the entire reconciliation loop of the Strimzi Kafka Connect operator.

This prevents the operator from processing other connectors and even the cluster's own resources, such as the Kafka Connect pods or brokers. I can't restart resources via annotations, and other connectors are neither registered in nor deleted from Kafka Connect. Attempts to restart them or to use "rollingUpdate: true" don't work.

Here's a log excerpt from the operator for that single KafkaConnector:

```
2025-09-30 10:27:12 WARN  KafkaConnectAssemblyOperator:584 - Reconciliation #28129(connector-watch) KafkaConnect (data-sources/data-sources-connect-cluster): 
Error reconciling connector connector-xxxxx io.strimzi.operator.cluster.operator.assembly.ConnectRestException: PUT /connectors/connector-xxxxx/config returned 500 (Internal Server Error): Request timed out. 
The worker is currently performing multi-property validation for the connector, which began at 2025-09-30T10:25:42.723Z.
```

The only workaround that unblocked the reconcile loop was deleting the connector with issues, which then freed up the entire cluster. 

To reproduce, I simply stop the MongoDB server, which causes the validation timeout error shown above.

This feels like a **critical blocker in production**, as **one single faulty connector can take down the whole setup** by preventing hundreds of other connectors from reconciling.

With the help of @scholzj on Slack, we were able to pinpoint the exact problem.
The reconciliation is running in an infinite loop, most likely because the error message returned by Kafka Connect is different every time: it includes a timestamp.

The message is built here: https://github.com/apache/kafka/blob/9d319283c12f79e353bd45c42c637c9517ac546e/connect/runtime/src/main/java/org/apache/kafka/connect/util/Stage.java#L64-L76

The DEBUG logs of the operator confirm this (look at the timestamp at the end of each error message):
```
2025-10-09 14:01:47 DEBUG StatusDiff:41 - Ignoring Status diff {"op":"replace","path":"/conditions/0/lastTransitionTime","value":"2025-10-09T14:01:47.930202242Z"}
2025-10-09 14:01:47 DEBUG StatusDiff:46 - Status differs: {"op":"replace","path":"/conditions/0/message","value":"PUT /connectors/connector-xxxxx-mongodb-event/config returned 500 (Internal Server Error): Request timed out. 
The worker is currently performing multi-property validation for the connector, which began at 2025-10-09T14:00:18Z."}
2025-10-09 14:01:47 DEBUG StatusDiff:47 - Current Status path /conditions/0/message has value "PUT /connectors/connector-xxxxx-mongodb-event/config returned 500 (Internal Server Error): Request timed out. 
The worker is currently performing multi-property validation for the connector, which began at 2025-10-09T13:58:47.839Z."
2025-10-09 14:01:47 DEBUG StatusDiff:48 - Desired Status path /conditions/0/message has value "PUT /connectors/connector-xxxxx-mongodb-event/config returned 500 (Internal Server Error): Request timed out. 
The worker is currently performing multi-property validation for the connector, which began at 2025-10-09T14:00:18Z."
2025-10-09 14:01:47 DEBUG CustomResource:195 - Calling CustomResource#setKind doesn't do anything because the Kind is computed and shouldn't be changed
2025-10-09 14:01:47 INFO  CrdOperator:123 - Reconciliation #126(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): Status of KafkaConnector connector-xxxxx-mongodb-event in namespace data-sources has been updated
2025-10-09 14:01:47 DEBUG AbstractConnectOperator:1141 - Reconciliation #126(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): Completed status update
2025-10-09 14:01:47 INFO  KafkaConnectAssemblyOperator:563 - Reconciliation #126(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): reconciled
2025-10-09 14:01:47 DEBUG AbstractOperator:467 - Reconciliation #126(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): Lock lock::data-sources::KafkaConnect::data-sources-connect-cluster released
2025-10-09 14:01:48 DEBUG CustomResource:184 - Calling CustomResource#setApiVersion doesn't do anything because the API version is computed and shouldn't be changed
2025-10-09 14:01:48 DEBUG CustomResource:195 - Calling CustomResource#setKind doesn't do anything because the Kind is computed and shouldn't be changed
2025-10-09 14:01:48 DEBUG CustomResource:184 - Calling CustomResource#setApiVersion doesn't do anything because the API version is computed and shouldn't be changed
2025-10-09 14:01:48 DEBUG CustomResource:195 - Calling CustomResource#setKind doesn't do anything because the Kind is computed and shouldn't be changed
2025-10-09 14:01:48 INFO  KafkaConnectAssemblyOperator:555 - Reconciliation #130(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): KafkaConnector connector-xxxxx-mongodb-event in namespace data-sources was MODIFIED
2025-10-09 14:01:48 DEBUG AbstractOperator:395 - Reconciliation #130(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): Try to acquire lock lock::data-sources::KafkaConnect::data-sources-connect-cluster
2025-10-09 14:01:48 DEBUG AbstractOperator:398 - Reconciliation #130(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): Lock lock::data-sources::KafkaConnect::data-sources-connect-cluster acquired
2025-10-09 14:01:48 INFO  KafkaConnectAssemblyOperator:487 - Reconciliation #130(connector-watch) KafkaConnect(data-sources/data-sources-connect-cluster): creating/updating connector: connector-xxxxx-mongodb-event
```
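
To make the difference concrete, here is a minimal Python sketch (not Strimzi or Kafka code) that compares the "Current" and "Desired" messages from the log above, first verbatim and then with the timestamp masked. The `normalize` helper and the regular expression are assumptions made for illustration only; they do not exist in the operator.

```python
import re

# Matches ISO-8601 UTC timestamps such as 2025-10-09T14:00:18Z
# or 2025-10-09T13:58:47.839Z (fractional seconds are optional).
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z")


def normalize(message: str) -> str:
    """Hypothetical helper: mask timestamps so repeated errors compare equal."""
    return TIMESTAMP.sub("<timestamp>", message)


# Common part of both messages, copied from the DEBUG log.
PREFIX = (
    "PUT /connectors/connector-xxxxx-mongodb-event/config returned 500 "
    "(Internal Server Error): Request timed out. The worker is currently "
    "performing multi-property validation for the connector, which began at "
)
current = PREFIX + "2025-10-09T13:58:47.839Z."
desired = PREFIX + "2025-10-09T14:00:18Z."

raw_equal = current == desired                               # plain string comparison
normalized_equal = normalize(current) == normalize(desired)  # timestamp ignored
print(raw_equal, normalized_equal)
```

This prints `False True`: the two messages differ only in the timestamp, so a plain string comparison always reports a status change.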

@scholzj explained it like this:

1. One of the reconciliations runs into this error due to a Connect / connector issue (me killing MongoDB on purpose) and updates the status of the connector CR with the error message.
2. Updating the error message in the connector status modifies the resource, which immediately triggers another reconciliation.
3. The reconciliation waits 90 seconds and runs into the same error again. This is where it should normally stop, because the error is already in the status. But because of the timestamp in the error, it is treated as a new error and the connector status is updated again.
4. This update immediately triggers a new reconciliation, and the cycle repeats.
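
The four steps above can be sketched as a small simulation. This is a simplified Python model under stated assumptions, not the operator's actual Java logic: `failing_put` stands in for the Connect REST call, `count_reconciliations` is a hypothetical loop in which every status change triggers another run, and `limit` only exists so the simulation terminates.

```python
import re

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z")


def failing_put(attempt: int) -> str:
    """Stand-in for the Connect REST call: same error, new timestamp each time."""
    return f"Request timed out. Validation began at 2025-10-09T14:{attempt:02d}:18Z."


def count_reconciliations(ignore_timestamp: bool, limit: int = 10) -> int:
    """Count reconciliations until the status stops changing or the limit is hit."""
    status = None
    for attempt in range(1, limit + 1):
        error = failing_put(attempt)          # steps 1 and 3: the error occurs
        old, new = status, error
        if ignore_timestamp and old is not None:
            old, new = TIMESTAMP.sub("", old), TIMESTAMP.sub("", new)
        if old == new:
            return attempt                    # status unchanged: no new event
        status = error                        # steps 2 and 4: update triggers a new run
    return limit                              # never converged within the limit


with_timestamp = count_reconciliations(ignore_timestamp=False)
without_timestamp = count_reconciliations(ignore_timestamp=True)
print(with_timestamp, without_timestamp)
```

This prints `10 2`: with the timestamp included in the comparison the loop only stops at the artificial limit, while ignoring it lets the second reconciliation recognise the error as already recorded.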




### Steps to reproduce

1. Have a KafkaConnector CR that should listen to a MongoDB instance
2. Stop the MongoDB instance
3. Watch the operator get stuck in an infinite loop of reconciliation

### Expected behavior

The operator should be resilient and skip the failed connector.
It should continue to process other resources.
I understand the fix could be on the Kafka Connect side (by removing the timestamp from the message), but I don't know the possible side effects of that change.

### Strimzi version

0.45.0

### Kubernetes version

v1.31.10

### Installation method

Helm chart

### Infrastructure

On-premises

### Configuration files and logs

_No response_

### Additional context

_No response_

[Bug]: Strimzi Operator reconcile loop is indefinitely stuck #12004
