@pooknull
Contributor

@pooknull pooknull commented Nov 7, 2025

https://perconadev.atlassian.net/browse/K8SPS-601

CHANGE DESCRIPTION

Problem:
Short explanation of the problem.

Cause:
Short explanation of the root cause of the issue if applicable.

Solution:
Short explanation of the solution we are providing with this PR.

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported PS version?
  • Does the change support oldest and newest supported Kubernetes version?

Copilot AI review requested due to automatic review settings November 7, 2025 13:38
@pull-request-size pull-request-size bot added the size/XS 0-9 lines label Nov 7, 2025
Contributor

Copilot AI left a comment

Pull Request Overview

This PR changes the event type from EventTypeWarning to EventTypeNormal for two specific events in the status reconciliation logic:

  • When async replication is not ready during initialization
  • When the cluster state changes

Key Changes

  • Event type changed from Warning to Normal for "AsyncReplicationNotReady" events
  • Event type changed from Warning to Normal for "ClusterStateChanged" events
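
One way to check the behavior summarized above is with an in-memory test double. This is only a sketch and not the operator's code: `fakeRecorder`, `recordedEvent`, and `recordStatusEvents` are hypothetical names, and the local constants are stand-ins that mirror the string values of `corev1.EventTypeNormal` and `corev1.EventTypeWarning`, so the example compiles with the standard library alone.

```go
package main

import "fmt"

// Local stand-ins (assumption) for corev1.EventTypeNormal and
// corev1.EventTypeWarning from k8s.io/api/core/v1.
const (
	EventTypeNormal  = "Normal"
	EventTypeWarning = "Warning"
)

// recordedEvent is a hypothetical record of one emitted event.
type recordedEvent struct {
	Type, Reason, Message string
}

// fakeRecorder is a hypothetical test double that keeps events in memory.
type fakeRecorder struct {
	events []recordedEvent
}

// Event stores the event; unlike the real recorder it takes no object argument.
func (f *fakeRecorder) Event(eventType, reason, message string) {
	f.events = append(f.events, recordedEvent{eventType, reason, message})
}

// recordStatusEvents sketches the two call sites touched by this PR,
// with both events emitted as Normal.
func recordStatusEvents(rec *fakeRecorder, asyncReady bool, msg, initialState, currentState string) {
	if !asyncReady {
		rec.Event(EventTypeNormal, "AsyncReplicationNotReady", msg)
	}
	if currentState != initialState {
		rec.Event(EventTypeNormal, "ClusterStateChanged",
			fmt.Sprintf("%s -> %s", initialState, currentState))
	}
}

func main() {
	rec := &fakeRecorder{}
	recordStatusEvents(rec, false, "replica not ready", "ready", "initializing")
	for _, e := range rec.events {
		fmt.Printf("%s %s %s\n", e.Type, e.Reason, e.Message)
	}
}
```

In a real unit test, client-go's `record.FakeRecorder` could play the role of `fakeRecorder`; the hand-written double is used here only to keep the sketch dependency-free.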

  log.Info(fmt.Sprintf("Async replication not ready: %s", msg))
- r.Recorder.Event(cr, corev1.EventTypeWarning, "AsyncReplicationNotReady", msg)
+ r.Recorder.Event(cr, corev1.EventTypeNormal, "AsyncReplicationNotReady", msg)

Copilot AI Nov 7, 2025

Changing this event from Warning to Normal may be inappropriate. This event is triggered when async replication is not ready, causing the state to be set to StateInitializing (line 105). While the code continues with normal flow, this represents a condition where the system is not yet operational. In Kubernetes, Warning events typically indicate issues or non-ready states that administrators should be aware of. Consider whether this transient failure condition should remain as EventTypeWarning to maintain visibility into operational issues.

Suggested change
- r.Recorder.Event(cr, corev1.EventTypeNormal, "AsyncReplicationNotReady", msg)
+ r.Recorder.Event(cr, corev1.EventTypeWarning, "AsyncReplicationNotReady", msg)

  if status.State != initialState {
      log.Info("Cluster state changed", "previous", initialState, "current", status.State)
-     r.Recorder.Event(cr, corev1.EventTypeWarning, "ClusterStateChanged", fmt.Sprintf("%s -> %s", initialState, status.State))
+     r.Recorder.Event(cr, corev1.EventTypeNormal, "ClusterStateChanged", fmt.Sprintf("%s -> %s", initialState, status.State))

Copilot AI Nov 7, 2025

Changing this event from Warning to Normal is problematic because it fires for all state transitions, including transitions to error states. For example, if the state changes from Ready to Error (e.g., line 194 sets state to StateError on full cluster crash), this would generate a Normal event for what is actually a critical problem. The event type should either remain as Warning, or the code should conditionally use Warning for transitions to error/initializing states and Normal only for transitions to ready states.

Suggested change
- r.Recorder.Event(cr, corev1.EventTypeNormal, "ClusterStateChanged", fmt.Sprintf("%s -> %s", initialState, status.State))
+ eventType := corev1.EventTypeNormal
+ if status.State == apiv1.StateError || status.State == apiv1.StateInitializing {
+     eventType = corev1.EventTypeWarning
+ }
+ r.Recorder.Event(cr, eventType, "ClusterStateChanged", fmt.Sprintf("%s -> %s", initialState, status.State))
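
The conditional approach suggested above could also be factored into a small helper, which makes the mapping easy to unit-test. The following is a minimal sketch under stated assumptions: `State`, its three values, and `eventTypeForState` are hypothetical stand-ins for the operator's `apiv1` types (the string values are assumed, not taken from the repository), and the event type constants mirror the `corev1` ones.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the operator's cluster state type.
type State string

// Assumed state values for illustration only.
const (
	StateInitializing State = "initializing"
	StateReady        State = "ready"
	StateError        State = "error"
)

// Local stand-ins (assumption) for corev1.EventTypeNormal and
// corev1.EventTypeWarning.
const (
	EventTypeNormal  = "Normal"
	EventTypeWarning = "Warning"
)

// eventTypeForState returns Warning for transitions into error or
// initializing states and Normal for every other target state.
func eventTypeForState(s State) string {
	switch s {
	case StateError, StateInitializing:
		return EventTypeWarning
	default:
		return EventTypeNormal
	}
}

func main() {
	for _, s := range []State{StateInitializing, StateReady, StateError} {
		fmt.Printf("%s: %s\n", s, eventTypeForState(s))
	}
}
```

Whether `StateInitializing` belongs on the Warning side is exactly the point under discussion in this PR, so the `switch` cases should be read as one possible policy rather than a recommendation.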

@JNKPercona
Collaborator

| Test Name | Result | Time |
|---|---|---|
| async-ignore-annotations-8-4 | passed | 00:06:37 |
| async-global-metadata-8-4 | passed | 00:14:26 |
| async-upgrade-8-0 | passed | 00:12:29 |
| async-upgrade-8-4 | passed | 00:12:42 |
| auto-config-8-4 | passed | 00:24:18 |
| config-8-4 | passed | 00:17:06 |
| config-router-8-0 | passed | 00:06:59 |
| config-router-8-4 | passed | 00:07:27 |
| demand-backup-minio-8-0 | passed | 00:20:11 |
| demand-backup-minio-8-4 | passed | 00:20:13 |
| demand-backup-cloud-8-4 | passed | 00:21:09 |
| demand-backup-retry-8-4 | passed | 00:17:33 |
| async-data-at-rest-encryption-8-0 | passed | 00:13:26 |
| async-data-at-rest-encryption-8-4 | passed | 00:13:10 |
| gr-global-metadata-8-4 | passed | 00:15:31 |
| gr-data-at-rest-encryption-8-0 | passed | 00:15:19 |
| gr-data-at-rest-encryption-8-4 | passed | 00:14:49 |
| gr-demand-backup-minio-8-4 | passed | 00:12:54 |
| gr-demand-backup-cloud-8-4 | passed | 00:21:49 |
| gr-demand-backup-haproxy-8-4 | passed | 00:10:59 |
| gr-finalizer-8-4 | passed | 00:05:33 |
| gr-haproxy-8-0 | passed | 00:04:12 |
| gr-haproxy-8-4 | passed | 00:04:13 |
| gr-ignore-annotations-8-4 | passed | 00:05:07 |
| gr-init-deploy-8-0 | passed | 00:09:43 |
| gr-init-deploy-8-4 | passed | 00:09:08 |
| gr-one-pod-8-4 | passed | 00:05:33 |
| gr-recreate-8-4 | passed | 00:17:59 |
| gr-scaling-8-4 | passed | 00:07:42 |
| gr-scheduled-backup-8-4 | passed | 00:15:15 |
| gr-security-context-8-4 | passed | 00:09:57 |
| gr-self-healing-8-4 | passed | 00:23:31 |
| gr-tls-cert-manager-8-4 | passed | 00:09:25 |
| gr-users-8-4 | passed | 00:05:42 |
| gr-upgrade-8-0 | passed | 00:08:12 |
| gr-upgrade-8-4 | passed | 00:09:37 |
| haproxy-8-0 | passed | 00:08:04 |
| haproxy-8-4 | passed | 00:08:33 |
| init-deploy-8-0 | passed | 00:06:49 |
| init-deploy-8-4 | passed | 00:05:31 |
| limits-8-4 | passed | 00:06:35 |
| monitoring-8-4 | passed | 00:16:00 |
| one-pod-8-0 | passed | 00:05:43 |
| one-pod-8-4 | passed | 00:05:40 |
| operator-self-healing-8-4 | passed | 00:11:53 |
| pvc-resize-8-4 | passed | 00:08:06 |
| recreate-8-4 | passed | 00:13:08 |
| scaling-8-4 | passed | 00:10:38 |
| scheduled-backup-8-0 | passed | 00:17:06 |
| scheduled-backup-8-4 | passed | 00:16:16 |
| service-per-pod-8-4 | passed | 00:06:34 |
| sidecars-8-4 | passed | 00:04:36 |
| smart-update-8-4 | passed | 00:09:29 |
| storage-8-4 | passed | 00:03:56 |
| telemetry-8-4 | passed | 00:06:05 |
| tls-cert-manager-8-4 | passed | 00:11:41 |
| users-8-0 | passed | 00:07:36 |
| users-8-4 | passed | 00:07:31 |
| version-service-8-4 | passed | 00:20:50 |

| Summary | Value |
|---|---|
| Tests Run | 59/59 |
| Job Duration | 01:57:58 |
| Total Test Time | 11:08:49 |

commit: 268a9e7
image: perconalab/percona-server-mysql-operator:PR-1155-268a9e7e

Contributor

@egegunes egegunes left a comment

I'm having a hard time understanding the motivation behind these changes.

@pooknull
Contributor Author

> I'm having a hard time understanding the motivation behind these changes.

We use the Warning event type when the operator encounters a problem, for example for a ReconcileError event.

According to the Kubernetes code, Warning should be used when something might go wrong: https://github.com/kubernetes/api/blob/f1737241c064fd7085204754c40d4c0308826b81/core/v1/types.go#L7480-L7485

In my opinion, AsyncReplicationNotReady and ClusterStateChanged are informational events.
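
The distinction described above can be sketched as a reason-based policy. This is an illustration only, not the operator's implementation: `warningReasons` and `eventTypeForReason` are hypothetical names, the constants are local stand-ins for the `corev1` ones, and `ReconcileError` is the only Warning reason listed because it is the only one named in this discussion.

```go
package main

import "fmt"

// Local stand-ins (assumption) for corev1.EventTypeNormal and
// corev1.EventTypeWarning.
const (
	EventTypeNormal  = "Normal"
	EventTypeWarning = "Warning"
)

// warningReasons is a hypothetical set of reasons that signal a problem
// the operator ran into. Only ReconcileError comes from the discussion.
var warningReasons = map[string]bool{
	"ReconcileError": true,
}

// eventTypeForReason returns Warning for known problem reasons and treats
// every other reason as informational.
func eventTypeForReason(reason string) string {
	if warningReasons[reason] {
		return EventTypeWarning
	}
	return EventTypeNormal
}

func main() {
	reasons := []string{"ReconcileError", "AsyncReplicationNotReady", "ClusterStateChanged"}
	for _, r := range reasons {
		fmt.Printf("%s: %s\n", r, eventTypeForReason(r))
	}
}
```

Defaulting unknown reasons to Normal is a choice made for this sketch; a stricter policy could require every reason to be classified explicitly.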
