
tidb-operator v1.6.1 is not fully compatible with k8s v1.33.1, every 30 sec tidb-operator is trying to create second discovery service #6234

@olegsmetanin

Bug Report

What version of Kubernetes are you using?
The bug reproduces on Kubernetes v1.33.1 with the Flannel or Cilium CNI and does NOT reproduce with the iptables-based Calico CNI.
The bug does NOT reproduce on Kubernetes v1.32.5.

What version of TiDB Operator are you using?
v1.6.1

What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
storageClass: local-path, https://github.com/rancher/local-path-provisioner

What's the status of the TiDB cluster pods?
all green

What did you do?
I have a single-node Kubernetes cluster installed with kubeadm. The TiDB cluster (reproduced with both v8.5.1 and v7.5.6) was created using the following definition:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: tidb-test
---
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  namespace: tidb-test
spec:
  timezone: UTC
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  discovery: {}
  helper:
    image: flwang/alpine-nslookup:latest
  pd:
    baseImage: pingcap/pd
    maxFailoverCount: 0
    replicas: 3
    storageClassName: local-path
    requests:
      storage: "10Gi"
    config: {}
    # startUpScriptVersion: v1
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 0
    evictLeaderTimeout: 1m
    replicas: 3
    storageClassName: local-path
    requests:
      storage: "50Gi"
    config: {}

  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 0
    replicas: 3
    service:
      type: ClusterIP
    config: {}
    storageClassName: local-path
EOF
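
For reference, a quick way to verify the resulting objects (illustrative commands; resource names are taken from the definition above):

kubectl -n tidb-test get pods,svc
kubectl -n tidb-test get endpoints basic-discovery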

All pods are green, services and endpoints are created, and TiDB is functional, but every 30 seconds a new warning appears in the k8s events:

Namespace: default
Message: IPAddress: 10.<X>.<Y>.<Z> for Service tidb-test/basic-discovery has a wrong reference; cleaning up
Reason: IPAddressWrongReference

K8s events:

kubectl get events --sort-by=.metadata.creationTimestamp --no-headers | tail -r
27s     Warning   IPAddressWrongReference   ipaddress/10.111.6.74      IPAddress: 10.111.6.74 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
57s     Warning   IPAddressWrongReference   ipaddress/10.101.123.74    IPAddress: 10.101.123.74 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
87s     Warning   IPAddressWrongReference   ipaddress/10.101.111.202   IPAddress: 10.101.111.202 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
117s    Warning   IPAddressWrongReference   ipaddress/10.97.166.25     IPAddress: 10.97.166.25 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
2m27s   Warning   IPAddressWrongReference   ipaddress/10.103.163.71    IPAddress: 10.103.163.71 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
...
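
To watch just these warnings as they are emitted, filtering events by reason should work (a sketch; events for the cluster-scoped IPAddress objects land in the default namespace, as seen above):

kubectl get events --field-selector reason=IPAddressWrongReference --watch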

Based on the kube-apiserver audit logs, it seems that tidb-operator tries to create the discovery service again on every reconciliation cycle:

{"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"<auditID>","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/tidb-test/services","verb":"create","user":{"username":"system:serviceaccount:tidb-admin:tidb-controller-manager","uid":"<uid>","groups":["system:serviceaccounts","system:serviceaccounts:tidb-admin","system:authenticated"],"extra":{"authentication.kubernetes.io/credential-id":["JTI=<JTI>"],"authentication.kubernetes.io/node-name":["k8s-main-1"],"authentication.kubernetes.io/node-uid":["<node-uid>"],"authentication.kubernetes.io/pod-name":["tidb-controller-manager-NN"],"authentication.kubernetes.io/pod-uid":["<pod-uid>"]}},"sourceIPs":["<sourceIPs>"],"userAgent":"tidb-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format","objectRef":{"resource":"services","namespace":"tidb-test","name":"basic-discovery","apiVersion":"v1"},"responseStatus":{"metadata":{},"status":"Failure","message":"services \"basic-discovery\" already exists","reason":"AlreadyExists","details":{"name":"basic-discovery","kind":"services"},"code":409},"requestReceivedTimestamp":"<requestReceivedTimestamp>","stageTimestamp":"<stageTimestamp>","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"tidb-operator:tidb-controller-manager\" of ClusterRole \"tidb-operator:tidb-controller-manager\" to ServiceAccount \"tidb-controller-manager/tidb-admin\""}}

The behavior of tidb-operator is not clear from its logs, even with --v=5. I suggest improving the logging to record all mutations, such as service creation.
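
For completeness, this is roughly how I inspected the operator logs (the deployment and namespace names match the audit log above, but may differ per installation):

kubectl -n tidb-admin logs deploy/tidb-controller-manager --tail=500 | grep -i discovery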

I also see a warning in the tidb-operator logs every 10 minutes: v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice.
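
To confirm that EndpointSlices for the discovery service do exist (kubernetes.io/service-name is the standard label EndpointSlices carry for their owning Service):

kubectl -n tidb-test get endpointslices -l kubernetes.io/service-name=basic-discovery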

What did you expect to see?
No warnings in the k8s events, and no attempts from tidb-operator to create the already properly working discovery service a second time.

What did you see instead?
tidb-operator tries to create the discovery service a second time every 30 seconds.
