Description
Bug Report
What version of Kubernetes are you using?
The bug is reproduced in Kubernetes v1.33.1 with the Flannel or Cilium CNI, and is NOT reproduced with the iptables-based Calico CNI.
The bug is NOT reproduced in Kubernetes v1.32.5.
What version of TiDB Operator are you using?
v1.6.1
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods?
storageClass: local-path, https://github.com/rancher/local-path-provisioner
What's the status of the TiDB cluster pods?
all green
What did you do?
I have a single-node Kubernetes cluster installed using kubeadm. TiDB (v8.5.1, v7.5.6) was created using the following definition:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: tidb-test
---
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  namespace: tidb-test
spec:
  timezone: UTC
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  discovery: {}
  helper:
    image: flwang/alpine-nslookup:latest
  pd:
    baseImage: pingcap/pd
    maxFailoverCount: 0
    replicas: 3
    storageClassName: local-path
    requests:
      storage: "10Gi"
    config: {}
    # startUpScriptVersion: v1
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 0
    evictLeaderTimeout: 1m
    replicas: 3
    storageClassName: local-path
    requests:
      storage: "50Gi"
    config: {}
  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 0
    replicas: 3
    service:
      type: ClusterIP
    config: {}
    storageClassName: local-path
EOF
All pods are green, Services and Endpoints are created, and TiDB is functional, but every 30 seconds I see a new warning in the k8s events:
Namespace: default
Message: IPAddress: 10.<X>.<Y>.<Z> for Service tidb-test/basic-discovery has a wrong reference; cleaning up
Reason: IPAddressWrongReference
K8s events:
kubectl get events --sort-by=.metadata.creationTimestamp --no-headers | tail -r
27s Warning IPAddressWrongReference ipaddress/10.111.6.74 IPAddress: 10.111.6.74 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
57s Warning IPAddressWrongReference ipaddress/10.101.123.74 IPAddress: 10.101.123.74 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
87s Warning IPAddressWrongReference ipaddress/10.101.111.202 IPAddress: 10.101.111.202 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
117s Warning IPAddressWrongReference ipaddress/10.97.166.25 IPAddress: 10.97.166.25 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
2m27s Warning IPAddressWrongReference ipaddress/10.103.163.71 IPAddress: 10.103.163.71 for Service tidb-test/basic-discovery has a wrong reference; cleaning up
...
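For anyone reproducing this, the orphaned objects behind these events can be inspected directly. A diagnostic sketch, assuming access to the cluster (IPAddress is a networking.k8s.io/v1 resource as of v1.33):

```shell
# List IPAddress objects together with the Service each one claims as its
# parent, to see which entries point at tidb-test/basic-discovery.
kubectl get ipaddresses.networking.k8s.io \
  -o custom-columns='IP:.metadata.name,PARENT-NS:.spec.parentRef.namespace,PARENT:.spec.parentRef.name'

# Compare with the ClusterIP actually recorded on the discovery Service:
kubectl -n tidb-test get svc basic-discovery -o jsonpath='{.spec.clusterIP}{"\n"}'
```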
Based on the k8s api-server audit logs, it seems that tidb-operator, for some reason, tries to create the discovery Service again on every reconciliation cycle:
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "auditID": "<auditID>",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/tidb-test/services",
  "verb": "create",
  "user": {
    "username": "system:serviceaccount:tidb-admin:tidb-controller-manager",
    "uid": "<uid>",
    "groups": ["system:serviceaccounts", "system:serviceaccounts:tidb-admin", "system:authenticated"],
    "extra": {
      "authentication.kubernetes.io/credential-id": ["JTI=<JTI>"],
      "authentication.kubernetes.io/node-name": ["k8s-main-1"],
      "authentication.kubernetes.io/node-uid": ["<node-uid>"],
      "authentication.kubernetes.io/pod-name": ["tidb-controller-manager-NN"],
      "authentication.kubernetes.io/pod-uid": ["<pod-uid>"]
    }
  },
  "sourceIPs": ["<sourceIPs>"],
  "userAgent": "tidb-controller-manager/v0.0.0 (linux/amd64) kubernetes/$Format",
  "objectRef": {
    "resource": "services",
    "namespace": "tidb-test",
    "name": "basic-discovery",
    "apiVersion": "v1"
  },
  "responseStatus": {
    "metadata": {},
    "status": "Failure",
    "message": "services \"basic-discovery\" already exists",
    "reason": "AlreadyExists",
    "details": {"name": "basic-discovery", "kind": "services"},
    "code": 409
  },
  "requestReceivedTimestamp": "<requestReceivedTimestamp>",
  "stageTimestamp": "<stageTimestamp>",
  "annotations": {
    "authorization.k8s.io/decision": "allow",
    "authorization.k8s.io/reason": "RBAC: allowed by ClusterRoleBinding \"tidb-operator:tidb-controller-manager\" of ClusterRole \"tidb-operator:tidb-controller-manager\" to ServiceAccount \"tidb-controller-manager/tidb-admin\""
  }
}
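The rejected creates can be counted straight from the audit log. A minimal sketch, assuming `jq` is available; the log path is an assumption, so set it to whatever `--audit-log-path` your api-server was started with:

```shell
# Sketch: extract the operator's rejected (409) Service creates from the
# api-server audit log. AUDIT_LOG is an assumed default path.
AUDIT_LOG="${AUDIT_LOG:-/var/log/kubernetes/audit/audit.log}"
FILTER='select(.objectRef.resource == "services"
               and .verb == "create"
               and .responseStatus.code == 409)
        | .requestReceivedTimestamp + " " + .objectRef.namespace + "/" + .objectRef.name'
if [ -r "$AUDIT_LOG" ]; then
  jq -r "$FILTER" "$AUDIT_LOG"
fi
```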
The behavior of tidb-operator is not at all clear from its logs, even with --v="5". I suggest improving the logging so that all mutations, such as Service creation, are logged.
I also see warnings in the tidb-operator logs (every 10 minutes): v1 Endpoints is deprecated in v1.33+; use discovery.k8s.io/v1 EndpointSlice
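That deprecation warning suggests the operator still watches legacy v1 Endpoints. Whether the mirrored EndpointSlice objects exist alongside can be checked with (a diagnostic sketch; requires access to the cluster):

```shell
# Compare the legacy Endpoints object with its mirrored EndpointSlices
# (EndpointSlices for a Service carry the kubernetes.io/service-name label).
kubectl -n tidb-test get endpoints basic-discovery
kubectl -n tidb-test get endpointslices -l kubernetes.io/service-name=basic-discovery
```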
What did you expect to see?
No warnings in the k8s events, and no attempts by tidb-operator to create the already properly working discovery Service a second time.
What did you see instead?
tidb-operator tries to create the discovery Service a second time every 30 seconds.