[BUG] milvus topology cluster-with-dep stop start data loss #9767

@JashBook

Describe the bug
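After a stop and start of a Milvus cluster that uses the cluster-with-dep topology, data inserted before the stop can no longer be queried: the same query that returned a row before the stop returns an empty result after the start.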

kbcli version    
Kubernetes: v1.30.4-vke.4
KubeBlocks: 0.9.6-beta.1
kbcli: 0.9.5

To Reproduce
Steps to reproduce the behavior:

  1. create cluster
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: milvus-jrwvla-s3-credential
  namespace: default
stringData:
  accessKey: "kbclitest"
  bucketnames: "kb-milvus-jrwvla"
  endpoint: "http://kbcli-test-minio.kb-system.svc.cluster.local:9000"
  host: "kbcli-test-minio.kb-system.svc.cluster.local"
  port: "9000"
  region: ""
  rulerBucketnames: "kb-milvus-jrwvla"
  secretKey: "kbclitest"
  storageType: "s3"
---
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: milvus-jrwvla
  namespace: default
spec:
  clusterDefinitionRef: milvus
  topology: cluster-with-dep
  terminationPolicy: Halt
  componentSpecs:
    - name: proxy
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: datanode
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: indexnode
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: querynode
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
    - name: etcd
      serviceVersion: 3.6.1
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: kafka
      serviceVersion: 3.3.2
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: 
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
    - name: mixcoord
      serviceVersion: 2.5.13
      replicas: 1
      disableExporter: true
      resources:
        limits:
          cpu: 500m
          memory: 0.5Gi
        requests:
          cpu: 500m
          memory: 0.5Gi
      env:
        - name: MINIO_HOST
          valueFrom:
            secretKeyRef:
              key: host
              name: milvus-jrwvla-s3-credential
        - name: MINIO_PORT
          valueFrom:
            secretKeyRef:
              key: port
              name: milvus-jrwvla-s3-credential
        - name: MINIO_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: accessKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_SECRET_KEY
          valueFrom:
            secretKeyRef:
              key: secretKey
              name: milvus-jrwvla-s3-credential
        - name: MINIO_BUCKETNAME
          valueFrom:
            secretKeyRef:
              key: bucketnames
              name: milvus-jrwvla-s3-credential
  2. insert data
kubectl create -f -<<EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-db-client-executionloop-milvus-jrwvla
  namespace: default
spec:
  containers:
    - name: test-dbclient
      imagePullPolicy: IfNotPresent
      image: apecloud-registry.cn-zhangjiakou.cr.aliyuncs.com/apecloud/dbclient:test
      args:
        - "--host"
        - "milvus-jrwvla-proxy.default.svc.cluster.local"
        - "--user"
        - ""
        - "--password"
        - ""
        - "--port"
        - "19530"
        - "--dbtype"
        - "milvus"
        - "--test"
        - "executionloop"
        - "--duration"
        - "60"
        - "--interval"
        - "1"
  restartPolicy: Never
EOF
kubectl logs -f test-db-client-executionloop-milvus-jrwvla
--host milvus-jrwvla-proxy.default.svc.cluster.local --user  --password  --port 19530 --dbtype milvus --test executionloop --duration 60 --interval 1
Collection executions_loop_collection already exists.
Delete collection executions_loop_collection
Collection executions_loop_collection deleted successfully.
Create collection executions_loop_collection
Collection executions_loop_collection created successfully.
Execution loop start: insert:executions_loop_collection:10::1:executions_loop_1
[ 1s ] executions total: 1 successful: 1 failed: 0 disconnect: 0
[ 2s ] executions total: 173 successful: 173 failed: 0 disconnect: 0
[ 3s ] executions total: 400 successful: 400 failed: 0 disconnect: 0
...
[ 58s ] executions total: 16211 successful: 16211 failed: 0 disconnect: 0
[ 59s ] executions total: 16509 successful: 16509 failed: 0 disconnect: 0
[ 60s ] executions total: 16571 successful: 16571 failed: 0 disconnect: 0
Test Result:
Total Executions: 16571
Successful Executions: 16571
Failed Executions: 0
Disconnection Counts: 0
echo "curl -s -H 'Content-Type: application/json' -X POST  http://milvus-jrwvla-proxy.default.svc.cluster.local:19530/v1/vector/query  -d '{\"collectionName\":\"executions_loop_collection\",\"filter\":\"id == 16571\",\"limit\":0,\"outputFields\":[\"id\"]}' " | kubectl exec -it milvus-jrwvla-proxy-0 --namespace default -- bash
{"code":200,"data":[{"id":16571}]}
  3. stop and start
kbcli cluster stop milvus-jrwvla --auto-approve 
OpsRequest milvus-jrwvla-stop-c9kjx created successfully, you can view the progress:
	kbcli cluster describe-ops milvus-jrwvla-stop-c9kjx -n default

 kubectl get cluster 
NAME            CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
milvus-jrwvla   milvus                         WipeOut              Stopped   45m

kbcli cluster start milvus-jrwvla              
OpsRequest milvus-jrwvla-start-6h82g created successfully, you can view the progress:
	kbcli cluster describe-ops milvus-jrwvla-start-6h82g -n default

kubectl get cluster milvus-jrwvla 
NAME            CLUSTER-DEFINITION   VERSION   TERMINATION-POLICY   STATUS    AGE
milvus-jrwvla   milvus                         WipeOut              Running   49m
  4. See error
echo "curl -s -H 'Content-Type: application/json' -X POST  http://milvus-jrwvla-proxy.default.svc.cluster.local:19530/v1/vector/query  -d '{\"collectionName\":\"executions_loop_collection\",\"filter\":\"id == 16571\",\"limit\":0,\"outputFields\":[\"id\"]}' " | kubectl exec -it milvus-jrwvla-proxy-0 --namespace default -- bash
{"code":200,"data":[]}

Expected behavior
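Data inserted before the stop should still be queryable after the cluster is started again. The query for id == 16571 should return the same row as it did before the stop, not an empty result.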
