
[Bug]: [benchmark][cluster] Enabled eviction & all mmap, queryNode is suddenly OOM killed during serial search #43145

@wangting0128

Description


Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20250703-bbbc7d45-amd64
- Deployment mode (standalone or cluster): cluster
- MQ type (rocksmq, pulsar or kafka): pulsar
- SDK version (e.g. pymilvus v2.0.0rc2): 2.6.0rc151
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: fouramf-k8v65

server:

NAME                                                              READY   STATUS              RESTARTS        AGE     IP              NODE         NOMINATED NODE   READINESS GATES
verify-cl-evict-3-etcd-0                                          1/1     Running             0               22h     10.104.33.236   4am-node36   <none>           <none>
verify-cl-evict-3-etcd-1                                          1/1     Running             0               22h     10.104.20.190   4am-node22   <none>           <none>
verify-cl-evict-3-etcd-2                                          1/1     Running             0               22h     10.104.21.222   4am-node24   <none>           <none>
verify-cl-evict-3-milvus-datanode-7b46b6dc44-4zzjc                1/1     Running             2 (22h ago)     22h     10.104.13.102   4am-node16   <none>           <none>
verify-cl-evict-3-milvus-datanode-7b46b6dc44-5xl2q                1/1     Running             2 (22h ago)     22h     10.104.9.237    4am-node14   <none>           <none>
verify-cl-evict-3-milvus-datanode-7b46b6dc44-s6xkz                1/1     Running             2 (22h ago)     22h     10.104.30.73    4am-node38   <none>           <none>
verify-cl-evict-3-milvus-datanode-7b46b6dc44-wgkgb                1/1     Running             2 (22h ago)     22h     10.104.24.134   4am-node29   <none>           <none>
verify-cl-evict-3-milvus-mixcoord-5459f54d66-89mdf                1/1     Running             3 (15h ago)     22h     10.104.30.71    4am-node38   <none>           <none>
verify-cl-evict-3-milvus-proxy-775d6d967b-7sczl                   1/1     Running             2 (22h ago)     22h     10.104.30.72    4am-node38   <none>           <none>
verify-cl-evict-3-milvus-querynode-85c46989d-k6zbw                1/1     Running             3 (16h ago)     22h     10.104.30.70    4am-node38   <none>           <none>
verify-cl-evict-3-milvus-streamingnode-6c5d499bcd-pmxtv           1/1     Running             2 (22h ago)     22h     10.104.27.172   4am-node31   <none>           <none>
verify-cl-evict-3-minio-0                                         1/1     Running             0               22h     10.104.15.192   4am-node20   <none>           <none>
verify-cl-evict-3-minio-1                                         1/1     Running             0               22h     10.104.20.188   4am-node22   <none>           <none>
verify-cl-evict-3-minio-2                                         1/1     Running             0               22h     10.104.17.115   4am-node23   <none>           <none>
verify-cl-evict-3-minio-3                                         1/1     Running             0               22h     10.104.16.72    4am-node21   <none>           <none>
verify-cl-evict-3-pulsarv3-bookie-0                               1/1     Running             0               22h     10.104.16.71    4am-node21   <none>           <none>
verify-cl-evict-3-pulsarv3-bookie-1                               1/1     Running             0               22h     10.104.21.221   4am-node24   <none>           <none>
verify-cl-evict-3-pulsarv3-bookie-2                               1/1     Running             0               22h     10.104.17.118   4am-node23   <none>           <none>
verify-cl-evict-3-pulsarv3-bookie-init-nmtdn                      0/1     Completed           0               22h     10.104.15.188   4am-node20   <none>           <none>
verify-cl-evict-3-pulsarv3-broker-0                               1/1     Running             0               22h     10.104.14.64    4am-node18   <none>           <none>
verify-cl-evict-3-pulsarv3-broker-1                               1/1     Running             0               22h     10.104.20.184   4am-node22   <none>           <none>
verify-cl-evict-3-pulsarv3-proxy-0                                1/1     Running             0               22h     10.104.14.60    4am-node18   <none>           <none>
verify-cl-evict-3-pulsarv3-proxy-1                                1/1     Running             0               22h     10.104.15.189   4am-node20   <none>           <none>
verify-cl-evict-3-pulsarv3-pulsar-init-tqrdj                      0/1     Completed           0               22h     10.104.14.65    4am-node18   <none>           <none>
verify-cl-evict-3-pulsarv3-recovery-0                             1/1     Running             0               22h     10.104.14.66    4am-node18   <none>           <none>
verify-cl-evict-3-pulsarv3-zookeeper-0                            1/1     Running             0               22h     10.104.20.189   4am-node22   <none>           <none>
verify-cl-evict-3-pulsarv3-zookeeper-1                            1/1     Running             0               22h     10.104.15.193   4am-node20   <none>           <none>
verify-cl-evict-3-pulsarv3-zookeeper-2                            1/1     Running             0               22h     10.104.16.69    4am-node21   <none>           <none>

Expected Behavior

No response

Steps To Reproduce

1. create a collection with fields 'id' (primary key) and 'float_vector' (128-dim)
2. insert 50 million entities
3. flush the collection
4. build an HNSW index on the vector field
5. load the collection
6. run serial searches with different params -> queryNode is OOM killed
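Step 6 does not list the exact parameter combinations. Assuming the benchmark client sweeps every combination of the `top_k` and `ef` values from the client config in this issue, one after another, the serial run covers the cases enumerated by this hypothetical sketch. It makes no Milvus calls; `build_search_cases` and the case dict layout are illustrative names, not the benchmark's or pymilvus's API. It also assumes the HNSW rule that `ef` must be at least `top_k`.

```python
from itertools import product

# Values copied from "search_params" in the client config of this issue.
TOP_K_VALUES = [10, 100]
EF_VALUES = [128, 256]
NQ = 10000


def build_search_cases(top_ks, efs, nq):
    """Hypothetical helper: one search case per (top_k, ef) pair, in serial order.

    Pairs with ef < top_k are skipped, assuming HNSW requires ef >= top_k.
    """
    cases = []
    for top_k, ef in product(top_ks, efs):
        if ef < top_k:
            continue  # invalid for HNSW under the stated assumption
        cases.append({"nq": nq, "top_k": top_k, "ef": ef})
    return cases


if __name__ == "__main__":
    # Each case would be issued only after the previous one finishes (serial search).
    for i, case in enumerate(build_search_cases(TOP_K_VALUES, EF_VALUES, NQ), start=1):
        print(i, case)
```

With the configured values all four combinations are valid, so each serial round issues four searches of 10,000 query vectors each.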

Milvus Log

No response

Anything else?

server config:

dataNode:
  replicas: 4
  resources:
    limits:
      cpu: '16.0'
      memory: 8Gi
    requests:
      cpu: '2.0'
      memory: 4Gi
streamingNode:
  resources:
    limits:
      cpu: '2.0'
      memory: 8Gi
    requests:
      cpu: '2.0'
      memory: 8Gi
queryNode:
  disk:
    size:
      enabled: true
  resources:
    limits:
      cpu: '4.0'
      memory: 8Gi
      ephemeral-storage: 10Gi
    requests:
      cpu: '2.0'
      memory: 8Gi
extraConfigFiles:
  user.yaml: |-
    queryNode:
      segcore:
        tieredStorage:
          evictionEnabled: true
      mmap:
        scalarField: true
        scalarIndex: true
        vectorField: true
        vectorIndex: true
    dataCoord:
      segment:
        sealProportion: 1
    indexCoord:
      scheduler:
        interval: 1

client config:

{
     "dataset_params": {
          "metric_type": "L2",
          "dim": 128,
          "dataset_name": "sift",
          "dataset_size": "50m",
          "ni_per": 50000,
          "req_run_counts": 10
     },
     "collection_params": {
          "collection_name": "eviction_1",
          "other_fields": [],
          "shards_num": 2
     },
     "search_params": {
          "top_k": [
               10,
               100
          ],
          "nq": 10000,
          "search_param": {
               "ef": [
                    128,
                    256
               ]
          },
          "timeout": 36000
     },
     "index_params": {
          "index_type": "HNSW",
          "index_param": {
               "M": 16,
               "efConstruction": 500
          }
     }
}
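For context on why eviction and mmap are enabled here, a back-of-the-envelope estimate compares the collection size with the single queryNode's 8Gi memory limit. This is a rough lower bound under stated assumptions, not Milvus's own accounting: float32 vectors, int64 primary keys, and HNSW level-0 adjacency lists of 2*M 32-bit neighbor ids per vector, ignoring upper HNSW levels, per-segment overhead, and any mmap or eviction effects. `estimate_footprint_bytes` is a hypothetical helper.

```python
GIB = 1024 ** 3


def estimate_footprint_bytes(num_entities, dim, hnsw_m,
                             bytes_per_float=4, bytes_per_pk=8, bytes_per_link=4):
    """Hypothetical rough lower bound on the data served for this collection.

    Assumes float32 vectors, int64 primary keys, and 2*M neighbor ids
    (4 bytes each) per vector at HNSW level 0; everything else is ignored.
    """
    raw_vectors = num_entities * dim * bytes_per_float
    hnsw_links = num_entities * 2 * hnsw_m * bytes_per_link
    primary_keys = num_entities * bytes_per_pk
    return {
        "raw_vectors": raw_vectors,
        "hnsw_links": hnsw_links,
        "primary_keys": primary_keys,
        "total": raw_vectors + hnsw_links + primary_keys,
    }


if __name__ == "__main__":
    # 50m sift vectors, dim 128, HNSW M=16, as in the configs of this issue.
    footprint = estimate_footprint_bytes(50_000_000, 128, 16)
    querynode_limit = 8 * GIB  # queryNode memory limit from the server config
    for name, size in footprint.items():
        print(f"{name}: {size / GIB:.2f} GiB")
    print(f"total / queryNode limit: {footprint['total'] / querynode_limit:.2f}x")
```

Under these assumptions the collection needs roughly 30 GiB, several times the 8Gi limit, so the queryNode can only serve it if tiered-storage eviction and mmap keep most of the data out of resident memory.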

Labels: kind/bug, test/benchmark, triage/accepted
