[Bug]: [benchmark][cluster] Drop collection raises Etcd MultiSave error and build index save meta fail in concurrent DQL & DDL scene #42878

@wangting0128

Description

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20250619-59366297-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): 2.6.0rc125
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

argo task: json-path-corn-1750381200
test case name: test_json_path_locust_dql_ddl_streaming_cluster

server:

 NAME                                                              READY   STATUS        RESTARTS         AGE     IP              NODE         NOMINATED NODE   READINESS GATES
json-path-corn-1750381200-4-etcd-0                                1/1     Running       0                3h48m   10.104.30.58    4am-node38   <none>           <none>
json-path-corn-1750381200-4-etcd-1                                1/1     Running       0                3h48m   10.104.19.90    4am-node28   <none>           <none>
json-path-corn-1750381200-4-etcd-2                                1/1     Running       0                3h48m   10.104.24.148   4am-node29   <none>           <none>
json-path-corn-1750381200-4-milvus-datanode-858c88f5c-ddfts       1/1     Running       4 (3h47m ago)    3h48m   10.104.18.100   4am-node25   <none>           <none>
json-path-corn-1750381200-4-milvus-datanode-858c88f5c-mcww6       1/1     Running       4 (3h47m ago)    3h48m   10.104.6.2      4am-node13   <none>           <none>
json-path-corn-1750381200-4-milvus-datanode-858c88f5c-sxzh9       1/1     Running       4 (3h46m ago)    3h48m   10.104.26.235   4am-node32   <none>           <none>
json-path-corn-1750381200-4-milvus-datanode-858c88f5c-vh6h8       1/1     Running       4 (3h46m ago)    3h48m   10.104.17.9     4am-node23   <none>           <none>
json-path-corn-1750381200-4-milvus-datanode-858c88f5c-xtkh4       1/1     Running       4 (3h47m ago)    3h48m   10.104.15.107   4am-node20   <none>           <none>
json-path-corn-1750381200-4-milvus-datanode-858c88f5c-zpdhv       1/1     Running       4 (3h47m ago)    3h48m   10.104.34.213   4am-node37   <none>           <none>
json-path-corn-1750381200-4-milvus-mixcoord-bc87c6c59-4q7kr       1/1     Running       4 (3h47m ago)    3h48m   10.104.34.212   4am-node37   <none>           <none>
json-path-corn-1750381200-4-milvus-proxy-5bd7b49667-6v2rv         1/1     Running       4 (3h47m ago)    3h48m   10.104.20.245   4am-node22   <none>           <none>
json-path-corn-1750381200-4-milvus-querynode-5bc4b5c8dd-fzdrs     1/1     Running       4 (3h47m ago)    3h48m   10.104.20.246   4am-node22   <none>           <none>
json-path-corn-1750381200-4-milvus-streamingnode-7746d9c84rbn4t   1/1     Running       4 (3h47m ago)    3h48m   10.104.26.234   4am-node32   <none>           <none>
json-path-corn-1750381200-4-minio-0                               1/1     Running       0                3h48m   10.104.24.142   4am-node29   <none>           <none>
json-path-corn-1750381200-4-minio-1                               1/1     Running       0                3h48m   10.104.16.45    4am-node21   <none>           <none>
json-path-corn-1750381200-4-minio-2                               1/1     Running       0                3h48m   10.104.19.88    4am-node28   <none>           <none>
json-path-corn-1750381200-4-minio-3                               1/1     Running       0                3h48m   10.104.30.60    4am-node38   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-bookie-0                     1/1     Running       0                3h48m   10.104.23.128   4am-node27   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-bookie-1                     1/1     Running       0                3h48m   10.104.16.46    4am-node21   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-bookie-2                     1/1     Running       0                3h48m   10.104.30.62    4am-node38   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-bookie-init-xblst            0/1     Completed     0                3h48m   10.104.13.67    4am-node16   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-broker-0                     1/1     Running       0                3h48m   10.104.13.68    4am-node16   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-broker-1                     1/1     Running       0                3h48m   10.104.19.62    4am-node28   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-proxy-0                      1/1     Running       0                3h48m   10.104.30.37    4am-node38   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-proxy-1                      1/1     Running       0                3h48m   10.104.24.118   4am-node29   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-pulsar-init-rdx5l            0/1     Completed     0                3h48m   10.104.13.66    4am-node16   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-recovery-0                   1/1     Running       0                3h48m   10.104.24.119   4am-node29   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-zookeeper-0                  1/1     Running       0                3h48m   10.104.16.42    4am-node21   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-zookeeper-1                  1/1     Running       0                3h48m   10.104.19.86    4am-node28   <none>           <none>
json-path-corn-1750381200-4-pulsarv3-zookeeper-2                  1/1     Running       0                3h48m   10.104.24.147   4am-node29   <none>           <none>

{pod=~"json-path-corn-1750381200-4-milvus-.*"} |~ "etcdserver: request timed out|377ddf5a3263eb3346970e2f6dc3cb9a"
Image

{pod=~"json-path-corn-1750381200-4-milvus-.*"} |~ "5d80bfe22da94f5f9f9a201253f118b5"
Image

At 03:19, when the request errors were reported, etcd monitoring showed the following:

Image Image

client log:

[2025-06-20 03:19:12,610 - DEBUG - fouram]: (api_request)  : [drop_collection] args: ['scene_test_t33vGWai', None, 'default'], kwargs: {}, [requestId: 45d2e8db570c4a9a8981fa9d27f5ca26] (api_request.py:83)
2025-06-20 03:19:22,619 [ERROR][handler]: RPC error: [drop_collection], <MilvusException: (code=65535, message=etcdserver: request timed out)>, <Time:{'RPC start': '2025-06-20 03:19:12.610813', 'RPC error': '2025-06-20 03:19:22.619637'}> (decorators.py:140)
[2025-06-20 03:19:22,624 - ERROR - fouram]: (api_response) : [drop_collection] <MilvusException: (code=65535, message=etcdserver: request timed out)>, [requestId: 45d2e8db570c4a9a8981fa9d27f5ca26] (api_request.py:58)

[2025-06-20 03:19:09,504 - DEBUG - fouram]: (api_request)  : [Index] args: [<Collection>:
-------------
<name>: scene_test_rX2mrLMg
<description>: 
<schema>: {'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'name': 'float_vector_1', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}, {'name': 'json_1', 'description': '', 'type': <DataType.JSON: 23>, 'nullable': True}, {'name': 'json_2', 'description': '', 'type': <DataType.JSON: 23>, 'nullable': True}], 'enable_dynamic_field': True}
, 'float_vector_1', {'index_type': 'IVF_SQ8', 'metric_type': 'L2', 'params': {'nlist': 2048}}], kwargs: {'client_request_id': '5d80bfe22da94f5f9f9a201253f118b5'}, [requestId: 5d80bfe22da94f5f9f9a201253f118b5] (api_request.py:83)
2025-06-20 03:19:23,285 [ERROR][handler]: RPC error: [create_index], <MilvusException: (code=10001, message=context deadline exceeded)>, <Time:{'RPC start': '2025-06-20 03:19:09.504226', 'RPC error': '2025-06-20 03:19:23.285422'}> (decorators.py:140)
[2025-06-20 03:19:23,288 - ERROR - fouram]: (api_response) : [Index] <MilvusException: (code=10001, message=context deadline exceeded)>, [requestId: 5d80bfe22da94f5f9f9a201253f118b5] (api_request.py:58)
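
For reference, the time each failed RPC spent before returning an error can be computed from the `RPC start` and `RPC error` timestamps in the client log above. This is a minimal sketch: the timestamps are copied from the log, while the helper name `rpc_elapsed_seconds` and the `RPC_TIMES` dictionary are illustrative and not part of the benchmark code.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the <Time:{...}> fields in the client log above.
RPC_TIMES = {
    "drop_collection": ("2025-06-20 03:19:12.610813", "2025-06-20 03:19:22.619637"),
    "create_index": ("2025-06-20 03:19:09.504226", "2025-06-20 03:19:23.285422"),
}


def rpc_elapsed_seconds(start, error):
    """Return the seconds between the RPC start and the RPC error."""
    delta = datetime.strptime(error, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


ELAPSED = {name: rpc_elapsed_seconds(*times) for name, times in RPC_TIMES.items()}

if __name__ == "__main__":
    for name, seconds in ELAPSED.items():
        print(f"{name}: {seconds:.3f}s")
```

With the values above, `drop_collection` failed after about 10.0 s and `create_index` after about 13.8 s.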

Expected Behavior

No response

Steps To Reproduce

concurrent test and calculation of RT and QPS

        :purpose:  `primary key: INT64`, shards_num=2, enabled dynamic field, DQL & DDL
                    2 fields of different vector types, json and dynamic fields

        :test steps:
            1. create collection with fields:
                'float_vector': 128dim
                'float_vector_1': 768dim
                'id': primary key type is INT64

                'json_1': scalar json field, random_range[0, 1000] & None value
                'json_2': scalar json field, {'id': <all cast type>} & None value
                'json_dynamic_1': dynamic field, random_range[0, 1000] & None value
                'json_dynamic_2': dynamic field, {'id': <all cast type>} & None value
            2. build indexes:
                HNSW: 'float_vector'
                IVF_SQ8: 'float_vector_1'

                JsonPathIndex - 'DOUBLE': 'json_1', 'json_2["id"]', 'json_dynamic_1', 'json_dynamic_2["id"]'
            3. insert 12 million data
            4. flush collection
            5. build indexes again using the same params
            6. load collection
            7. concurrent request:
                - search
                - query
                - hybrid_search
                - scene_test
                    (collection: create->insert->flush->index->drop) <- drop collection & build index failed
                - scene_search_test
                    (collection: create->insert->flush->index->load->search->drop)
                - scene_hybrid_search_test: 4 vector fields, 2 scalar fields, dynamic field
                    (collection: create->insert->flush->index->load->hybrid_search->drop)
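
The concurrent DDL scenes in step 7 can be sketched as a small harness. This is not the actual fouram benchmark implementation: `run_scene`, `run_concurrently`, and `make_stub_ops` are hypothetical names, and the operations are stubs so the sketch runs without a Milvus server. To reproduce against a real cluster, each stub would presumably be replaced by a wrapper around the matching SDK call.

```python
from concurrent.futures import ThreadPoolExecutor
import uuid

# Step order per scene, copied from the test steps above.
SCENES = {
    "scene_test": ["create", "insert", "flush", "index", "drop"],
    "scene_search_test": [
        "create", "insert", "flush", "index", "load", "search", "drop",
    ],
    "scene_hybrid_search_test": [
        "create", "insert", "flush", "index", "load", "hybrid_search", "drop",
    ],
}


def run_scene(scene, ops):
    """Run one scene on a fresh collection name.

    `ops` maps a step name to a callable taking the collection name.
    Returns (collection_name, failed_step, error); the last two are None
    when every step succeeds. The scene stops at the first failing step.
    """
    name = "scene_test_" + uuid.uuid4().hex[:8]
    for step in SCENES[scene]:
        try:
            ops[step](name)
        except Exception as exc:
            return name, step, exc
    return name, None, None


def run_concurrently(scene, ops, workers=4, iterations=8):
    """Run `iterations` copies of a scene on a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_scene, scene, ops) for _ in range(iterations)]
        return [future.result() for future in futures]


def make_stub_ops(fail_on=()):
    """Stand-in operations (assumption: no real Milvus calls are made)."""
    def make(step):
        def op(collection_name):
            if step in fail_on:
                raise RuntimeError(f"{step} failed for {collection_name}")
        return op
    all_steps = {step for steps in SCENES.values() for step in steps}
    return {step: make(step) for step in all_steps}


if __name__ == "__main__":
    for name, step, err in run_concurrently("scene_test", make_stub_ops({"drop"})):
        print(name, step, err)
```

Recording the failing step per collection makes it easier to see whether errors cluster on `drop` and `index`, as in the client log above.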

Milvus Log

No response

Anything else?

No response

Metadata

Labels

  • kind/bug: Issues or changes related to a bug
  • stale: Indicates no updates for 30 days
  • test/benchmark: benchmark test
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.
