Open
Labels
kind/bug (Issues or changes related to a bug), severity/critical (Critical: leads to crash, data missing, wrong results, or a function totally not working), triage/accepted (Indicates an issue or PR is ready to be actively worked on)
Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.5-20250619-30b2a66f-amd64 -> master-20250620-c3c51681-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
server
- pulsar mq
- config
```yaml
common:
  enabledGrowingSegmentJSONKeyStats: true
  enabledJsonKeyStats: true
  enabledOptimizeExpr: false
dataCoord:
  enableActiveStandby: true
  enabledJSONKeyStatsInSort: false
indexCoord:
  enableActiveStandby: true
log:
  level: debug
queryCoord:
  enableActiveStandby: true
rootCoord:
  enableActiveStandby: true
```
client
- create a collection -> index -> insert 20m entities -> flush -> index again -> load
- concurrent requests: upsert + flush + query + search
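The mixed-request phase above can be sketched as a small concurrent driver. This is illustrative only: the operation stubs stand in for the actual pymilvus calls (`Collection.upsert` / `flush` / `query` / `search`) used by the test harness, and all names and return values here are assumptions.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Stubbed client operations standing in for the real pymilvus calls.
def upsert():
    return ("upsert", 1000)       # pretend a batch of 1000 rows was upserted

def flush():
    return ("flush", None)

def query():
    return ("query", 20_000_000)  # expected total entity count

def search():
    return ("search", 10)         # pretend top-10 hits came back

def run_mixed_workload(iterations=20, workers=8, seed=0):
    """Fire upsert/flush/query/search concurrently, mirroring the
    mixed-request phase of the reproduction."""
    rng = random.Random(seed)
    ops = [upsert, flush, query, search]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(rng.choice(ops)) for _ in range(iterations)]
        return [f.result() for f in futures]
```

In the real test each worker loops against the live collection while the server images are rolled, which is what surfaces the failures below.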
upgrading image during concurrent test
- 2.5-20250619-30b2a66f-amd64 -> master-20250620-c3c51681-amd64
The entire upgrade took about 1 hour and 12 minutes. During the upgrade, and for some time after it completed, search and query requests failed.
- query
```
File "/src/fouram/client/check/func_check.py", line 338, in check_query_output_count
    assert int(query_count) == expected_query_count, f'{query_count} == {expected_query_count}'
AssertionError: 19998794 == 20000000

    assert int(query_count) == expected_query_count, f'{query_count} == {expected_query_count}'
AssertionError: 20002259 == 20000000
```
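The failing check asserts that the queried entity count exactly equals the expected total; counts both below (19998794) and above (20002259) the 20,000,000 baseline trip it. A minimal sketch of that equality check, with the function name and signature assumed from the traceback:

```python
def check_query_output_count(query_count, expected_query_count):
    """Raise AssertionError when the returned count deviates from the
    expected total, as seen with 19998794 and 20002259 vs 20000000."""
    assert int(query_count) == expected_query_count, (
        f"{query_count} == {expected_query_count}"
    )
    return True
```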
- search
```
[2025-06-20 04:38:15,373 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=901, message=fail to search on QueryNode 9: stack trace: /workspace/source/pkg/tracer/stack_trace.go:51 github.com/milvus-io/milvus/pkg/v2/tracer.StackTrace
/workspace/source/internal/util/grpcclient/client.go:575 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).Call
/workspace/source/internal/util/grpcclient/client.go:589 github.com/milvus-io/milvus/internal/util/grpcclient.(*ClientBase[...]).ReCall
/workspace/source/internal/distributed/querynode/client/client.go:106 github.com/milvus-io/milvus/internal/distributed/querynode/client.wrapGrpcCall[...]
/workspace/source/internal/distributed/querynode/client/client.go:224 github.com/milvus-io/milvus/internal/distributed/querynode/client.(*Client).SearchSegments
/workspace/source/internal/querynodev2/cluster/worker.go:195 github.com/milvus-io/milvus/internal/querynodev2/cluster.(*remoteWorker).SearchSegments
/workspace/source/internal/querynodev2/delegator/delegator.go:354 github.com/milvus-io/milvus/internal/querynodev2/delegator.(*shardDelegator).search.func3
/workspace/source/internal/querynodev2/delegator/delegator.go:823 github.com/milvus-io/milvus/internal/querynodev2/delegator.executeSubTasks[...].func1
/go/pkg/mod/golang.org/x/sync@v0.15.0/errgroup/errgroup.go:78 golang.org/x/sync/errgroup.(*Group).Go.func1
/usr/local/go/src/runtime/asm_amd64.s:1700 runtime.goexit: node not found)>, [requestId: e6340d5352a64d50bb7a75fdd8add84d] (api_request.py:58)

[2025-06-20 05:45:12,718 - ERROR - fouram]: (api_response) : [Collection.search] <MilvusException: (code=503, message=fail to search on QueryNode 9: channel distribution is not serviceable, required load ratio is 1.000000, current load ratio is 0.851950: channel not available[channel=zong-roll-upsert-4-rootcoord-dml_0_458853861888623318v0])>, [requestId: bb86a2ed859f42b1a2e6304120df3f19] (api_request.py:58)
```
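The 503 above says the delegator refuses to serve until its loaded fraction reaches the required ratio of 1.0 (full coverage), and it was stuck at 0.851950. A hedged pure-Python sketch of that serviceability rule as the error message describes it (not Milvus's actual implementation):

```python
def is_serviceable(loaded, total, required_ratio=1.0):
    """Return (serviceable, current_ratio): serving requires the loaded
    fraction to reach required_ratio (1.0 means full coverage)."""
    current = loaded / total if total else 0.0
    return current >= required_ratio, current
```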
Expected Behavior
No response
Steps To Reproduce
https://argo-workflows.zilliz.cc/archived-workflows/qa/c9a7cb7a-9d2a-40e0-824e-ca1881b8bf3e?nodeId=zong-roll-upsert-4
Milvus Log
pods:
```
zong-roll-upsert-4-milvus-datanode-65c99cd5f6-2r9j9        1/1  Running  0  172m   10.104.19.239  4am-node28  <none>  <none>
zong-roll-upsert-4-milvus-datanode-65c99cd5f6-8psjt        1/1  Running  0  173m   10.104.34.65   4am-node37  <none>  <none>
zong-roll-upsert-4-milvus-mixcoord-66cd768b44-qm82l        1/1  Running  0  3h56m  10.104.18.223  4am-node25  <none>  <none>
zong-roll-upsert-4-milvus-proxy-5b499984bd-kfqdb           1/1  Running  0  171m   10.104.34.68   4am-node37  <none>  <none>
zong-roll-upsert-4-milvus-querynode-1-c67459bcf-8f2bq      1/1  Running  0  3h24m  10.104.23.84   4am-node27  <none>  <none>
zong-roll-upsert-4-milvus-querynode-1-c67459bcf-h7q5b      1/1  Running  0  3h55m  10.104.19.220  4am-node28  <none>  <none>
zong-roll-upsert-4-milvus-streamingnode-5c6f6b58b8-7jcmb   1/1  Running  0  3h57m  10.104.18.221  4am-node25  <none>  <none>
zong-roll-upsert-4-minio-0                                 1/1  Running  0  5h3m   10.104.18.139  4am-node25  <none>  <none>
zong-roll-upsert-4-minio-1                                 1/1  Running  0  5h3m   10.104.16.215  4am-node21  <none>  <none>
zong-roll-upsert-4-minio-2                                 1/1  Running  0  5h3m   10.104.30.211  4am-node38  <none>  <none>
zong-roll-upsert-4-minio-3                                 1/1  Running  0  5h3m   10.104.19.185  4am-node28  <none>  <none>
```
Anything else?
No response