-
Notifications
You must be signed in to change notification settings - Fork 3.3k
Open
Labels
kind/bugIssues or changes related a bugIssues or changes related a bugseverity/criticalCritical, lead to crash, data missing, wrong result, function totally doesn't work.Critical, lead to crash, data missing, wrong result, function totally doesn't work.triage/acceptedIndicates an issue or PR is ready to be actively worked on.Indicates an issue or PR is ready to be actively worked on.
Milestone
Description
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: #41824
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka): woodpecker
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
wal write message into underlying wal, and return 6/7xx, but after recovering at another streamingnode,
wal write a new message into underlying wal but return 2/0, and the consume operation from 6/5xx can not read any entry. Break the consistent of wal, the system cannot recovered anymore.
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [adaptor/wal_adaptor.go:242] ["wal graceful close done, wait for operation to be finished..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [adaptor/wal_adaptor.go:250] ["wal begin to close flusher..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [adaptor/scanner_adaptor.go:146] ["the produce event loop of scanner is closed"] [module=streamingnode] [component=scanner] [name=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0/6/2] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [WARN] [producer/produce_server.go:93] ["send arm of stream closed by unexpected error"] [module=streamingnode] [component=producer-server] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [term=6] [error="send loop is stopped for close of wal"]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [adaptor/scanner_adaptor.go:154] ["the consuming event loop of scanner is closed"] [module=streamingnode] [component=scanner] [name=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0/6/2] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [adaptor/scanner_adaptor.go:133] ["scanner is closed"] [module=streamingnode] [component=scanner] [name=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0/6/2] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [WARN] [producer/produce_server.go:93] ["send arm of stream closed by unexpected error"] [module=streamingnode] [component=producer-server] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [term=6] [error="send loop is stopped for close of wal"]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [WARN] [producer/produce_server.go:145] ["recv arm of stream closed by unexpected error"] [module=streamingnode] [component=producer-server] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [term=6] [error="rpc error: code = Canceled desc = context canceled"]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [WARN] [producer/produce_server.go:145] ["recv arm of stream closed by unexpected error"] [module=streamingnode] [component=producer-server] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [term=6] [error="rpc error: code = Canceled desc = context canceled"]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [flusherimpl/flusher_components.go:146] ["data sync service closed for flusher closing"] [module=streamingnode] [component=flusher] [pchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [vchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0_458022832186351001v0]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [flusherimpl/flusher_components.go:146] ["data sync service closed for flusher closing"] [module=streamingnode] [component=flusher] [pchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [vchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0_458022021334041933v0]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [flusherimpl/flusher_components.go:146] ["data sync service closed for flusher closing"] [module=streamingnode] [component=flusher] [pchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [vchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0_458022738515455585v0]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [flusherimpl/wal_flusher.go:63] ["wal flusher stop"] [module=streamingnode] [component=flusher] [pchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [flusherimpl/wal_flusher.go:120] ["wal flusher start to close the recovery storage..."] [module=streamingnode] [component=flusher] [pchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.586 [2025/05/14 10:32:12.586 +00:00] [INFO] [recovery/recovery_background_task.go:40] ["recovery storage background task, perform a graceful exit..."] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [state=working]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.630 +00:00] [INFO] [recovery/recovery_background_task.go:77] ["persist dirty snapshot"] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [state=working] [checkpoint=6/493] [checkpointTimeTick=458022906520928279] [vchannelCount=0] [segmentCount=16]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [recovery/recovery_background_task.go:32] ["recovery storage background task exit"] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [state=working]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [recovery/wal_truncator.go:69] ["sampling truncator background task exit"] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6] [state=working]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [flusherimpl/wal_flusher.go:122] ["recovery storage closed"] [module=streamingnode] [component=flusher] [pchannel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [adaptor/wal_adaptor.go:256] ["wal begin to close scanners..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [adaptor/scanner_adaptor.go:146] ["the produce event loop of scanner is closed"] [module=streamingnode] [component=scanner] [name=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0/6/1] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [adaptor/scanner_adaptor.go:154] ["the consuming event loop of scanner is closed"] [module=streamingnode] [component=scanner] [name=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0/6/1] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [adaptor/scanner_adaptor.go:133] ["scanner is closed"] [module=streamingnode] [component=scanner] [name=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0/6/1] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0]
2025-05-14 18:32:12.631 [2025/05/14 10:32:12.631 +00:00] [INFO] [adaptor/wal_adaptor.go:265] ["scanner close done, close inner wal..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.665 [2025/05/14 10:32:12.665 +00:00] [INFO] [adaptor/wal_adaptor.go:268] ["wal close done, close interceptors..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.665 [2025/05/14 10:32:12.665 +00:00] [INFO] [adaptor/wal_adaptor.go:272] ["close the write ahead buffer..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.665 [2025/05/14 10:32:12.665 +00:00] [INFO] [adaptor/wal_adaptor.go:275] ["close the segment assignment manager..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.665 [2025/05/14 10:32:12.665 +00:00] [INFO] [adaptor/wal_adaptor.go:278] ["call wal cleanup function..."] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.666 [2025/05/14 10:32:12.665 +00:00] [INFO] [adaptor/wal_adaptor.go:280] ["wal closed"] [module=streamingnode] [component=wal] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@6]
2025-05-14 18:32:12.666 [2025/05/14 10:32:12.666 +00:00] [INFO] [walmanager/wal_lifetime.go:143] ["close current term wal done"] [module=streamingnode] [component=wal-manager] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [expected=(6,false)] [current=(6,true)]
2025-05-14 18:32:12.666 [2025/05/14 10:32:12.666 +00:00] [INFO] [walmanager/manager_impl.go:89] ["remove wal success"] [module=streamingnode] [component=wal-manager] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [term=6]
2025-05-14 18:32:12.704 [2025/05/14 10:32:12.704 +00:00] [INFO] [walmanager/wal_lifetime.go:118] ["expected state changed, do a life cycle"] [module=streamingnode] [component=wal-manager] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0] [expected=(7,true)]
2025-05-14 18:32:12.712 [2025/05/14 10:32:12.712 +00:00] [INFO] [adaptor/initializing.go:53] ["start to sync first time tick"] [module=streamingnode] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7]
2025-05-14 18:32:13.772 [2025/05/14 10:32:13.772 +00:00] [INFO] [adaptor/initializing.go:59] ["sync first time tick done"] [module=streamingnode] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [msgID=2/0] [timetick=458022907451540458]
2025-05-14 18:32:13.773 [2025/05/14 10:32:13.773 +00:00] [INFO] [recovery/recovery_persisted.go:41] ["recover checkpoint done"] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [state=persist-recovering] [checkpoint=6/493] [timetick=458022906520928279] [magic=1]
2025-05-14 18:32:13.775 [2025/05/14 10:32:13.775 +00:00] [INFO] [recovery/recovery_persisted.go:54] ["recovery vchannel info done"] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [state=persist-recovering] [vchannels=3]
2025-05-14 18:32:13.776 [2025/05/14 10:32:13.775 +00:00] [INFO] [recovery/recovery_persisted.go:65] ["recover segment info done"] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [state=persist-recovering] [segments=16]
2025-05-14 18:32:13.776 [2025/05/14 10:32:13.776 +00:00] [INFO] [recovery/recovery_stream.go:33] ["recover from wal stream..."] [module=streamingnode] [component=recovery-storage] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [startMessageID=6/493] [fromTimeTick=458022906520928279] [toTimeTick=458022907451540458] [state=stream-recovering]
2025-05-14 18:32:13.776 [2025/05/14 10:32:13.776 +00:00] [INFO] [adaptor/scanner_adaptor.go:135] ["scanner start background task"] [module=streamingnode] [component=scanner] [name=recovery] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [startMessageID=6/493]
2025-05-14 18:32:13.776 [2025/05/14 10:32:13.776 +00:00] [INFO] [adaptor/scanner_adaptor.go:177] ["start produce loop of scanner at model"] [module=streamingnode] [component=scanner] [name=recovery] [channel=yz-chaos-clu-wp-mix-fix71-rootcoord-dml_0:rw@7] [startMessageID=6/493] [model=catchup]
Expected Behavior
No response
Steps To Reproduce
Milvus Log
No response
Anything else?
No response
Metadata
Metadata
Assignees
Labels
kind/bugIssues or changes related a bugIssues or changes related a bugseverity/criticalCritical, lead to crash, data missing, wrong result, function totally doesn't work.Critical, lead to crash, data missing, wrong result, function totally doesn't work.triage/acceptedIndicates an issue or PR is ready to be actively worked on.Indicates an issue or PR is ready to be actively worked on.