Skip to content

vioscsi: fix infinite loop on surprise removal if srb_list is not empty #1351

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 6, 2025

Conversation

sb-ntnx
Copy link
Contributor

@sb-ntnx sb-ntnx commented Apr 23, 2025

We have discovered a problem introduced with 35331da where a guest would hang on LUN removal if there are pending IO requests in the processing_srbs queue.

The fix is pretty trivial.

The stuck PNP thread looks like this:

System (ffffae0403a93040) ffffae0409f9d300 (E|K|W|R|V) 4.1500           0    59s.328            4773 Executive   58s.750 Running on processor 3

Irp List:
    IRP              File Driver
    ffffae040ab4c460      vioscsi

Priority:
    Current Base Decrement ForegroundBoost IO Page
    13      12   0         0               0  5

 # Child-SP         Return           Call Site                                 Source
 0 (Inline)         ---------------- vioscsi!SrbGetTargetId                    F:\ewdk\2022\Program Files\Windows Kits\10\Include\10.0.22000.0\km\srbhelper.h @ 1333
 1 ffffd10c3652e9c0 fffff8016782e5a7 vioscsi!VioScsiUnitControl+0x19c          E:\windows-drivers-build-root\virtio-release-1.2.4\vioscsi\vioscsi.c @ 1320
 2 ffffd10c3652ea70 fffff80167956116 storport!RaCallMiniportUnitControl+0x27   
 3 ffffd10c3652eaa0 fffff8016782d499 storport!RaUnitSurpriseRemovalIrp+0x122   
 4 ffffd10c3652eba0 fffff8016782c7b0 storport!RaUnitPnpIrp+0x609               
 5 ffffd10c3652ec80 fffff801d4e9697e storport!RaDriverPnpIrp+0x90              
 6 ffffd10c3652ecc0 fffff80168a21c38 nt!IofCallDriver+0xbe                     
 7 ffffd10c3652ed00 fffff801689c510a CLASSPNP!ClassDispatchPnp+0x4028          
 8 ffffd10c3652ee20 fffff801d4e9697e CLASSPNP!ClassGlobalDispatch+0x3a         
 9 ffffd10c3652ee50 fffff801d504ab7e nt!IofCallDriver+0xbe                     
 a ffffd10c3652ee90 fffff801d560b671 nt!IoSynchronousCallDriver+0x4e           
 b ffffd10c3652eef0 fffff80167570026 nt!IoForwardIrpSynchronously+0x41         
 c ffffd10c3652ef20 fffff8016757952a partmgr!PmSurpriseRemoval+0x26            
 d ffffd10c3652ef50 fffff80167553bb1 partmgr!PmPnp+0x4cca                      
 e ffffd10c3652efa0 fffff801d4e9697e partmgr!PmGlobalDispatch+0x101            
 f ffffd10c3652efd0 fffff801d54f7cb0 nt!IofCallDriver+0xbe                     
10 ffffd10c3652f010 fffff801d54f8e7e nt!IopSynchronousCall+0xf8                
11 ffffd10c3652f080 fffff801d55dd94f nt!IopRemoveDevice+0xe2                   
12 ffffd10c3652f130 fffff801d55dd81e nt!PnpSurpriseRemoveLockedDeviceNode+0xdb 
13 ffffd10c3652f1a0 fffff801d55dd615 nt!PnpDeleteLockedDeviceNode+0x7e         
14 ffffd10c3652f1e0 fffff801d54fa90d nt!PnpDeleteLockedDeviceNodes+0x109       
15 ffffd10c3652f270 fffff801d5506a1b nt!PnpProcessQueryRemoveAndEject+0x4c1    
16 ffffd10c3652f350 fffff801d5612057 nt!PnpProcessTargetDeviceEvent+0xe7       
17 ffffd10c3652f380 fffff801d4f09e32 nt!PnpDeviceEventWorker+0x1e7             
18 ffffd10c3652f400 fffff801d505904a nt!ExpWorkerThread+0x1b2                  
19 ffffd10c3652f5b0 fffff801d52741c4 nt!PspSystemThreadStartup+0x5a            
1a ffffd10c3652f600 0000000000000000 nt!KiStartSystemThread+0x34  

cc: @annie-li

VioScsiUnitControl does not interate through the whole list on
surprise removal, it leads to a hang on LUN removal if there are
some SRBs present in srb_list.

This fix adds interator through the elements of the list.

Fixes: 35331da

Signed-off-by: Sergey Bykov <[email protected]>
@YanVugenfirer
Copy link
Collaborator

Can one of the admins verify this patch?

@YanVugenfirer
Copy link
Collaborator

ok to test

@sb-ntnx sb-ntnx changed the title vioscsi: fix infinite loop on surprise removal if srb_list vioscsi: fix infinite loop on surprise removal if srb_list is not empty Apr 24, 2025
@YanVugenfirer
Copy link
Collaborator

@kostyanf14 - why does “disk verification" still fail?

@sb-ntnx
Copy link
Contributor Author

sb-ntnx commented Apr 28, 2025

I have HLK'ed the change for 2025 and 2022 and have not seen any unexpected issues.

@kostyanf14
Copy link
Member

@kostyanf14 - why does “disk verification" still fail?

Server 2022 is not updated HCK-CI. I rerun only this test

@kostyanf14
Copy link
Member

@kostyanf14 - why does “disk verification" still fail?

Pass

@sb-ntnx
Copy link
Contributor Author

sb-ntnx commented Apr 30, 2025

@vrozenfe, are we good to merge it?

Copy link
Collaborator

@vrozenfe vrozenfe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
Thank you a lot Sergey.

@YanVugenfirer YanVugenfirer merged commit 9ad6c4c into virtio-win:master May 6, 2025
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants