-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Description
Deadlock Scenario
I've seen a deadlock arise a few times on one of my nodes, primarily on testnet3, presumably triggered more often due to the existence of block storms.
Here's a sample goroutine dump:
goroutine profile: total 21172
11514 @ 0x43beee 0x407625 0x407277 0x8a69f3 0x46e961
# 0x8a69f2 main.(*server).peerDoneHandler+0x52 /root/gocode/src/github.com/btcsuite/btcd/server.go:2211
9563 @ 0x43beee 0x407625 0x407277 0x8b6f75 0x8b6f5c 0x82f9a2 0x82ffd2 0x83013f 0x8308e9 0x46e961
# 0x8b6f74 main.(*server).AddPeer+0x34 /root/gocode/src/github.com/btcsuite/btcd/server.go:2330
# 0x8b6f5b main.(*serverPeer).OnVerAck+0x1b /root/gocode/src/github.com/btcsuite/btcd/server.go:533
# 0x82f9a1 github.com/btcsuite/btcd/peer.(*Peer).processRemoteVerAckMsg+0xa1 /root/gocode/src/github.com/btcsuite/btcd/peer/peer.go:2068
# 0x82ffd1 github.com/btcsuite/btcd/peer.(*Peer).waitToFinishNegotiation+0x1b1 /root/gocode/src/github.com/btcsuite/btcd/peer/peer.go:2195
# 0x83013e github.com/btcsuite/btcd/peer.(*Peer).negotiateInboundProtocol+0x11e /root/gocode/src/github.com/btcsuite/btcd/peer/peer.go:2241
# 0x8308e8 github.com/btcsuite/btcd/peer.(*Peer).start.func1+0x28 /root/gocode/src/github.com/btcsuite/btcd/peer/peer.go:2289
1 @ 0x43beee 0x407625 0x407277 0x8a29b7 0x8a2977 0x8a2a47 0x8562cf 0x77f0cd 0x76eea5 0x76ecc6 0x771630 0x76bb7c 0x77f59e 0x77f993 0x8526ae 0x855c85 0x46e961
# 0x8a29b6 main.(*server).RelayInventory+0xb6 /root/gocode/src/github.com/btcsuite/btcd/server.go:2341
# 0x8a2976 main.(*server).relayTransactions+0x76 /root/gocode/src/github.com/btcsuite/btcd/server.go:1497
# 0x8a2a46 main.(*server).AnnounceNewTransactions+0x26 /root/gocode/src/github.com/btcsuite/btcd/server.go:1508
# 0x8562ce github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockchainNotification+0x32e /root/gocode/src/github.com/btcsuite/btcd/netsync/manager.go:1483
# 0x77f0cc github.com/btcsuite/btcd/blockchain.(*BlockChain).sendNotification+0xcc /root/gocode/src/github.com/btcsuite/btcd/blockchain/notifications.go:78
# 0x76eea4 github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBlock.func2+0x84 /root/gocode/src/github.com/btcsuite/btcd/blockchain/chain.go:708
# 0x76ecc5 github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBlock+0x3c5 /root/gocode/src/github.com/btcsuite/btcd/blockchain/chain.go:709
# 0x77162f github.com/btcsuite/btcd/blockchain.(*BlockChain).connectBestChain+0x28f /root/gocode/src/github.com/btcsuite/btcd/blockchain/chain.go:1219
# 0x76bb7b github.com/btcsuite/btcd/blockchain.(*BlockChain).maybeAcceptBlock+0x19b /root/gocode/src/github.com/btcsuite/btcd/blockchain/accept.go:79
# 0x77f59d github.com/btcsuite/btcd/blockchain.(*BlockChain).processOrphans+0x27d /root/gocode/src/github.com/btcsuite/btcd/blockchain/process.go:118
# 0x77f992 github.com/btcsuite/btcd/blockchain.(*BlockChain).ProcessBlock+0x332 /root/gocode/src/github.com/btcsuite/btcd/blockchain/process.go:236
# 0x8526ad github.com/btcsuite/btcd/netsync.(*SyncManager).handleBlockMsg+0x2ad /root/gocode/src/github.com/btcsuite/btcd/netsync/manager.go:745
# 0x855c84 github.com/btcsuite/btcd/netsync.(*SyncManager).blockHandler+0x424 /root/gocode/src/github.com/btcsuite/btcd/netsync/manager.go:1365
What we see here is that we had a combination of many peers connecting and disconnecting. When a peer connects, we go to notify the server to update the various book keeping information it stores:
Lines 1783 to 1784 in 67b8efd
// Signal the sync manager this peer is a new sync candidate. | |
s.syncManager.NewPeer(sp.Peer) |
The server then goes to update the SyncManager
, to see if we need to pick this peer as a sync node or not, while also setting up some book keeping information:
Lines 455 to 478 in 67b8efd
// handleNewPeerMsg deals with new peers that have signalled they may | |
// be considered as a sync peer (they have already successfully negotiated). It | |
// also starts syncing if needed. It is invoked from the syncHandler goroutine. | |
func (sm *SyncManager) handleNewPeerMsg(peer *peerpkg.Peer) { | |
// Ignore if in the process of shutting down. | |
if atomic.LoadInt32(&sm.shutdown) != 0 { | |
return | |
} | |
log.Infof("New valid peer %s (%s)", peer, peer.UserAgent()) | |
// Initialize the peer state. | |
isSyncCandidate := sm.isSyncCandidate(peer) | |
sm.peerStates[peer] = &peerSyncState{ | |
syncCandidate: isSyncCandidate, | |
requestedTxns: make(map[chainhash.Hash]struct{}), | |
requestedBlocks: make(map[chainhash.Hash]struct{}), | |
} | |
// Start syncing by choosing the best candidate if needed. | |
if isSyncCandidate && sm.syncPeer == nil { | |
sm.startSync() | |
} | |
} |
While one of these requests is pending, it's possible that the SyncManager
is notified of a new block. When receiving a new block, the sync manager will ask the chain to process it:
Lines 743 to 746 in 67b8efd
// Process the block to include validation, best chain selection, orphan | |
// handling, etc. | |
_, isOrphan, err := sm.chain.ProcessBlock(bmsg.block, behaviorFlags) | |
if err != nil { |
Indirectly during processing, a new event is emitted to notify the server that there's a new block:
Line 1699 in 67b8efd
sm.chain.Subscribe(sm.handleBlockchainNotification) |
Herein lies our deadlock: peer -> server -> sync manager -> server -> peer:
- The peer is waiting on the server to add a new peer.
- The server to add a new peer.
- The sync manager is waiting on the server to notify a new block.
This halts all new incoming peer handling, and also some RPC calls that want to cal into the SyncManager
to query state.
Resolution Paths
One easy path comes to mind: can we just make the call from server to sync manager sync? So:
go s.syncManager.NewPeer(sp.Peer)
The state in the sync manager will be cleaned up once the peerDoneHandler
exits:
Line 2215 in 67b8efd
s.syncManager.DonePeer(sp.Peer) |