LedgerDB V2: handles aren't closed in the common case #1515

amesgen · 2025-05-19T17:27:23Z

The V2 LedgerDB (introduced as part of UTxO HD in #1412) maintains a LedgerSeq at its core:

ouroboros-consensus/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/LedgerSeq.hs

Lines 140 to 142 in d582af5

    
           newtype LedgerSeq m l = LedgerSeq { 
        
               getLedgerSeq :: AnchoredSeq (WithOrigin SlotNo) (StateRef m l) (StateRef m l) 
        
             } deriving (Generic)

ouroboros-consensus/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/LedgerSeq.hs

Lines 119 to 122 in d582af5

    
           data StateRef m l = StateRef { 
        
               state  :: !(l EmptyMK) 
        
             , tables :: !(LedgerTablesHandle m l) 
        
             } deriving (Generic)

Now, when we prune a LedgerSeq, we want to close the LedgerTablesHandles from the prefix that we are removing. The current code tries to do this, it it is not effective. Namely, prune returns the LedgerSeq prefix containing states to prune:

ouroboros-consensus/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/LedgerSeq.hs

Lines 230 to 238 in d582af5

    
           prune (LedgerDbPruneKeeping (SecurityParam k)) (LedgerSeq ldb) = 
        
               if toEnum nvol <= unNonZero k 
        
               then (LedgerSeq $ Empty (AS.anchor ldb), LedgerSeq ldb) 
        
               else 
        
                 -- We remove the new anchor from the @fst@ component so that its handle is 
        
                 -- not closed. 
        
                 B.bimap (LedgerSeq . dropNewest 1) LedgerSeq $ AS.splitAt (nvol - fromEnum (unNonZero k)) ldb 
        
             where 
        
               nvol = AS.length ldb

And then in other places, we try to close the handles:

ouroboros-consensus/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/Forker.hs

Lines 154 to 155 in d582af5

    
           let (l, s) = prune (LedgerDbPruneKeeping (foeSecurityParam env)) (LedgerSeq newdb) 
        
           pure ((toClose, l), s)

ouroboros-consensus/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/Forker.hs

Line 163 in d582af5

    
           mapM_ (close . tables) $ AS.toOldestFirst discardedBySelection ++ AS.toOldestFirst discardedByPruning

ouroboros-consensus/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/LedgerSeq.hs

Lines 186 to 187 in 2c06471

    
           closeLedgerSeq :: Monad m => LedgerSeq m l -> m () 
        
           closeLedgerSeq = mapM_ (close . tables) . toOldestFirst . getLedgerSeq

The crucial observation is that toOldestFirst does not return the anchor of the LedgerSeq/AnchoredSeq, so we do not close this. In particular, in the very common case of adopting one block (such that the immutable tip also advances by one block), which is the common case both while syncing and caught-up, we do not close anything.

The fix is simply to also consider the anchor of the sequence of handles to close.

Currently, this isn't problematic as the only V2 backend is the in-memory backend at the moment, and for it, closing handles doesn't matter (as the GC will still claim the ledger state once it is no longer referenced). However, in the upcoming LSM tree backend, this will be crucial.

Concrete "demonstration" of the bug:

Apply this diff on d582af5 (main at time of writing):

Show diff

diff --git a/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs b/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs
index 770a85862d..c4e2ac3115 100644
--- a/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs
+++ b/ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V2/InMemory.hs
@@ -90,7 +90,8 @@ newInMemoryLedgerTablesHandle ::
 newInMemoryLedgerTablesHandle someFS@(SomeHasFS hasFS) l = do
   !tv <- newTVarIO (LedgerTablesHandleOpen l)
   pure LedgerTablesHandle {
-      close =
+      close = do
+        !_ <- error "foo"
         atomically $ writeTVar tv LedgerTablesHandleClosed
     , duplicate = do
         hs <- readTVarIO tv
diff --git a/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs b/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs
index 5015d44106..33f842b615 100644
--- a/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs
+++ b/ouroboros-consensus/test/storage-test/Test/Ouroboros/Storage/LedgerDB/StateMachine.hs
@@ -270,14 +270,14 @@ instance StateModel Model where
   arbitraryAction _ model@(Model chain secParam) =
     frequency $ [ (2, pure $ Some GetState)
                 , (2, pure $ Some ForceTakeSnapshot)
-                , (1, Some . DropAndRestore <$> QC.choose (0, fromIntegral $ AS.length chain))
+                , (0, Some . DropAndRestore <$> QC.choose (0, fromIntegral $ AS.length chain))
                 , (4, Some <$> do
                       let maxRollback =
                             min
                               (fromIntegral . AS.length $ chain)
                               (BT.unNonZero $ maxRollbacks secParam)
-                      numRollback  <- QC.choose (0, maxRollback)
-                      numNewBlocks <- QC.choose (numRollback, numRollback + 2)
+                      numRollback  <- pure 0
+                      numNewBlocks <- pure 1
                       let
                         chain' = case modelRollback numRollback model of
                           UnInit     -> error "Impossible"

Then, running cabal run storage-test -- -p InMemV2 succeeds, even though we would expect it to fail if we would actually close the handles.

Concretely, this patch

modifies the close logic of the LedgerTablesHandle to always fail with an imprecise exception, and
modifies the LedgerDB to only generate ValidateAndCommit commands that add exactly one block, and disables DropAndRestore.

Then, if V2 were correctly closing pruned LedgerTablesHandles, then we would expect the test to fail; however, it passes just fine. Note that it does fail when keeping the test unmodified, showing that we at least sometimes close LedgerTablesHandles.

The text was updated successfully, but these errors were encountered:

Closes #1515 The first three commits add a regression test by enrichting the existing LedgerDB state machine test, making sure that ``` total number of all opened handles - total number of closed handles = states on the LedgerSeq in the end ``` The fourth commit then fixes the bug, causing the regression test to succeed. Note that this bug fix reveals another bug #1551 (causing the `mock-test` failure for `LeaderSchedule.simple convergence`), which we silence by temporarily making the closing of an in-memory handle a no-op (as it is not *necessary* anyways, the GC will still collect the in-memory ledger state).

amesgen self-assigned this May 19, 2025

amesgen added this to Consensus Team Backlog May 19, 2025

amesgen moved this to 👀 In review in Consensus Team Backlog May 19, 2025

amesgen removed the status in Consensus Team Backlog May 19, 2025

amesgen moved this to 👀 In review in Consensus Team Backlog May 19, 2025

amesgen mentioned this issue May 19, 2025

LedgerDB.V2: make sure to actually close handles #1516

Merged

jasagredo mentioned this issue May 29, 2025

Consensus release for node 10.6 #1541

Closed

amesgen closed this as completed in #1516 Jun 5, 2025

github-project-automation bot moved this from 👀 In review to ✅ Done in Consensus Team Backlog Jun 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LedgerDB V2: handles aren't closed in the common case #1515

LedgerDB V2: handles aren't closed in the common case #1515

amesgen commented May 19, 2025 •

edited

Loading

LedgerDB V2: handles aren't closed in the common case #1515

LedgerDB V2: handles aren't closed in the common case #1515

Comments

amesgen commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

amesgen commented May 19, 2025 •

edited

Loading