Skip to content

Rework forker usage pattern to avoid Registries#1910

Open
jasagredo wants to merge 4 commits intomainfrom
js/forkers-again
Open

Rework forker usage pattern to avoid Registries#1910
jasagredo wants to merge 4 commits intomainfrom
js/forkers-again

Conversation

@jasagredo
Copy link
Contributor

@jasagredo jasagredo commented Mar 4, 2026

Description

Forkers have been a source of headaches for a very long time already, and all due to the limitation of the Resource registry that enforces registries to only be used from known threads. This has lead to a situation where we had hierarchies of registries just to satisfy that rule and even we had to extend the resource-registry API to allow transferring between registries which is unsafe.

The general pattern followed now is:

  1. At the ledgerdb level we only keep track of the resources, running WithTempRegistry while opening the DB and then closing them when closing the LedgerDB.
  2. Forkers that open new handles are open in a with* combinator style, so the handles are deallocated when exiting the scope (or committed).
  3. Forkers that only duplicate existing handles (read-only forkers) are only used in Mempool and LocalStateQuery and there we put them in a registry, in particular in the top level registry.
  4. The handles are not tracked directly---each handle is tracked indirectly because some forker or the LedgerDB owns it and is tracked. Forkers are tracked so that they can be released promptly which avoids space-leaks and permits the backends (LSM/LMDB) to do their gradual bookkeeping/rearranging work.
  5. In particular, when the node is shutting down, there is no need to close each handle/forker:
    1. in InMemory it doesn't matter as it is pure
    2. in LSM, closing the session will close any open handles, and closing the session happen when shutting down the node,
    3. in LMDB, each forker has one single read-transaction open and it will be closed as described above. There are no further resources allocated on each step. Also closing the backing store is optional as described in the LMDB docs (also mentioned in LedgerDB API).

Sadly, the change is so fundamental that there is no meaningful way of splitting the changes into reviewable commits that compile on their own.

Proposed review order (recommended to disable whitespace-diff in Github diff view and hide already viewed files):

  1. Ouroboros.Consensus.Storage.ChainDB.API
  2. Ouroboros.Consensus.MiniProtocol.LocalStateQuery.Server + Ouroboros.Consensus.Network.NodeToClient
  3. Ouroboros.Consensus.Mempool.{Impl.Common, Update, Init}
  4. Ouroboros.Consensus.Storage.ChainDB.ChainSel
  5. Ouroboros.Consensus.Storage.ChainDB.Impl.Types + Ouroboros.Consensus.Storage.ChainDB.Impl.Query
  6. Ouroboros.Consensus.Storage.ChainDB.Impl
  7. Ouroboros.Consensus.Storage.ImmutableDB.{Impl, Impl.Validation}
  8. Ouroboros.Consensus.Storage.VolatileDB.Impl
  9. Ouroboros.Consensus.Storage.LedgerDB.API (both the API changes and the initialization)
  10. Ouroboros.Consensus.Storage.LedgerDB
  11. Ouroboros.Consensus.Storage.LedgerDB.Forker
  12. Ouroboros.Consensus.Node (just comments changes)
  13. Ouroboros.Consensus.NodeKernel
  14. Ouroboros.Consensus.Storage.LedgerDB.V2.LedgerSeq
  15. Ouroboros.Consensus.Storage.LedgerDB.V2.Forker
  16. Ouroboros.Consensus.Storage.LedgerDB.V2.Backend
  17. Ouroboros.Consensus.Storage.LedgerDB.V2.InMemory
  18. Ouroboros.Consensus.Storage.LedgerDB.V2.LSM
  19. Ouroboros.Consensus.Storage.LedgerDB.V2
  20. Ouroboros.Consensus.Storage.LedgerDB.V1.Forker
  21. Ouroboros.Consensus.Storage.LedgerDB.V1.Snapshots
  22. Ouroboros.Consensus.Storage.LedgerDB.V1

The rest of the changes are just adapting tests or adapting the cardano-tools to use the new access pattern. I did not do any meaningful change in the tests by my will, I only followed GHC errors where they took me.

Complete VolatileDbArgs m blk ->
(forall st. WithTempRegistry st m (VolatileDB m blk, st) -> ans) ->
ans
openDB VolatileDbArgs{volHasFS = SomeHasFS hasFS, ..} cont = cont $ do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: this change removes the need of runInnerWithTempRegistry.

, HasCallStack
) =>
Complete ImmutableDbArgs m blk ->
(forall st. WithTempRegistry st m (ImmutableDB m blk, st) -> ans) ->
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: this change removes the need of runInnerWithTempRegistry.

Comment on lines -476 to -489
-- Currently, we have two distinct approaches to resource management
-- and database closure:
--
-- - In the LedgerDB, closing the database does not close any resources
-- created by its clients. We rely on the resource registry to deallocate
-- these resources before the LedgerDB is closed. However, after closing
-- the LedgerDB, the only permitted action on these resources is to free them.
-- See 'ldbForkers'.
--
-- - In the ChainDB, closing the database also closes all followers and
-- iterators.
--
-- TODO: Ideally, the ChainDB and LedgerDB should follow a consistent
-- approach to resource deallocation.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This was not even the case before. But now it is irrelevant

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: only change is that creating the resources is WithTempRegistry. The rest is just renaming and re-shaping the values to uniformize both init functions to return a StateRef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: it only modifies the streaming backend functions

@jasagredo jasagredo force-pushed the js/forkers-again branch 2 times, most recently from 4ec580f to e1181b3 Compare March 4, 2026 13:04
Copy link
Contributor

@nfrisby nfrisby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Review is the result of my 3 hour pair review with Javier.

cse <- chainSelEnv
chainSelection cse rr (first Diff.extend <$> candidates)
fmap (getSuffix . fst)
<$> chainSelection cse (first Diff.extend <$> candidates) (\_ _ -> join . atomically . forkerCommit)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChainSel is now accepting a success continuation (ie better chain continuation) because it has a bracket in it for the Forker now.

Maybe it's name should include with* or brackedted* etc.

atomically $ forkerCommit (getLedger newChain)
forkerClose (getLedger newChain)
pure $ getSuffix $ getChainDiff newChain
fromMaybe curChain <$> chainSelection' curChain chains'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primary changes:

  • ChainSel now never explicitly closes a Forker; every ChainSel forker is opened via bracket.
  • Therefore, the ChainSel success continuation is now passed as an argument.

-- ^ Get a 'HeaderStateHistory' populated with the 'HeaderState's of the
-- last @k@ blocks of the current chain.
, getReadOnlyForkerAtPoint ::
, allocInRegistryReadOnlyForkerAtPoint ::
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ChainSel is a thread we've been focused on in the past regarding this Forker tracking stuff, I'll explicitly note that ChainSel doesn't use either of these.

(ChainSel relies on validateForker, which directly uses the LedgerDB API to open a read-write Forker.)

lift $ traceWith tracer NoValidLastLocation
mkOpenState hasFS index chunk Origin MustBeNew
traceWith tracer NoValidLastLocation
-- It is OK to disable the temp registry at this point because this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notes from our call:

  • At the very least, "disable" is the wrong word. You mean "execute" maybe?
  • But there's also the broader topic we discussed of: if mkOpenState fails, then the node is shutting down, so we don't need to bother using a registry to enforce hClose is called. But that's probably for a follow-up PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A follow-up point: the claim above would not be true if any resources in the temporary registry had a clean-up function that was more interesting than hClose. But we're pretty sure they're all just hClose.

volTracer
volMaxBlocksPerFile
stVar <- lift $ RAWLock.new (DbOpen ost)
-- It is OK to disable the temp registry at this point because this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same probably applies as my similar GitHub Comment on ImmutableDB.openDB

-- Therefore, the 'BackingStore' is not allocated in any resource or tracked
-- in any way as a resource.
--
-- For the value handles, all the usages of those are bracketed or tracked in
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the call, Javier just explained a key distinction very nicely. It should probably be in this comment.

The LedgerDB retains one LSM handle per block on the volatile suffix. That's the whole idea behind "immutable" databases.

That's not true for LMDB, since in the V1 interface the on-disk state of the database only ever corresponds to whichever immutable block's changelog was flushed most recently.

And a note to self about LMDB/V1:

An LMDB V1 forker (even a read-write Forker!) only retains an open read-only LMDB transaction (which it opens lazily, the first time it handles a query that depends on the UTxO, b/c there's a limit on how many LMDB txs can be open simultaneously). LSQ is the only thing that would hold such a Forker for a long time; Mempool and ChainSel both have one too, but they are updated frequently and/or ephemeral.

ResolveBlock m blk ->
GetVolatileSuffix m blk ->
m (LedgerDB' m blk, Word64)
WithTempRegistry st m (LedgerDB' m blk, Word64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note from call:

It's crucial that this is scoped within the same runWithTempRegistry call that includes the allocation of the ChainDB itself into the top-level resource registry. That's why the chain of open* functions are in WithTempRegistry even though there's just the one part of them that actually puts stuff in that registry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants