IPFS GC & Lotus Splitstore #129

Open
Stebalien opened this issue Jun 29, 2021 · 4 comments

Comments

@Stebalien
Member

See #120 & #8

NOTE: I'm discussing long-term solutions here, not short-term. Unfortunately, we'll likely have to go with a special-purpose solution due to time constraints, for now.

The solution-space for GC in the lotus splitstore is very similar to IPFS pinning/GC:

  1. We need a way to open some form of "transaction" where anything touched in the transaction isn't garbage collected until we've had a chance to "pin" it.
  2. We need a way to unpin old tipsets.
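
A minimal sketch of these two requirements, assuming a toy in-memory store. Every name here (`BlockStore`, `transaction`, `put`, `pin`, `unpin`, `gc`) is hypothetical and does not correspond to any go-ipfs or lotus API, and pins are flat (no DAG traversal) to keep the example short:

```python
import itertools
from contextlib import contextmanager


class BlockStore:
    """Toy in-memory store; every name here is hypothetical."""

    def __init__(self):
        self.blocks = {}               # cid -> data
        self.pinned = set()            # cids that GC must keep
        self._open = {}                # transaction id -> cids it touched
        self._ids = itertools.count()

    @contextmanager
    def transaction(self):
        txn_id = next(self._ids)
        touched = self._open[txn_id] = set()
        try:
            yield touched
        finally:
            # Once the transaction closes, only pins protect its blocks.
            del self._open[txn_id]

    def put(self, cid, data, touched):
        self.blocks[cid] = data
        touched.add(cid)

    def pin(self, cid):
        self.pinned.add(cid)

    def unpin(self, cid):
        self.pinned.discard(cid)

    def gc(self):
        # Protected = pinned blocks plus anything touched by an open transaction.
        protected = set(self.pinned).union(*self._open.values())
        dead = sorted(cid for cid in self.blocks if cid not in protected)
        for cid in dead:
            del self.blocks[cid]
        return dead
```

A block written inside `with store.transaction() as txn:` survives `gc()` until the transaction closes. After that it survives only if it was pinned, and it becomes collectable again once `unpin` is called.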

Differences include:

  1. In the splitstore, we want to move unreferenced blocks instead of just deleting them.
  2. Lotus needs the ability to pin IPLD selectors (i.e., pin a tipset without pinning parents/sectors) while this is only a "nice to have" in go-ipfs.
  3. The latest go-ipfs GC/pinning proposal would have written a mapping for every pin/block pair, which would be way too slow for pinning new tipsets. Prior IPFS GC proposals avoided this issue by assuming that the children of pinned blocks were already pinned (recursive pins). But that assumption doesn't hold with arbitrary selectors.

Difference 3 is the hardest one to reconcile, but not impossible, and go-ipfs would benefit significantly from such an optimization.

To do this generically, we'd need to track how many pinned parent blocks (or direct pins) reference a given block and its children via a specific selector. Prior IPFS GC proposals left off the "via a specific selector" part.
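
One way to picture this, as a rough sketch only: keep a reference count per (block, selector) pair and descend into children only when a count goes from 0 to 1 (or back to 0 on unpin). The `SelectorPins` class is hypothetical, and the "selector" here is a deliberately simplified stand-in (a frozenset of link names to follow at every level), not a real IPLD selector:

```python
from collections import Counter


class SelectorPins:
    """Toy per-(block, selector) reference counts; all names are hypothetical."""

    def __init__(self, dag):
        self.dag = dag            # cid -> {link name: child cid}
        self.refs = Counter()     # (cid, selector) -> number of referrers

    def _children(self, cid, selector):
        # Simplified selector: follow only the links whose name it contains.
        return [c for name, c in self.dag.get(cid, {}).items() if name in selector]

    def pin(self, cid, selector):
        key = (cid, selector)
        self.refs[key] += 1
        if self.refs[key] == 1:
            # First reference via this selector: descend exactly once.
            for child in self._children(cid, selector):
                self.pin(child, selector)

    def unpin(self, cid, selector):
        key = (cid, selector)
        if self.refs[key] == 0:
            raise KeyError(key)
        self.refs[key] -= 1
        if self.refs[key] == 0:
            # Last reference dropped: release the children as well.
            del self.refs[key]
            for child in self._children(cid, selector):
                self.unpin(child, selector)

    def is_live(self, cid):
        return any(c == cid for c, _ in self.refs)
```

With a selector such as `frozenset({"messages"})`, pinning a tipset keeps its messages alive without touching its parents. Pinning a block that is already referenced via the same selector is a single counter increment, which is the property that would keep pinning new tipsets cheap. The recursion is fine for a toy DAG but would need to be made iterative for deep chains.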

@Stebalien
Member Author

(not sure where else to put this)

@Stebalien
Member Author

cc @vyzo, @raulk, & @gammazero.

I don't see this being the short-term solution given the complexity in "difference 3" (unless someone can think of a simple solution to that), but we should keep this in mind when designing the splitstore. Ideally the go-ipfs and lotus GC solutions would eventually converge.

@aschmahmann
Contributor

aschmahmann commented Jun 29, 2021

Some brief thoughts from conversations I've had recently:

> Lotus needs the ability to pin IPLD selectors (i.e., pin a tipset without pinning parents/sectors) while this is only a "nice to have" in go-ipfs.

@warpfork recently was talking to me about a request from Ceramic for this type of behavior in go-ipfs.

> Prior IPFS GC proposals avoided this issue by assuming that the children of pinned blocks were already pinned (recursive pins). But that assumption doesn't hold with arbitrary selectors.

The current mark + sweep GC would handle this just fine, although it'd still suffer from our existing problems. It might be worth noting that Peergos has been using their own version of mark + sweep that IIUC leverages transactions (note: both go-ds-leveldb and badger support transactions however they're currently unused within go-ipfs) to allow more parallelism (i.e. by tracking which blocks are in open transactions we know what can be avoided during GC) and side-step the global lock.

This means GC can still take a while if you have a lot of blocks, you still have to read + process all your pins every so often, and you still can't immediately remove a block, but you avoid the biggest pain in the process, which is the inability to operate during GC.
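
As a rough illustration of the idea (not Peergos's or go-ipfs's actual code), a mark + sweep pass that spares blocks touched by open transactions could look like the sketch below. The function names and the `in_flight` set are assumptions, and the example is single-threaded, so it ignores the hard part of transactions opening between the mark and sweep phases:

```python
def mark(pins, links):
    """Return every cid reachable from the pinned roots."""
    marked, stack = set(), list(pins)
    while stack:
        cid = stack.pop()
        if cid in marked:
            continue
        marked.add(cid)
        stack.extend(links.get(cid, ()))
    return marked


def sweep(blocks, marked, in_flight):
    """Delete blocks that are neither marked nor touched by an open transaction."""
    removed = sorted(
        cid for cid in blocks if cid not in marked and cid not in in_flight
    )
    for cid in removed:
        del blocks[cid]
    return removed
```

Because anything in `in_flight` is skipped, writers would not need to wait on a global lock while the sweep runs. The cost is still a full walk over all pins and all blocks.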

@Stebalien
Member Author

> The current mark + sweep GC would handle this just fine, although it'd still suffer from our existing problems. It might be worth noting that Peergos has been using their own version of mark + sweep that IIUC leverages transactions (note: both go-ds-leveldb and badger support transactions however they're currently unused within go-ipfs) to allow more parallelism (i.e. by tracking which blocks are in open transactions we know what can be avoided during GC) and side-step the global lock.

This is basically how the splitstore currently works.
