
Expiry per dataset #678

Open: wants to merge 1 commit into master
Conversation

Contributor
@tbekas tbekas commented Jan 17, 2024

No description provided.

@tbekas tbekas changed the title Expiry per dataset WIP WIP Expiry per dataset Jan 17, 2024
@tbekas tbekas force-pushed the expiry-per-dataset branch 3 times, most recently from 5e5928a to 45ebcfe Compare January 17, 2024 18:03
@tbekas tbekas force-pushed the safe-block-deletion branch 2 times, most recently from 4ac3464 to 31f431b Compare January 18, 2024 15:28
@tbekas tbekas force-pushed the safe-block-deletion branch 4 times, most recently from fd45659 to d7053f7 Compare January 30, 2024 12:23
@tbekas tbekas force-pushed the expiry-per-dataset branch 2 times, most recently from cbc430e to d7b246a Compare January 31, 2024 16:53
@tbekas tbekas changed the title WIP Expiry per dataset Expiry per dataset Jan 31, 2024
@tbekas tbekas marked this pull request as ready for review January 31, 2024 16:55
@tbekas tbekas force-pushed the expiry-per-dataset branch 2 times, most recently from 44e1718 to d2666ac Compare February 7, 2024 15:51
README.md Outdated
-t, --default-ttl Default dataset expiry in seconds [=$DefaultDefaultExpiry].
--maint-interval Maintenance interval in seconds - determines frequency of maintenance cycle:
how often datasets are checked for expiration and cleanup. Value 0 disables the
maintenance [=$DefaultMaintenanceInterval].
Contributor
Doesn't seem like it's resolving the default here.

Contributor Author
You're right. I will change it to the literals 86400 (24 hours) for the TTL and 300 (5 minutes) for the maintenance interval.

Contributor
I think this used to work fine (in fact, previous versions of codex do resolve this fine); however, I'm seeing the same in #700. Looks like Duration isn't properly stringified.

README.md Outdated
often blocks are checked for expiration and cleanup
[=$DefaultBlockMaintenanceInterval].
--block-mn Number of blocks to check every maintenance cycle [=1000].
-t, --default-ttl Default dataset expiry in seconds [=86400].
Contributor
As mentioned, this should actually work with the constants, which is preferable, and it appears it broke some time ago, so a better solution would be to leave the constants in for now and figure out why they aren't resolving. We don't have to hold this PR because of it, however.

codex/node.nim Outdated
@@ -233,6 +224,13 @@ proc retrieve*(

# Retrieve all blocks of the dataset sequentially from the local store or network
trace "Creating store stream for manifest", cid

Contributor
Why do we need this here, is it to prevent retrieving expired datasets?

Contributor Author
No, whenever we store a new dataset we need to explicitly call trackExpiry so that all blocks within that dataset will get maintained.

Contributor
@dryajov dryajov left a comment
Let's get this merged with safe-block-deletion and I'll give it a more thorough review. There are lots of changes across main and these two branches, which makes reviewing this separately a bit hard.

@tbekas tbekas force-pushed the safe-block-deletion branch 3 times, most recently from fa36cd4 to a951bbf Compare February 12, 2024 23:17
@tbekas tbekas changed the base branch from safe-block-deletion to master June 5, 2024 17:41
Contributor
@benbierens benbierens left a comment
Docker image of this branch seems to be passing basic dist-tests.

codex/codex.nim Outdated
@@ -246,6 +246,9 @@ proc new*(
wallet = WalletRef.new(EthPrivateKey.random())
network = BlockExcNetwork.new(switch)

metaDs = SQLiteDatastore.new(config.dataDir / CodexMetaNamespace)
Contributor
metaDs seems to be defined again on line 262, but then as a LevelDbDs.

Contributor Author
Thanks for catching it 👌 We should be using a single metaDb


CodexProof.decode(bytes)

func `%`*(proof: CodexProof): JsonNode = % byteutils.toHex(proof.encode())
Contributor
I don't see how the changes in this file are connected to the expiry per dataset. A quick explain will do! :D

Contributor Author
This change includes switching the type of quota usage (used, reserved and available bytes) from a plain uint to NBytes. Then, to avoid converting back to uint everywhere, I also changed to NBytes wherever it was suitable, which includes the data model used for the REST endpoint, the RestRepoStore object type. And since it needs to be serialized properly on the endpoint, we need such an encoder.

if err =? (await self.recordCheckpoint(treeCid, datasetMd)).errorOption:
return failure(err)

return success()
Contributor
On finishing a successful delete of a dataset, should we delete that dataset's entry in the dataset-metadata datastore? Is this already handled somewhere? or is there a reason not to?

Contributor Author
Yeah, we do it when recording a checkpoint: we check whether progress has reached 100% and, if so, we remove the dataset metadata. Line 198.

return success()
else:
datasetMd.checkpoint.progress = index
return await self.deleteBatch(treeCid, datasetMd)
Contributor
Should this call itself, does this not risk huge callstacks for large datasets? Might we not instead 'simply' wait for the next cycle of superviseDatasetDeletion to delete more blocks?

Contributor Author
No, recursion is trampolined in Future. Basically you have an infinite™️ callstack with {.async.}.
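To illustrate the point, here is a standalone sketch (not code from this PR; it assumes chronos is available): each recursive call in an {.async.} proc suspends at an await and unwinds the native stack before continuing, so deep recursion does not overflow it.

```nim
import pkg/chronos

# Hypothetical example: an async proc that "recurses" a large number of
# times. The native stack unwinds at every suspension point, so the
# recursion is effectively trampolined through the event loop.
proc countDown(n: int): Future[void] {.async.} =
  if n == 0:
    return
  await sleepAsync(0.milliseconds)  # yield to the event loop first
  await countDown(n - 1)            # resumed from the loop, not a deep native call

waitFor countDown(100_000)
```

The same applies to deleteBatch above: it awaits datastore operations before recursing, so the callstack stays shallow regardless of dataset size.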

quotaMaxBytes: quotaMaxBytes,
blockTtl: blockTtl
)
export store, types, coders
Contributor
Splitting stuff up?! I like it. 👍

@tbekas tbekas force-pushed the expiry-per-dataset branch 6 times, most recently from 8b6d782 to 61617b8 Compare June 19, 2024 15:06
@tbekas tbekas force-pushed the expiry-per-dataset branch 2 times, most recently from deec57d to 4d39a5c Compare June 21, 2024 07:37
Comment on lines 320 to 321
except Exception as exc:
error "Unexpected error during maintenance", msg = exc.msg
Contributor
Exception should not be caught, because Defect, a derived type of Exception, is not catchable. Instead, use what was there previously:

except CancelledError as error:
  raise error
except CatchableError as exc:

Contributor
Also, there is a combination of exceptions being caught and errored Results being handled, which indicates there are some exceptions leaking in the underlying context, when they should be returned as an errored Result.

Ideally we should mark all of the routines that return a Result in the underlying context with {.raises:[].}. When the chronos v4 changes go in, we can also mark the async procs with {.async: (raises:[]).}

self.offset = 0
if (datasetMd.expiry < self.clock.now) and
(datasetMd.checkpoint.timestamp + self.retryDelay.seconds < self.clock.now):
asyncSpawn self.superviseDatasetDeletion(treeCid, datasetMd)
Contributor
Possibly we should track these futures using TrackedFutures so that we can
successfully cancel them on stop:

DatasetMaintainer* = object
  trackedFutures: TrackedFutures

proc new*(
  T: type DatasetMaintainer,
  # ...
): DatasetMaintainer =

  DatasetMaintainer(
    # ...
    trackedFutures: TrackedFutures.new(),
    # ...
  )

# Usage:
proc checkDatasets(self: DatasetMaintainer): Future[?!void] {.async.} =
  # ...
  discard self.superviseDatasetDeletion(treeCid, datasetMd).track(self)
  # ...

proc stop*(self: DatasetMaintainer): Future[void] {.async.} =
  await self.trackedFutures.cancelTracked()

Contributor Author
Good idea, adding it 👍

Comment on lines 167 to 189
await modify[DatasetMetadata](self.metaDs, key,
proc (maybeCurrDatasetMd: ?DatasetMetadata): Future[?DatasetMetadata] {.async.} =
if currDatasetMd =? maybeCurrDatasetMd:
let datasetMd = DatasetMetadata(
expiry: max(currDatasetMd.expiry, minExpiry),
leavesCount: currDatasetMd.leavesCount,
manifestsCids: currDatasetMd.manifestsCids,
checkpoint: currDatasetMd.checkpoint
)
return datasetMd.some
else:
raise newException(CatchableError, "DatasetMetadata for treeCid " & $treeCid & " not found")
)
Contributor
This is a bit more readable (for me, at least)

Suggested change:
proc modifyData(maybeCurrDatasetMd: ?DatasetMetadata): Future[?DatasetMetadata] {.async.} =
without currDatasetMd =? maybeCurrDatasetMd:
raise newException(CatchableError, "DatasetMetadata for treeCid " & $treeCid & " not found")
let datasetMd = DatasetMetadata(
expiry: max(currDatasetMd.expiry, minExpiry),
leavesCount: currDatasetMd.leavesCount,
manifestsCids: currDatasetMd.manifestsCids,
checkpoint: currDatasetMd.checkpoint
)
return datasetMd.some
await modify[DatasetMetadata](self.metaDs, key, modifyData)

I still don't think we should be raising exceptions here, because the underlying
implementations (defaultModifyImpl and defaultModifyGetImpl) simply try/except these and turn them into Results.

Contributor Author
@tbekas tbekas Jun 24, 2024
If we want a modify operation to be stopped, raising an exception is the only way to do that, and in this case we do want to stop it. The rest of the flow goes as expected: the exception gets turned into a Result and everything eventually gets logged at error level.

As for the first part of the comment, I can extract that anonymous proc into a named proc if you want, but I don't see how it automatically becomes more readable this way.

Member
If we want a modify operation to be stopped, raising an exception is the only way to do that. In this case we want to stop it.

You say this because of the contract for modifyGet, right? Cause there is currently no provisioning there for a modify operation to be aborted?

I suppose returning maybeCurrDatasetMd would be equivalent to a NOP, but sort of inefficient as it would still trigger a write to the underlying store?

Contributor
If we want a modify operation to be stopped, raising an exception is the only
way to do that

With Dmitriy's suggested change, returning a Result will be the way to stop an
operation, which is exactly what I had in mind.

I can extract that anonymous proc into a named proc if you want, however I
don't see how it automatically becomes more readable this way.

Understood. As an outside reader, I thought perhaps you might want to know what
is considered subjectively "more readable" for that reader.

Contributor Author
@tbekas tbekas Jul 31, 2024
@gmega

You say this because of the contract for modifyGet, right? Cause there is currently no provisioning there for a modify operation to be aborted?

It's not mentioned in the docs, but in the signature we return a Future, which implies it can be a failure.

I suppose returning maybeCurrDatasetMd would be equivalent to a NOP, but sort of inefficient as it would still trigger a write to the underlying store?

Yep, returning the original argument is a NOP, so the state in the datastore ends up essentially the same as when raising an error. The difference is that when an error is raised from the closure, it is propagated to the caller of modifyGet as failure(err). That should probably be documented.

Contributor Author
@emizzle

With Dmitriy's suggested change, returning a Result will be the way to stop an
operation, which is exactly what I had in mind.

Future already communicates the error; with Future[Result[T]] it becomes ambiguous where the error will be.

Also, I'm not sure why we are even talking about it. I'm not changing anything there.

Comment on lines 241 to 247
if err =? (await self.recordCheckpoint(treeCid, datasetMd)).errorOption:
return failure(err)
Contributor
Why not simply delete the checkpoint here instead of including the delete logic
in the modify callback?

Contributor Author
@tbekas tbekas Jun 24, 2024
Most of the time we're updating the checkpoint here. Deletion happens conditionally, only as the last step of the process (recording 100% progress is equivalent to deleting the checkpoint along with the dataset metadata). So, answering the question: we're not deleting here because that would yield incorrect maintenance results (we would stop deleting blocks after the first batch and leave all the other blocks as garbage that would possibly never be deleted).

Contributor
answering the question we're not deleting because that would yield incorrect maintenance results (we would stop deleting blocks after the first batch and leave all the other blocks as garbage that will possibly never be deleted)

I'm not following this, can you elaborate?

Member
Datasets are not deleted at once, but in increments of size batchSize. Because the dataset may actually be a lot larger than the batch size, we are forced to store a deletion "cursor" (the checkpoint) so that the maintainer picks up from where it left off during the next maintenance cycle. For instance, a dataset with 10,000 blocks will require 10 maintenance cycles to be deleted (assuming the default batch size of 1,000 blocks), and the checkpoint will only be deleted after the last cycle.

I think this is perhaps more complicated than it needs to be. We should talk about whether or not maintaining fixed batch sizes really makes sense, because I think being able to kill a dataset at once would simplify things. I'm also advocating limiting concurrency by locking datasets that are undergoing garbage collection, so that any operation on the dataset gets forcefully reordered with respect to the ongoing deletion.
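The cycle described above can be sketched roughly as follows (a simplified illustration, not the PR's code; deleteBlocks, persistCheckpoint and removeDatasetMetadata are hypothetical stand-ins for the real procs):

```nim
# Simplified sketch of checkpointed deletion: each pass deletes up to
# batchSize blocks and persists a cursor, so a restart resumes from the
# last checkpoint instead of restarting from block zero.
proc deleteExpired(treeCid: Cid, md: var DatasetMetadata, batchSize: int) =
  var index = md.checkpoint.progress
  while index < md.leavesCount:
    let upper = min(index + batchSize, md.leavesCount)
    deleteBlocks(treeCid, index ..< upper)  # hypothetical helper
    index = upper
    md.checkpoint.progress = index
    persistCheckpoint(treeCid, md)          # hypothetical helper
  removeDatasetMetadata(treeCid)            # 100% progress: drop the entry
```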

Member
OK, looks like I haven't fully understood why we need checkpoints: the code tries to delete the entire dataset, updating the checkpoint at every batch. There is no actual interruption unless it comes from outside, so I'm not sure why this is needed.

Contributor Author
@tbekas tbekas Jul 31, 2024
Checkpoints are an optimization for storing a cursor in case of an interruption such as a node shutdown. We can then resume from where we left off (which may be useful for very large datasets). And yes, we try to delete all dataset blocks one after another.

Comment on lines 189 to 211
await self.metaDs.modify(key,
proc (maybeCurrDatasetMd: ?DatasetMetadata): Future[?DatasetMetadata] {.async.} =
if currDatasetMd =? maybeCurrDatasetMd:
if currDatasetMd.expiry != datasetMd.expiry or currDatasetMd.manifestsCids != datasetMd.manifestsCids:
raise newException(CatchableError, "Change in expiry detected, interrupting maintenance for dataset with treeCid " & $treeCid)

if currDatasetMd.checkpoint.progress > datasetMd.checkpoint.progress:
raise newException(CatchableError, "Progress should be increasing only, treeCid " & $treeCid)

if currDatasetMd.leavesCount <= datasetMd.checkpoint.progress:
DatasetMetadata.none
else:
datasetMd.some
else:
raise newException(CatchableError, "Metadata for dataset with treeCid " & $treeCid & " not found")
Contributor
This is more readable imo:

Suggested change:
proc modifyData(maybeCurrDatasetMd: ?DatasetMetadata): Future[?DatasetMetadata] {.async.} =
without currDatasetMd =? maybeCurrDatasetMd:
raise newException(CatchableError, "Metadata for dataset with treeCid " & $treeCid & " not found")
if currDatasetMd.expiry != datasetMd.expiry or currDatasetMd.manifestsCids != datasetMd.manifestsCids:
raise newException(CatchableError, "Change in expiry detected, interrupting maintenance for dataset with treeCid " & $treeCid)
if currDatasetMd.checkpoint.progress > datasetMd.checkpoint.progress:
raise newException(CatchableError, "Progress should be increasing only, treeCid " & $treeCid)
if currDatasetMd.leavesCount <= datasetMd.checkpoint.progress:
DatasetMetadata.none
else:
datasetMd.some
await self.metaDs.modify(key, modifyData)

However, I also have a few comments:

  1. Returning DatasetMetadata.none seems like an odd way to indicate that the
    metadata should be deleted. Maybe it might be better to handle the delete
    logic later on when cleaning up?
  2. Why are we raising an exception for metadata not found here? It seems like it
    would be better placed for nim-datastore to handle that, and this
    predicate/callback would not be called because a failure would have been
    returned further up the call stack.
  3. This callback is try/excepted up the callstack and turned into a Result
    which eventually becomes the return value of modify. If we were to change
    the signature of this callback to return a Result instead of raising
    exceptions, then we know that returning failure(err) in the callback will
    become the returned value of modify and hence would be a lot easier to swallow
    as a reader.
  4. Since these Results are ultimately bubbled up to superviseDatasetDeletion,
    it's a good idea to type them properly so that they can be inspected and
    different failures can have different outcomes. For example, the "change in
    expiry" that is being checked sounds like it would be a nasty bug of unknown
    origin, so you may want to add a metric to it so you can monitor the
    occurrences better.

Contributor Author
@tbekas tbekas Jun 24, 2024
  1. Keep in mind that it has to be done in a concurrency-safe way, otherwise anomalies will occur in the datastore, like deleting still-used data or not deleting expired data. modify is the only way to perform concurrency-safe updates to the records, and it requires returning none if deletion is the intended result of the operation.
  2. An exception is raised because it would be an error situation if we're trying to record a checkpoint when there's no DatasetMetadata related to the given treeCid. If you would like to see the API for modify changed, please create an issue or maybe a PR in nim-datastore explaining in detail how such an API would look.
  3. This PR uses the existing API in nim-datastore. If you would like this API to be changed, please raise such an issue with a detailed explanation in the appropriate repo.
  4. No, a change in expiry would not mean a nasty bug. It would mean that there was an update to the dataset metadata, which could result from, for example, re-uploading a dataset just after it expired, when maintenance has already started but hasn't finished yet.

Member
Yeah... I'm seriously wondering if we shouldn't limit concurrency so we can make this easier to reason about. But let's talk about it.

Contributor
I believe @dryajov's comment addresses points 1-3. Regarding point 4, I think you missed the point: the comment was meant to provide reasoning about why typing exceptions is important. But with the change that Dmitriy suggested, a Result would be returned and exceptions would not need to be raised here.

Contributor Author
Once the API proposed by @dryajov is available in nim-datastore, I will use it, even though I think it's pretty much the same as the current one with additional unnecessary complications.

@tbekas tbekas force-pushed the expiry-per-dataset branch 2 times, most recently from 23e8122 to 0ab4b1c Compare June 24, 2024 10:18
Contributor Author
tbekas commented Jun 24, 2024

@emizzle please add a comment here when you finish reviewing this PR.

Contributor
@dryajov dryajov left a comment
Overall, I think this looks sound, but I can see that several aspects of the code (e.g. error handling and overall style) have caused some controversy.

I think there are several reasons for this:

  1. the inherent style of the read-modify-write pattern, like the complex return state and the use of closures, which can be hard to read and reason about
  2. the way we're doing error handling, and overall handling return values

Seeing this in practice, I do see the issues with the design of modifyGet. For example, relying on an Option to signal which operation to execute (update, delete or keep) limits error handling to Exceptions only. This isn't inherently wrong per se, but it makes the code harder to reason about and less consistent.

One way to address this would be to change modifyGet to be a bit more user friendly. For example by introducing a special return type structure that captures the semantics of the operations more closely, instead of relying on Option, which then opens the possibility of using Result to communicate error states more consistently.

Here is some pseudocode to better illustrate the idea and mull over the different possibilities.

type
  Operations = enum
    Keep,
    Update,
    Delete

  OpResult[T] = object
    case op: Operations
    of Keep: discard
    of Update:
      val: T
    of Delete: discard

  Function*[T, U] = proc(value: T): U {.raises: [CatchableError], gcsafe, closure.}
  ModifyGet* = Function[?seq[byte], ?!OpResult[seq[byte]]]

method modifyGet*(self: Datastore, key: Key, fn: ModifyGet): Future[?!seq[byte]] {.base, locks: "unknown".} =
  let maybeCurrentData = ... # get current data val
  without op =? fn(maybeCurrentData), err:
    return failure(...)

  case op.op:
  of Keep: ... # keep val
  of Update: doUpdate(key, op.val) # update entry
  of Delete: doDelete(key)         # delete entry

  success(...)

This makes it easier to reason about the implementation of the update fn, and allows for more consistent error handling style.

Given that this is a limitation of the underlying datastore, and not so much this code, I think we can move this PR forward, provided we address some of the other comments left by the other reviewers and myself. However, I would strongly suggest that we do think about improving modifyGet further and make the required changes asap.
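For comparison, a hypothetical call site under the proposed API might look like this (a sketch only, using the OpResult type from the pseudocode above; isExpired is a stand-in predicate):

```nim
# Hypothetical usage of the proposed modifyGet: the callback signals its
# intent (Keep / Update / Delete) explicitly via OpResult, and signals
# errors via Result instead of raising exceptions.
let res = await metaDs.modifyGet(key) do (curr: ?seq[byte]) -> ?!OpResult[seq[byte]]:
  without bytes =? curr:
    return failure("entry not found")          # error path, no exception
  if isExpired(bytes):                         # stand-in predicate
    return success(OpResult[seq[byte]](op: Delete))
  success(OpResult[seq[byte]](op: Keep))       # NOP without a redundant write
```

Note how the Keep case also avoids the redundant write that returning the original value through the current Option-based API would trigger.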

without queryKey =? createDatasetMetadataQueryKey(), err:
return failure(err)

without queryIter =? await query[DatasetMetadata](self.metaDs, Query.init(queryKey)), err:
Contributor
How big is this query going to be? We should be careful and perhaps use pagination if it gets too large.

Contributor Author
It depends on how many datasets we're storing. I would say that number will probably never exceed 1k in practice. But if you want, I can add pagination.


self.timer.start(onTimer, self.interval)
if self.interval.seconds > 0:
self.timer.start(onTimer, self.interval)
Contributor
I would avoid using the timer altogether and just use a regular async loop here?

Contributor Author
Do you mean something like this?

while self.notStopped:
  doStuff()
  await sleepAsync(self.delay)

Contributor Author

For now I left the timer, since I didn't get confirmation that what I proposed above is what you meant.

Contributor
dryajov commented Jul 29, 2024

One more comment: I think moving the closures to their own named functions does improve readability somewhat, but it is really marginal, so I'm either way on this and consider it more a matter of style than anything more substantive.

Contributor
@emizzle emizzle left a comment
I agree with the suggested change from @dryajov above, and believe this would clear up quite a lot of the readability and reasoning difficulties I experienced as an outside reader.

I want to reiterate my review motivation throughout this PR, so there is no ambiguity. When I write code, I'm very much motivated by these two things:

  1. It is readable
  2. It is easy to reason about

The main thing is that as a writer, consideration of these points means you are always trying to see what you've written from a reader's perspective. Why is this important? Because if code is not readable and is not easy to reason about, it becomes, to a degree, technical debt.

I understand that these two goals are highly subjective, but what I find valuable as a writer is when a reader provides feedback about those points. I consider myself an "outside reader" of this code since I'm less involved in the client on a daily basis, and all of the suggestions I've made are simply my subjective opinion on how to potentially improve those two things. It doesn't mean I think I'm "right" about any of it, because if there's anything I've learned, being right about everything is only good for digging ditches 😂

A positive takeaway from this PR is that we all are very passionate about our quality of work and Codex itself and that is something we should all be happy about ❤️

gmega (Member) left a comment


My main concerns are with all the concurrency issues I think we may still face in the absence of dataset-level locking, as well as a number of other places where we are open to more complexity because the datastore itself does not support transactional updates.

Resolved review threads: README.md (outdated), codex/codex.nim, codex/conf.nim (two threads)
DefaultNumberOfBlocksToMaintainPerInterval* = 1000
DefaultDefaultExpiry* = 24.hours
DefaultMaintenanceInterval* = 5.minutes
DefaultBatchSize* = 1000
A Member commented:

Hm... as I am reading through this, I am confused by the meaning of batch size in this context. It used to be the number of blocks I'm willing to go through in every garbage-collection cycle, but now that expiration is per dataset I can't understand what this means anymore (am I willing to look into 1000 datasets per cycle? That looks like too much. Is it still the total number of blocks? Then it's weird, because it means a dataset can be partially purged if it has more blocks than that). I'll eventually figure it out by reading the code, but this is no longer self-explanatory.

tbekas (Contributor, author) commented Jul 31, 2024

BatchSize is the number of blocks deleted before recording a checkpoint. If you want, I can either add docs here, or remove this param altogether along with the checkpoints and replace them using bisect.

tbekas (Contributor, author) commented:

Renaming to CheckpointLength and adding docs.
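To make the renamed parameter's semantics concrete, here is a minimal Python sketch (names and data structures are illustrative, not the actual Codex/Nim code) of deleting a dataset in fixed-size batches, recording a checkpoint after each batch so an interrupted run can resume where it left off:

```python
CHECKPOINT_LENGTH = 1000  # blocks deleted between checkpoint writes

def delete_dataset(block_cids, delete_block, checkpoints, dataset_id):
    """Delete a dataset in batches, persisting a progress cursor after each batch."""
    start = checkpoints.get(dataset_id, 0)  # resume where a previous run left off
    for i in range(start, len(block_cids), CHECKPOINT_LENGTH):
        for cid in block_cids[i:i + CHECKPOINT_LENGTH]:
            delete_block(cid)
        checkpoints[dataset_id] = i + CHECKPOINT_LENGTH  # record the checkpoint
    checkpoints.pop(dataset_id, None)  # fully purged: drop the cursor

# Example: 2500 blocks are purged in 3 batches of at most 1000.
deleted = []
delete_dataset(list(range(2500)), deleted.append, {}, "tree-cid")
```

If the process dies mid-deletion, the cursor left in `checkpoints` means the next maintenance cycle only redoes at most one batch of work.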

Comment on lines 241 to 247:
if err =? (await self.recordCheckpoint(treeCid, datasetMd)).errorOption:
  return failure(err)
A Member commented:

Datasets are not deleted at once, but in increments of size batchSize. Because the dataset may actually be a lot larger than the batch size, we are forced to store a deletion "cursor" (the checkpoint) so that the maintainer picks up from where it left off during the next maintenance cycle. For instance, a dataset with 10,000 blocks will require 10 maintenance cycles to be deleted (assuming the default batch size of 1,000 blocks), and the checkpoint will only be deleted after the last cycle.

I think this is perhaps more complicated than it needs to be. We should talk about whether or not maintaining fixed batch sizes really makes sense, because I think being able to kill a dataset at once would simplify things. I'm also advocating limiting concurrency by locking datasets that are undergoing garbage collection, so that any operation on the dataset gets forcefully reordered with respect to the ongoing deletion.
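The locking idea can be sketched in Python asyncio (purely illustrative; the client would use chronos in Nim): a map from dataset id to a lock that both the maintainer and regular block operations acquire, so work on the same dataset is serialized while different datasets stay concurrent:

```python
import asyncio
from collections import defaultdict

class DatasetLocks:
    """One lock per dataset id: GC and user operations on the same dataset
    are serialized (reordered), while different datasets stay concurrent."""
    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    def lock(self, dataset_id):
        return self._locks[dataset_id]

locks = DatasetLocks()

async def garbage_collect(dataset_id, events):
    async with locks.lock(dataset_id):
        events.append("gc:" + dataset_id)

async def fetch_block(dataset_id, events):
    async with locks.lock(dataset_id):
        events.append("fetch:" + dataset_id)

async def main():
    ev = []
    # Both operations target the same dataset, so they cannot interleave.
    await asyncio.gather(garbage_collect("ds1", ev), fetch_block("ds1", ev))
    return ev

events = asyncio.run(main())
```

A real implementation would also need to evict unused locks from the map to avoid unbounded growth.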

Resolved review thread: codex/stores/maintenance.nim (outdated)
Comment on lines 241 to 247:
if err =? (await self.recordCheckpoint(treeCid, datasetMd)).errorOption:
  return failure(err)
A Member commented:

OK, it looks like I haven't fully understood why we need checkpoints: the code tries to delete the entire dataset, updating the checkpoint at every batch. There is no actual interruption unless it comes from outside, so I'm not sure why this is needed.

if numberReceived < self.numberOfBlocksPerInterval:
  self.offset = 0
if (datasetMd.expiry < self.clock.now) and
    (datasetMd.checkpoint.timestamp + self.retryDelay.seconds < self.clock.now):
A Member commented:

Yeah OK I really need to understand this better:

  1. when do we expect deletes to fail;
  2. why is retrying a strategy (assumes a transient condition).

tbekas (Contributor, author) replied:

  1. There can be multiple reasons, e.g. failing to open the db file because too many file descriptors are open.
  2. Because point 1 can happen, or the node can shut down midway through a dataset deletion.
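The condition in the snippet above boils down to a small predicate. A Python sketch (the retry-delay value here is hypothetical; all times are Unix seconds):

```python
RETRY_DELAY = 15 * 60  # seconds between retry attempts; hypothetical default

def eligible_for_deletion(expiry, last_checkpoint, now, retry_delay=RETRY_DELAY):
    """A dataset is picked up by the maintainer when it has expired AND the
    previous (possibly failed or interrupted) attempt is older than the
    retry delay, so failed deletes are retried without busy-looping."""
    return expiry < now and last_checkpoint + retry_delay < now
```

The retry delay is what turns transient failures (point 1) and mid-deletion shutdowns (point 2) into a bounded amount of repeated work per cycle.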

  res = StoreResult(kind: Stored, used: blk.data.len.NBytes)
  if err =? (await self.repoDs.put(blkKey, blk.data)).errorOption:
    raise err

  (md.some, res)
)

- proc tryDeleteBlock*(self: RepoStore, cid: Cid, expiryLimit = SecondsSince1970.low): Future[?!DeleteResult] {.async.} =
+ proc tryDeleteBlock*(self: RepoStore, cid: Cid): Future[?!DeleteResult] {.async.} =
  if cid.isEmpty:
A Member commented:

The fact that metadata and block state are not updated transactionally makes me a bit anxious. I can see for instance that there is room for metadata to be left behind after a block deletion.
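The hazard can be made concrete with a sketch (Python, hypothetical structures; this shows the general pattern, not what the PR actually does): a two-step delete that crashes between removing the block and removing its metadata leaves an orphan, which an idempotent repair sweep can later finish off:

```python
def delete_block(store, metadata, cid, crash_between=False):
    """Non-transactional two-step delete: block data first, metadata second.
    A crash between the steps leaves metadata without its block."""
    store.pop(cid, None)
    if crash_between:
        return  # simulate the process dying between the two writes
    metadata.pop(cid, None)

def sweep_orphans(store, metadata):
    """Idempotent repair pass: drop metadata whose block is already gone."""
    for cid in [c for c in metadata if c not in store]:
        metadata.pop(cid)

# A crash strands the metadata entry for "cid1"...
store, metadata = {"cid1": b"data"}, {"cid1": {"size": 4}}
delete_block(store, metadata, "cid1", crash_between=True)
# ...until a later sweep cleans it up.
sweep_orphans(store, metadata)
```

Ordering matters: deleting the data first and the metadata last means a crash can only strand metadata (which the sweep can detect), never unreferenced block data with no record of it.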

tbekas (Contributor, author) replied:

Not sure how to address this.

@emizzle emizzle dismissed their stale review August 1, 2024 00:26

The remainder of the points I raised can be handled with a separate PR in nim-datastore

benbierens (Contributor) left a comment

sha-0ab4b1c branch passes basic dist-tests.

@tbekas tbekas force-pushed the expiry-per-dataset branch 2 times, most recently from 8e1699b to 950c729 Compare August 7, 2024 14:29
@gmega gmega added the Client See https://miro.com/app/board/uXjVNZ03E-c=/ for details label Sep 18, 2024
@gmega gmega assigned benbierens and unassigned tbekas Sep 18, 2024
Labels: Client (see https://miro.com/app/board/uXjVNZ03E-c=/ for details)
6 participants