Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add punch hole GC #326

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Add punch hole GC #326

wants to merge 9 commits into from

Conversation

v01dstar
Copy link
Contributor

@v01dstar v01dstar commented Aug 10, 2024

This PR together with #323 implements Titan's new GC solution: Punch hole.

Titan has implemented 2 types of GC methods:

  • GC based on stats (when discardable ratio of blob files reach certain threshold)
  • GC during RocksDB's compactions (a.k.a level merge)

The first approach introduces write amplification, since it needs to update references inside RocksDB to reflect blobs' new location, and due to this reason, the threshold has to be relatively high to minimize this impact.
The second approach has no extra write amplification, however, it may result in space waste. Imagine large portions of some blob files have been moved, but not yet meet the threshold to be entirely replaced.

Punch hole is a file system API, accepts a file descriptor, a start position and a length, removes the data from the file at the desired location. To application, this part of the file is set to 0. To file system, if some the underlying storage blocks are fully covered by the removed part, file system can reclaim those space.

This project try to utilize the punch hole API, in-place delete blobs during GC, to avoid updating references inside RocksDB

src/blob_format.h Outdated Show resolved Hide resolved
include/titan/options.h Outdated Show resolved Hide resolved
// The effective size of current file. This is different from `file_size_`, as
// `file_size_` is the original size of the file, and does not consider space
// reclaimed by punch hole GC.
// We can't use file system's `st_blocks` to get the logical size, because
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay to get the size as effective_file_size after restart. The size doesn't have to be so precise. Indeed, it may have false positive for triggering punch hole GC, but it would be updated to the accurate number after the gc scan.
Then, we can get rid of updating manifest for effective_file_size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

src/blob_file_builder.cc Outdated Show resolved Hide resolved
src/blob_file_size_collector.cc Outdated Show resolved Hide resolved
src/punch_hole_gc_job.cc Outdated Show resolved Hide resolved
src/blob_gc_picker.cc Show resolved Hide resolved
src/blob_gc_picker.cc Outdated Show resolved Hide resolved
src/blob_storage.cc Show resolved Hide resolved
Copy link
Member

@hbisheng hbisheng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to have a PR description to help future readers. The description could provide some background and highlight key aspects of implementation such as the introduction of the PunchHoleGCJob class and how the job is scheduled (when its snapshot becomes the oldest).

src/db_impl.cc Outdated Show resolved Hide resolved
@v01dstar v01dstar force-pushed the punch-hole branch 2 times, most recently from 891fb24 to 1f0ecf1 Compare August 28, 2024 22:04
// record by adjusting iterate_offset_, otherwise (not a hole-punch record),
// we should break the loop and return the record, iterate_offset_ is
// already adjusted inside GetBlobRecord() in this case.
if (live || !status().ok()) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just trying to learn, in what cases will status() be not ok? Do we need to set valid_ to false in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, IO errors, In this case, we should not continue the loop.

As for whether setting valid_ to false, I don't think it is necessary, since valid() (the function not the variable) considers status_ already. So I didn't change the behavior (current code does not set valid_ either when encountering io errors).

@@ -818,6 +818,24 @@ void TitanDBImpl::ReleaseSnapshot(const Snapshot* snapshot) {
// We can record here whether the oldest snapshot is released.
// If not, we can just skip the next round of purging obsolete files.
db_->ReleaseSnapshot(snapshot);
{
MutexLock l(&mutex_);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not wise to acuqire a mutex here, since this is a hot path, almost all read requrests go through this (Titan creates a ManagedSnapshot implicitly). I will refactor this with atomic operations, but in a separate PR

Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants