-
Notifications
You must be signed in to change notification settings - Fork 170
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add punch hole GC #326
base: master
Are you sure you want to change the base?
Add punch hole GC #326
Conversation
src/blob_format.h
Outdated
// The effective size of current file. This is different from `file_size_`, as | ||
// `file_size_` is the original size of the file, and does not consider space | ||
// reclaimed by punch hole GC. | ||
// We can't use file system's `st_blocks` to get the logical size, because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's okay to get the size as effective_file_size
after restart. The size doesn't have to be so precise. Indeed, it may have false positive for triggering punch hole GC, but it would be updated to the accurate number after the gc scan.
Then, we can get rid of updating manifest for effective_file_size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to have a PR description to help future readers. The description could provide some background and highlight key aspects of implementation such as the introduction of the PunchHoleGCJob
class and how the job is scheduled (when its snapshot becomes the oldest).
891fb24
to
1f0ecf1
Compare
// record by adjusting iterate_offset_, otherwise (not a hole-punch record), | ||
// we should break the loop and return the record, iterate_offset_ is | ||
// already adjusted inside GetBlobRecord() in this case. | ||
if (live || !status().ok()) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just trying to learn, in what cases will status()
be not ok? Do we need to set valid_
to false in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, IO errors, In this case, we should not continue the loop.
As for whether setting valid_
to false, I don't think it is necessary, since valid()
(the function not the variable) considers status_
already. So I didn't change the behavior (current code does not set valid_
either when encountering io errors).
@@ -818,6 +818,24 @@ void TitanDBImpl::ReleaseSnapshot(const Snapshot* snapshot) { | |||
// We can record here whether the oldest snapshot is released. | |||
// If not, we can just skip the next round of purging obsolete files. | |||
db_->ReleaseSnapshot(snapshot); | |||
{ | |||
MutexLock l(&mutex_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not wise to acuqire a mutex here, since this is a hot path, almost all read requrests go through this (Titan creates a ManagedSnapshot
implicitly). I will refactor this with atomic operations, but in a separate PR
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
Signed-off-by: v01dstar <[email protected]>
This PR together with #323 implements Titan's new GC solution: Punch hole.
Titan has implemented 2 types of GC methods:
The first approach introduces write amplification, since it needs to update references inside RocksDB to reflect blobs' new location, and due to this reason, the threshold has to be relatively high to minimize this impact.
The second approach has no extra write amplification, however, it may result in space waste. Imagine large portions of some blob files have been moved, but not yet meet the threshold to be entirely replaced.
Punch hole is a file system API, accepts a file descriptor, a start position and a length, removes the data from the file at the desired location. To application, this part of the file is set to 0. To file system, if some the underlying storage blocks are fully covered by the removed part, file system can reclaim those space.
This project try to utilize the punch hole API, in-place delete blobs during GC, to avoid updating references inside RocksDB