Add punch hole GC #326
base: master
Signed-off-by: v01dstar <[email protected]>
src/blob_format.h
// The effective size of current file. This is different from `file_size_`, as
// `file_size_` is the original size of the file, and does not consider space
// reclaimed by punch hole GC.
// We can't use file system's `st_blocks` to get the logical size, because
I think it's okay to get the size as `effective_file_size` after restart. The size doesn't have to be that precise. It may produce false positives that trigger punch hole GC, but it would be updated to the accurate number after the GC scan.
Then we can get rid of updating the manifest for `effective_file_size`.
Done
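For readers following this thread, here is a minimal sketch of how the two numbers under discussion can be read from the file system. It assumes a Linux-style `stat()` where `st_blocks` counts 512-byte units. `FileSizes` and `ReadFileSizes` are hypothetical names for illustration and are not part of Titan. Whether the allocated size is a good enough starting value for `effective_file_size` after a restart is exactly the judgment call made above.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/stat.h>

// Hypothetical helper type, not from Titan.
struct FileSizes {
  uint64_t logical_bytes;    // st_size: file length, punched holes included
  uint64_t allocated_bytes;  // st_blocks * 512: space still backed by blocks
};

// Reads both sizes for `path`. Returns false if stat() fails.
bool ReadFileSizes(const char* path, FileSizes* out) {
  struct stat st;
  if (::stat(path, &st) != 0) return false;
  out->logical_bytes = static_cast<uint64_t>(st.st_size);
  // Assumption: st_blocks is reported in 512-byte units, as on Linux.
  out->allocated_bytes = static_cast<uint64_t>(st.st_blocks) * 512;
  return true;
}
```

The allocated size is rounded to whole file system blocks, so it can only approximate the bytes that live records occupy.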
It would be great to have a PR description to help future readers. The description could provide some background and highlight key aspects of the implementation, such as the introduction of the `PunchHoleGCJob` class and how the job is scheduled (when its snapshot becomes the oldest).
// record by adjusting iterate_offset_, otherwise (not a hole-punch record),
// we should break the loop and return the record, iterate_offset_ is
// already adjusted inside GetBlobRecord() in this case.
if (live || !status().ok()) return;
Just trying to learn: in what cases will `status()` not be ok? Do we need to set `valid_` to false in that case?
For example, IO errors. In this case, we should not continue the loop.
As for whether to set `valid_` to false, I don't think it is necessary, since `valid()` (the function, not the variable) already considers `status_`. So I didn't change the behavior (the current code does not set `valid_` either when encountering IO errors).
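The behavior described in this reply can be illustrated with a toy iterator. This is a sketch only: `Status`, `Record`, and `ToyBlobIterator` are simplified, hypothetical stand-ins and do not reproduce Titan's actual classes. It shows a loop that skips hole-punched records, stops on the first error, and leaves `valid_` untouched on error because `Valid()` already checks `status_`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a status object (hypothetical, for illustration only).
struct Status {
  bool ok_flag = true;
  std::string message;
  Status() = default;
  Status(bool ok, std::string msg) : ok_flag(ok), message(std::move(msg)) {}
  bool ok() const { return ok_flag; }
};

struct Record {
  std::string value;
  bool live;      // false: the record was hole-punched
  bool io_error;  // true: simulate a failed read at this record
};

class ToyBlobIterator {
 public:
  explicit ToyBlobIterator(std::vector<Record> records)
      : records_(std::move(records)) {}

  // Validity accounts for status_, so an error makes the iterator invalid
  // even though valid_ itself is not reset.
  bool Valid() const { return valid_ && status_.ok(); }
  const Status& status() const { return status_; }
  const std::string& value() const { return current_; }

  void Next() {
    while (offset_ < records_.size()) {
      const Record& r = records_[offset_++];
      if (r.io_error) {
        status_ = Status(false, "simulated IO error");
        return;  // stop the loop on error; valid_ is left as it was
      }
      if (r.live) {
        current_ = r.value;
        valid_ = true;
        return;  // found a live record
      }
      // Hole-punched record: skip it and keep scanning.
    }
    valid_ = false;  // reached the end of the file
  }

 private:
  std::vector<Record> records_;
  std::size_t offset_ = 0;
  std::string current_;
  bool valid_ = false;
  Status status_;
};
```

Callers that only check `Valid()` stop iterating on an error, and can then inspect `status()` to tell an error apart from the end of the file.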
@@ -818,6 +818,24 @@ void TitanDBImpl::ReleaseSnapshot(const Snapshot* snapshot) {
  // We can record here whether the oldest snapshot is released.
  // If not, we can just skip the next round of purging obsolete files.
  db_->ReleaseSnapshot(snapshot);
  {
    MutexLock l(&mutex_);
It is not wise to acquire a mutex here, since this is a hot path: almost all read requests go through it (Titan creates a `ManagedSnapshot` implicitly). I will refactor this with atomic operations, but in a separate PR.
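One possible shape for such a refactor is an atomic fast-path check in front of the mutex, so that snapshot releases with no pending job never take the lock. The sketch below is an assumption about the approach, not the follow-up PR's actual design. `PendingPunchHoleJobs` and its methods are hypothetical names, and it assumes a job becomes runnable once the oldest live snapshot sequence reaches the job's own snapshot sequence.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical bookkeeping for jobs waiting on snapshots; not Titan code.
class PendingPunchHoleJobs {
 public:
  // Slow path: register a job that waits until `snapshot_seq` is the oldest.
  void Add(uint64_t snapshot_seq) {
    std::lock_guard<std::mutex> l(mu_);
    waiting_.insert(snapshot_seq);
    has_pending_.store(true, std::memory_order_release);
  }

  // Hot path: called after every snapshot release with the sequence of the
  // oldest snapshot still alive. Returns how many jobs became runnable.
  std::size_t OnSnapshotReleased(uint64_t oldest_live_seq) {
    // Common case: nothing is waiting, so return without locking.
    if (!has_pending_.load(std::memory_order_acquire)) return 0;

    std::lock_guard<std::mutex> l(mu_);
    std::size_t ready = 0;
    while (!waiting_.empty() && *waiting_.begin() <= oldest_live_seq) {
      waiting_.erase(waiting_.begin());
      ++ready;
    }
    if (waiting_.empty()) {
      has_pending_.store(false, std::memory_order_release);
    }
    return ready;
  }

 private:
  std::atomic<bool> has_pending_{false};
  std::mutex mu_;
  std::multiset<uint64_t> waiting_;  // snapshot sequences of waiting jobs
};
```

A job added concurrently with a release may miss that release and only be picked up on the next one. That delay seems acceptable for a background GC job, but it is a trade-off of this particular sketch.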
This PR together with #323 implements Titan's new GC solution: Punch hole.
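As background on the mechanism itself, the sketch below shows how a byte range can be released back to the file system on Linux with `fallocate(2)`, while the file's logical length stays unchanged. It is not taken from this PR. `PunchHole` and `LogicalSize` are hypothetical helper names, and the call can fail on file systems that do not support hole punching.

```cpp
#include <cassert>
#include <cstdint>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#if defined(__linux__)
#include <linux/falloc.h>
#endif

// Deallocates [offset, offset + length) in the file behind `fd` without
// changing its length. Returns false if the platform or file system does
// not support it. Hypothetical helper, not Titan's implementation.
bool PunchHole(int fd, uint64_t offset, uint64_t length) {
#if defined(__linux__)
  return ::fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                     static_cast<off_t>(offset),
                     static_cast<off_t>(length)) == 0;
#else
  (void)fd;
  (void)offset;
  (void)length;
  return false;
#endif
}

// Returns st_size for `fd`, or 0 if fstat() fails.
uint64_t LogicalSize(int fd) {
  struct stat st;
  if (::fstat(fd, &st) != 0) return 0;
  return static_cast<uint64_t>(st.st_size);
}
```

Because of `FALLOC_FL_KEEP_SIZE`, the offsets of records after the hole do not move; only the allocated block count shrinks.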