-
-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement chunk comparison and selective extraction for borg extract (#5638) #8632
base: master
Are you sure you want to change the base?
Conversation
@ThomasWaldmann Since this is a significant change, I wanted to open this PR as a draft to get your feedback before proceeding further. Could you please review the current implementation and provide any suggestions or improvements? |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #8632 +/- ##
==========================================
- Coverage 81.83% 81.80% -0.04%
==========================================
Files 74 74
Lines 13319 13393 +74
Branches 1963 1981 +18
==========================================
+ Hits 10900 10956 +56
- Misses 1755 1767 +12
- Partials 664 670 +6 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some tests, so that codecov does not spam the whole diff view?
Done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some first feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some feedback
- Add compare_and_extract_chunks functionality - Add comprehensive test coverage - Fix file state tracking with st parameter
src/borg/archive.py
Outdated
os.unlink(path) | ||
st = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ThomasWaldmann I didn't realize that this would break the continue_extraction
functionality. The issue is compare_and_extract_chunks
still tries to use stale st info after the file is unlinked.
I have tried:
- Tracking unlink state with flags
- Checking inode/link count
- Modifying comparison logic
The only fix in my mind is the extra OS call to check the file's existence. Maybe I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the fs file is a normal file, your code requires it to be there, so it can be "updated" - thus it must not be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we also need to think about what to do with existing metadata, like acls, xattrs, fs flags, ... - the current code assumes that there is no existing metadata and just adds the stuff from the archive item.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ThomasWaldmann another solution I can think of is:
try:
st = os.stat(path, follow_symlinks=False)
if continue_extraction and same_item(item, st):
return # done! we already have fully extracted this file in a previous run.
if stat.S_ISREG(st.st_mode) and not continue_extraction:
if self.compare_and_extract_chunks(item, path, st=st, pi=pi):
return
elif stat.S_ISDIR(st.st_mode):
os.rmdir(path)
else:
os.unlink(path)
This way we can try an in-place update attempt before any removal/recreation. If the function returns True, we're done. Otherwise, we fall back to the original remove/recreate behavior.
Also, since I'm using restore_attrs()
just like the existing code and not handling metadata directly at all, shouldn't it be consistent with how Borg already works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I already said: the restore_attrs code expects a fresh state of the file (like newly created, no acls, no xattrs) and just adds the stuff from the archived item.
But if you are updating an existing file, there can be already acls or xattrs that do not match what's in the archive (and what shall be the final state).
@ThomasWaldmann This is getting a bit too advanced for my understanding, but I've still tried to implement and verify the attribute restoration. I'm still stuck on the |
Archive File Chunk Comparison and Extraction
This implementation provides efficient file restoration from archives by comparing and extracting chunks. Instead of blindly extracting entire files, it: