ThomasWaldmann (Member)

No description provided.

ThomasWaldmann (Member, Author) commented Sep 6, 2025

Fix and test mostly written by Junie AI, some cleanups by me.

codecov bot commented Sep 6, 2025

Codecov Report

❌ Patch coverage is 75.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 81.16%. Comparing base (9241888) to head (d3064e0).
⚠️ Report is 3 commits behind head on 1.4-maint.

Files with missing lines Patch % Lines
src/borg/archive.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##           1.4-maint    #9003      +/-   ##
=============================================
- Coverage      81.19%   81.16%   -0.04%     
=============================================
  Files             38       38              
  Lines          11222    11218       -4     
  Branches        1761     1761              
=============================================
- Hits            9112     9105       -7     
- Misses          1549     1550       +1     
- Partials         561      563       +2     

MichaelDietzel commented Oct 2, 2025

I finally ran some simple tests with these changes, and the results are more consistent than what I got before. I will run some bigger tests overnight.
This looks a lot simpler than what I attempted, which I like. But could there be some missed cases?
A few things that caused difficulties for me:

  • In archive.py, in the save() function, line 607 calls self.items_buffer.flush(flush=True). If I understand it correctly, this also updates some stats, but it looks to me like these stats are never used for anything?
  • In archive.py, in the set_meta() function, line 962 calls self.cache.add_chunk(new_id, data, self.stats), which also appears to count metadata into some statistics. However, this is probably not used during archive creation.
  • How does the size of symlinks influence the stats? The size of symlinks could be platform-dependent, so I am completely unsure how this is handled. Maybe this is out of scope for this change?

What I also do not understand: what are these new metadata stats used for? Are they stored or reported at any time? If not, maybe they do not need to be generated at all.
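The concern in these bullets is that any add_chunk call receiving a stats object folds that chunk's sizes into it. A minimal toy model of that accounting, assuming a simplified refcounting chunk index where deduplicated size is the compressed size of first-seen chunks (the Stats, ChunkIndex and add_chunk names here are hypothetical, not borg's actual classes or signatures):

```python
from dataclasses import dataclass, field


@dataclass
class Stats:
    # Hypothetical counters: original, compressed and deduplicated bytes.
    osize: int = 0
    csize: int = 0
    usize: int = 0


@dataclass
class ChunkIndex:
    # Hypothetical chunk index: chunk id -> [refcount, size, csize].
    entries: dict = field(default_factory=dict)

    def add_chunk(self, chunk_id, size, csize, stats):
        """Store (or re-reference) a chunk and charge its sizes to `stats`."""
        entry = self.entries.get(chunk_id)
        if entry is None:
            self.entries[chunk_id] = [1, size, csize]
            stats.usize += csize  # only a brand-new chunk adds deduplicated size
        else:
            entry[0] += 1  # duplicate: bump refcount, no new unique bytes
        stats.osize += size
        stats.csize += csize


index = ChunkIndex()
archive_stats, meta_stats = Stats(), Stats()
index.add_chunk("data-1", 1000, 600, archive_stats)
index.add_chunk("data-1", 1000, 600, archive_stats)  # deduplicated repeat
index.add_chunk("items-1", 300, 200, meta_stats)  # metadata charged elsewhere
print(archive_stats)  # Stats(osize=2000, csize=1200, usize=600)
print(meta_stats)  # Stats(osize=300, csize=200, usize=200)
```

In this model, whichever stats object the caller passes absorbs the chunk, so handing metadata chunks a separate accumulator is what keeps them out of the per-archive numbers.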

MichaelDietzel commented Oct 5, 2025

I did a few more tests of archive creation under Linux with this PR applied:

Original size exactly matches the size of the original files if only regular files are considered; symlinks are not counted. (I have not tested other special cases.)

The compressed size always looks plausible.

The deduplicated size still has some unexpected results:
Here is an example where I created an empty archive:

create
                       Original size      Compressed size    Deduplicated size
This archive:                    0 B                  0 B                  0 B
All archives:                    0 B                  0 B                700 B

info
                       Original size      Compressed size    Deduplicated size
This archive:                    0 B                  0 B                700 B
All archives:                    0 B                  0 B                700 B

Do I understand the code correctly that the sizes reported by borg info are "cached" values stored in the repo rather than recalculated, so that they should not change between create and info?

The empty repo without any archives reported 0 for all sizes.

And here is an example with a bigger archive:

create
                       Original size      Compressed size    Deduplicated size
This archive:         967532246644 B       889015534170 B       727966606714 B
All archives:         967532246644 B       889015534170 B       728235281358 B

info
                       Original size      Compressed size    Deduplicated size
This archive:         967532246644 B       889015534170 B       728235281358 B
All archives:         967532246644 B       889015534170 B       728235281358 B
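The size of the jump can be computed directly from the deduplicated sizes in the two create/info outputs above. In both runs, "This archive" grows between create and info by exactly the amount that "All archives" already exceeded it at create time:

```python
# Deduplicated sizes in bytes, copied from the create/info output above.
observed = {
    "empty": {"create_this": 0, "create_all": 700, "info_this": 700},
    "large": {
        "create_this": 727_966_606_714,
        "create_all": 728_235_281_358,
        "info_this": 728_235_281_358,
    },
}

gaps = {}
for name, sizes in observed.items():
    # How much "This archive" grew between `borg create` and `borg info`.
    gaps[name] = sizes["info_this"] - sizes["create_this"]
    # Does info's "This archive" equal create's "All archives"?
    matches_all = sizes["info_this"] == sizes["create_all"]
    print(name, gaps[name], matches_all)
# empty 700 True
# large 268674644 True
```

The large-archive gap is about 268.7 MB, roughly 0.03% of the original size. The arithmetic only shows that both runs follow the same pattern; it does not identify which chunks make up the difference.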

Sadly, I have no idea what could be causing this, apart from the points I already raised in my previous post.
I found two places in the save() function where these additional bytes are added to the stats:
self.items_buffer.flush(flush=True) and self.cache.add_chunk(self.id, data, self.meta_stats if hasattr(self, 'meta_stats') else self.stats). However, I still have to find out why they are not displayed for "This archive" during creation but are displayed afterwards, and why they only show up in the deduplicated size.
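The quoted hasattr expression routes metadata to meta_stats only if that attribute was set on the object beforehand; otherwise the bytes silently land in self.stats. A toy illustration of just that routing (ToyArchive, target_stats and save_metadata_chunk are hypothetical names, not borg code):

```python
class ToyArchive:
    """Hypothetical stand-in for an archive object; not borg's Archive class."""

    def __init__(self, with_meta_stats):
        self.stats = {"usize": 0}
        if with_meta_stats:
            self.meta_stats = {"usize": 0}

    def target_stats(self):
        # Same routing as the expression quoted above:
        # self.meta_stats if hasattr(self, 'meta_stats') else self.stats
        return self.meta_stats if hasattr(self, "meta_stats") else self.stats

    def save_metadata_chunk(self, csize):
        # Charge a metadata chunk's compressed size to whichever stats object wins.
        self.target_stats()["usize"] += csize


a = ToyArchive(with_meta_stats=True)
a.save_metadata_chunk(700)
print(a.stats, a.meta_stats)  # {'usize': 0} {'usize': 700}

b = ToyArchive(with_meta_stats=False)
b.save_metadata_chunk(700)
print(b.stats)  # {'usize': 700}
```

getattr(self, 'meta_stats', self.stats) would express the same fallback more compactly. Either way, which bucket the metadata ends up in depends on whether the code path that sets meta_stats ran first.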

I have not tested any operations other than archive creation; that is what I would have to do next.

ThomasWaldmann (Member, Author) commented Oct 10, 2025

@MichaelDietzel I let Junie do a bit more work. First it added a bloody workaround, but on second try I guess it found the correct way. :-)

borg now generally excludes metadata chunks from the "this archive" stats, both in "create" and in "info".

Some differences in the stats are expected.

ThomasWaldmann (Member, Author)

@MichaelDietzel guess this is as good as it gets?

There is still some discrepancy between "this archive" and "all archives" (which really means "whole repo" and is computed in a rather different way).
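A toy model of how such a discrepancy can arise, under loudly stated assumptions: here "this archive" counts only the unique data chunks the archive references, while "all archives" sums every unique chunk in a hypothetical repo-wide chunk index, metadata chunks included. The chunk ids, kinds and sizes are invented for illustration, and this is not how borg actually computes either row:

```python
# Hypothetical repo-wide chunk index for a repo holding a single archive:
# chunk id -> (kind, compressed size in bytes). All values are made up.
chunk_index = {
    "d1": ("data", 600),
    "d2": ("data", 400),
    "m-items": ("meta", 250),    # hypothetical item-metadata chunk
    "m-archive": ("meta", 100),  # hypothetical archive-metadata chunk
}
# Data chunks referenced by the archive's files; "d1" appears twice (deduplicated).
archive_data_chunks = ["d1", "d2", "d1"]

# "This archive": unique data chunks only, metadata excluded.
this_archive = sum(chunk_index[cid][1] for cid in set(archive_data_chunks))
# "All archives": every unique chunk in the index, metadata included.
all_archives = sum(csize for _kind, csize in chunk_index.values())

print(this_archive, all_archives, all_archives - this_archive)  # 1000 1350 350
```

In this model the gap equals the total metadata size, so an archive with no data chunks at all would show 0 for "this archive" and a nonzero "all archives".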
