Add compression for the index format #17
Labels: enhancement
For files with very large compression ratios, the index might actually grow larger than the original file, because each seek point stores 32 KiB of uncompressed data per chunk (one chunk every 4 MiB by default). That is, at a compression ratio of 4 MiB / 32 KiB = 128, the seek point windows take up as much space as the original file, and for even higher ratios they take up more. However, in exactly these cases, the seek point windows would also compress very well, even without back-references to preceding data. Therefore, the windows should be compressed; several schemes would be possible, one of which is sketched below.
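To make the arithmetic and the per-window compression concrete, here is a minimal Python sketch. It assumes one 32 KiB window per 4 MiB of decompressed data, as the numbers above imply; the constant and function names are illustrative rather than rapidgzip's actual API:

```python
import zlib

CHUNK_SPACING = 4 * 1024 * 1024  # one seek point per 4 MiB of decompressed data
WINDOW_SIZE = 32 * 1024          # uncompressed window stored per seek point


def index_to_file_ratio(compression_ratio: float) -> float:
    """Total window size relative to the compressed input file.

    Grows linearly with the compression ratio and reaches parity at
    CHUNK_SPACING / WINDOW_SIZE = 128.
    """
    return compression_ratio * WINDOW_SIZE / CHUNK_SPACING


assert index_to_file_ratio(128) == 1.0  # break-even point

# Per-window compression without back-references: each window is deflated
# independently, so it can be loaded without touching any other index data.
window = (b"highly redundant data, as found in high-ratio files! " * 1000)[:WINDOW_SIZE]
print(len(zlib.compress(window, 9)), "bytes instead of", WINDOW_SIZE)
```

Because each window is compressed on its own, random access to a single seek point stays possible without decompressing the rest of the index.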
There could also be other, unrelated compression of index data. All of these optimizations would make the format incompatible with that of indexed_gzip. This is problematic, especially in combination with ratarmount, which currently stores indexes in the indexed_gzip format. However, the shaving off of bytes could be implemented by simply replacing them with zeros in order to make them compress more easily! Then, ratarmount could implement compression for the whole index file, or even in chunks, and provide a Python file object abstraction to rapidgzip once #10 has been implemented. When implemented correctly, this would keep ratarmount indexes compatible regardless of the backend (indexed_gzip or rapidgzip). Still, for standalone use, compression per seek point would be useful; a sketch of the zeroing idea follows below.
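A minimal sketch of that zeroing idea, assuming the index layout stays byte-identical to indexed_gzip's and that some span of each window is known to be unneeded; the offsets and the helper below are hypothetical, not an existing API:

```python
import io
import zlib

WINDOW_SIZE = 32 * 1024


def zero_unneeded_window_bytes(index: bytearray, window_offset: int, needed: range) -> None:
    """Overwrite all bytes of one window with zeros except the needed span.

    The index keeps its exact size and layout, so indexed_gzip-style readers
    still parse it; the zero runs merely compress almost for free once the
    whole index file is compressed afterwards.
    """
    index[window_offset : window_offset + needed.start] = bytes(needed.start)
    index[window_offset + needed.stop : window_offset + WINDOW_SIZE] = bytes(
        WINDOW_SIZE - needed.stop
    )


# Toy index: one 32 KiB window of which only the last 4 KiB are needed.
index = bytearray(b"\xaa" * WINDOW_SIZE)
zero_unneeded_window_bytes(index, 0, range(WINDOW_SIZE - 4096, WINDOW_SIZE))

# Whole-index compression as ratarmount could apply it, plus a file object
# that could be handed to rapidgzip once #10 provides such an abstraction.
compressed = zlib.compress(bytes(index), 9)
index_file = io.BytesIO(zlib.decompress(compressed))
print(len(compressed), "compressed bytes for a", WINDOW_SIZE, "byte window")
```

Since only zeros are written, an uncompressing reader still sees a structurally valid index, while the chunk-wise or whole-file compression squeezes the zero runs out almost entirely.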
Kinda fixed with gztool index format support. There still might be yet another index format that works across compression formats and fixes some small issues with gztool indexes.