Skip to content

Make chunk size configurable #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 1, 2025
Merged

Make chunk size configurable #13

merged 7 commits into from
Apr 1, 2025

Conversation

Horofic
Copy link
Member

@Horofic Horofic commented Mar 24, 2025

This PR makes the chunk_size variable configurable in preparation for contributing POC code mentioned in fox-it/dissect.ntfs#41. Decompression code largely overlapped with the code here.

We might want to consider moving this streaming class to dissect.ntfs (or somewhere else better suited) in the future.

@Horofic Horofic requested review from Copilot and Schamper March 24, 2025 19:24
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR makes the chunk size configurable by introducing a new parameter and replacing the constant DEFAULT_CHUNK_SIZE with an instance variable in the WIM archive streaming logic.

  • Updates the initializer to accept a configurable chunk size.
  • Replaces all usages of DEFAULT_CHUNK_SIZE in chunk calculation with the new instance variable.
  • Adjusts calculations in the _read method accordingly.
Comments suppressed due to low confidence (1)

dissect/archive/wim.py:444

  • The attribute 'chuck_size' appears to be a typo. Consider renaming it to 'chunk_size' for clarity and consistency with the parameter name.
self.chuck_size = chunk_size

Copy link

codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.

Project coverage is 68.92%. Comparing base (6992107) to head (747d9df).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
dissect/archive/wim.py 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #13      +/-   ##
==========================================
+ Coverage   68.89%   68.92%   +0.02%     
==========================================
  Files           9        9              
  Lines        1196     1197       +1     
==========================================
+ Hits          824      825       +1     
  Misses        372      372              
Flag Coverage Δ
unittests 68.92% <83.33%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Schamper
Copy link
Member

Can move it to dissect.util.

@Horofic Horofic requested a review from Schamper March 25, 2025 08:47
@Horofic
Copy link
Member Author

Horofic commented Mar 25, 2025

Will move and create a PR after this one.

Copy link
Member

@Schamper Schamper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a unit test with an alternative cluster size?

@Horofic Horofic requested review from Schamper and Copilot March 26, 2025 15:24
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR makes the chunk size configurable in the WIM decompression logic and updates tests to support different chunk size configurations in preparation for upcoming POC contributions.

  • Modified tests to parameterize different chunk sizes for verifying the header compression size.
  • Added new fixtures in conftest.py to supply WIM files with various chunk sizes.
  • Updated the WIM class to accept a configurable chunk_size and use it in chunk calculations.

Reviewed Changes

Copilot reviewed 3 out of 6 changed files in this pull request and generated 1 comment.

File Description
tests/test_wim.py Parameterized test now asserts the header’s CompressionSize against different chunk sizes.
tests/conftest.py Added fixtures for test files representing different chunk sizes.
dissect/archive/wim.py Updated WIM class to compute chunk-related values using the configurable chunk_size.
Files not reviewed (3)
  • tests/data/test16k.wim.gz: Language not supported
  • tests/data/test4k.wim.gz: Language not supported
  • tests/data/test8k.wim.gz: Language not supported
Comments suppressed due to low confidence (1)

dissect/archive/wim.py:444

  • The header’s CompressionSize field is expected to match the configured chunk_size (as asserted in tests), but there is no visible update to the header. Ensure that the header is updated to reflect self.chunk_size if that is the intended behavior.
self.chunk_size = chunk_size

@Horofic Horofic requested a review from Schamper March 27, 2025 14:42
@Horofic Horofic merged commit 91e1ee7 into main Apr 1, 2025
22 checks passed
@Horofic Horofic deleted the wof branch April 1, 2025 15:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants