
Add SHA256 checksum support for uploads #1756

Open
srijanshukla18 wants to merge 2 commits into awslabs:main from srijanshukla18:feat/add-sha256-checksum-support

@srijanshukla18

Description

This change extends the --upload-checksums CLI option to support SHA256 in addition to the existing CRC32C algorithm.

Usage

Users can now specify:

  • --upload-checksums=crc32c (existing default)
  • --upload-checksums=sha256 (new)
  • --upload-checksums=off

Changes

Core changes:

  • Extended UploadChecksums enum in CLI to include Sha256 variant
  • Changed S3FilesystemConfig to store ChecksumAlgorithm instead of boolean
  • Added ChecksumConfig::trailing_sha256() and upload_review_sha256() methods
  • Modified PutObjectTrailingChecksums enum to carry algorithm information

Implementation:

  • Updated put_object.rs to dispatch to appropriate checksum config based on algorithm
  • Updated atomic.rs to handle both CRC32C and SHA256 for multi-part uploads
  • Fixed mock_client to compute correct checksums for SHA256

Backward Compatibility

The implementation maintains backward compatibility by defaulting to CRC32C when checksums are enabled without specifying an algorithm.
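As a rough illustration of the CLI change and the default described above, the option could be modeled along these lines. This is a minimal sketch, not the code from this PR: the type shapes, the `FromStr` implementation, and the `algorithm()` helper are assumptions made for the example, and the real CLI may derive its parsing differently.

```rust
use std::str::FromStr;

/// Checksum algorithm attached to uploads (hypothetical stand-in type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ChecksumAlgorithm {
    Crc32c,
    Sha256,
}

/// Values accepted by `--upload-checksums` in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum UploadChecksums {
    /// Existing behavior, kept as the default for backward compatibility.
    #[default]
    Crc32c,
    /// New variant proposed by the change.
    Sha256,
    /// No checksums are attached to uploads.
    Off,
}

impl FromStr for UploadChecksums {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "crc32c" => Ok(Self::Crc32c),
            "sha256" => Ok(Self::Sha256),
            "off" => Ok(Self::Off),
            other => Err(format!("unsupported checksum algorithm: {other}")),
        }
    }
}

impl UploadChecksums {
    /// Map the CLI value to the algorithm stored in the filesystem config
    /// (hypothetical helper); `None` means checksums are disabled.
    pub fn algorithm(self) -> Option<ChecksumAlgorithm> {
        match self {
            Self::Crc32c => Some(ChecksumAlgorithm::Crc32c),
            Self::Sha256 => Some(ChecksumAlgorithm::Sha256),
            Self::Off => None,
        }
    }
}
```

Storing an `Option<ChecksumAlgorithm>` rather than a boolean lets downstream code match on the algorithm instead of assuming CRC32C whenever checksums are enabled.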

Testing

  • All existing tests updated to work with new enum structure
  • Mock client updated to properly compute SHA256 checksums for parts

Related to extending checksum algorithm support in mountpoint-s3.

This change extends the --upload-checksums CLI option to support SHA256
in addition to the existing CRC32C algorithm. Users can now specify:
- --upload-checksums=crc32c (existing default)
- --upload-checksums=sha256 (new)
- --upload-checksums=off

Changes made:
- Extended UploadChecksums enum in CLI to include Sha256 variant
- Changed S3FilesystemConfig to store ChecksumAlgorithm instead of boolean
- Added ChecksumConfig::trailing_sha256() and upload_review_sha256() methods
- Modified PutObjectTrailingChecksums enum to carry algorithm information
- Updated put_object.rs to dispatch to appropriate checksum config based on algorithm
- Updated atomic.rs to handle both CRC32C and SHA256 for multi-part uploads
- Updated all existing tests to work with new enum structure

The implementation maintains backward compatibility by defaulting to CRC32C
when checksums are enabled without specifying an algorithm.

Signed-off-by: Srijan Shukla <[email protected]>

The mock client was always computing CRC32C checksums regardless of the
requested algorithm. This caused SHA256 uploads to have mismatched checksums
where the parts had CRC32C checksums but the upload review advertised SHA256.

Changes:
- Updated parts() to compute checksums based on the requested algorithm
- Added compute_sha256_of_sha256_checksums() helper function
- Updated complete_inner() to set the correct checksum field (checksum_crc32c
  or checksum_sha256) based on the algorithm

This ensures that when SHA256 is requested, SHA256 checksums are computed
for each part and the whole object, matching the behavior expected by S3.

Signed-off-by: Srijan Shukla <[email protected]>
Contributor

@passaro passaro left a comment


Thanks for submitting a PR and apologies for the late review.

Unfortunately, SHA256 cannot work with the review strategy that Mountpoint implements for multi-part uploads. As you can see in atomic.rs, Mountpoint keeps a running hash (Crc32c) of the data written to the S3 client, then compares it with the combined checksums of the uploaded parts to verify the integrity of the data before completing the upload. Combining checksums is only possible with CRC algorithms, not with SHA.

Without rethinking this validation step, your change will result in every upload failing.
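
To make the combining property concrete, here is a minimal sketch in plain Rust with no external crates. The names `step`, `crc32c`, and `crc32c_combine` are hypothetical and are not Mountpoint's API. The combine step here is deliberately naive: it advances the register one zero byte at a time, whereas production libraries typically do this in logarithmic time.

```rust
/// Reflected Castagnoli polynomial used by CRC32C.
const POLY: u32 = 0x82F6_3B78;

/// Advance the raw CRC register by one input byte (bitwise, no lookup table).
fn step(mut reg: u32, byte: u8) -> u32 {
    reg ^= byte as u32;
    for _ in 0..8 {
        reg = if reg & 1 != 0 { (reg >> 1) ^ POLY } else { reg >> 1 };
    }
    reg
}

/// CRC32C of `data`: initial register of all ones, final bitwise inversion.
pub fn crc32c(data: &[u8]) -> u32 {
    !data.iter().fold(!0u32, |reg, &b| step(reg, b))
}

/// Derive the CRC32C of `A || B` from `crc32c(A)`, `crc32c(B)` and `B`'s length
/// alone. Because the register update is linear over GF(2), feeding `len_b`
/// zero bytes through a register seeded with `crc_a` and XOR-ing with `crc_b`
/// gives the checksum of the concatenation.
pub fn crc32c_combine(crc_a: u32, crc_b: u32, len_b: usize) -> u32 {
    let shifted = (0..len_b).fold(crc_a, |reg, _| step(reg, 0));
    shifted ^ crc_b
}
```

For example, `crc32c_combine(crc32c(b"1234"), crc32c(b"56789"), 5)` equals `crc32c(b"123456789")`, so a checksum kept over the whole stream can be compared against per-part checksums without rereading the data. SHA256 has no equivalent operation: the digest of a concatenation cannot be derived from the digests of its parts, which is the limitation described above.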

bucket: String,
key: String,
next_request_offset: u64,
hasher: crc32c::Hasher,
This is still using CRC32C, regardless of the algorithm set in trailing_checksums. When trying to use ChecksumAlgorithm::Sha256, verify_checksums will fail.
