Add SHA256 checksum support for uploads#1756
Add SHA256 checksum support for uploads#1756srijanshukla18 wants to merge 2 commits intoawslabs:mainfrom
Conversation
This change extends the --upload-checksums CLI option to support SHA256 in addition to the existing CRC32C algorithm. Users can now specify: - --upload-checksums=crc32c (existing default) - --upload-checksums=sha256 (new) - --upload-checksums=off Changes made: - Extended UploadChecksums enum in CLI to include Sha256 variant - Changed S3FilesystemConfig to store ChecksumAlgorithm instead of boolean - Added ChecksumConfig::trailing_sha256() and upload_review_sha256() methods - Modified PutObjectTrailingChecksums enum to carry algorithm information - Updated put_object.rs to dispatch to appropriate checksum config based on algorithm - Updated atomic.rs to handle both CRC32C and SHA256 for multi-part uploads - Updated all existing tests to work with new enum structure The implementation maintains backward compatibility by defaulting to CRC32C when checksums are enabled without specifying an algorithm. Signed-off-by: Srijan Shukla <[email protected]>
The mock client was always computing CRC32C checksums regardless of the requested algorithm. This caused SHA256 uploads to have mismatched checksums where the parts had CRC32C checksums but the upload review advertised SHA256. Changes: - Updated parts() to compute checksums based on the requested algorithm - Added compute_sha256_of_sha256_checksums() helper function - Updated complete_inner() to set the correct checksum field (checksum_crc32c or checksum_sha256) based on the algorithm This ensures that when SHA256 is requested, SHA256 checksums are computed for each part and the whole object, matching the behavior expected by S3. Signed-off-by: Srijan Shukla <[email protected]>
passaro
left a comment
There was a problem hiding this comment.
Thanks for submitting a PR and apologies for the late review.
Unfortunately, SHA256 cannot work with the review strategy implemented in Mountpoint for multi-part uploads. As you can see in atomic.rs, Mountpoint keeps a running hash (Crc32c) of the data written to the S3 client and then compares it with the combined checksums of the uploaded parts to verify the integrity of the data before completing the upload. Combining the checksums is only possible with CRC algorithms. but not with SHA.
Without rethinking this validation step, your change will result in every upload failing.
| bucket: String, | ||
| key: String, | ||
| next_request_offset: u64, | ||
| hasher: crc32c::Hasher, |
There was a problem hiding this comment.
This is still using CRC32C, regardless of the algorithm set in trailing_checksums. When trying to use ChecksumAlgorithm::Sha256, verify_checksums will fail.
Description
This change extends the
--upload-checksumsCLI option to support SHA256 in addition to the existing CRC32C algorithm.Usage
Users can now specify:
--upload-checksums=crc32c(existing default)--upload-checksums=sha256(new)--upload-checksums=offChanges
Core changes:
UploadChecksumsenum in CLI to includeSha256variantS3FilesystemConfigto storeChecksumAlgorithminstead of booleanChecksumConfig::trailing_sha256()andupload_review_sha256()methodsPutObjectTrailingChecksumsenum to carry algorithm informationImplementation:
put_object.rsto dispatch to appropriate checksum config based on algorithmatomic.rsto handle both CRC32C and SHA256 for multi-part uploadsBackward Compatibility
The implementation maintains backward compatibility by defaulting to CRC32C when checksums are enabled without specifying an algorithm.
Testing
Related to extending checksum algorithm support in mountpoint-s3.