S3: Add LegacyMd5Plugin to S3 client builder #12264


Merged · 3 commits · Jul 4, 2025

Conversation

@ebyhr ebyhr commented Feb 14, 2025

The recent AWS SDK bump introduced strong integrity checksums and broke compatibility with many S3-compatible object stores (pre-2025 Minio, Vast, Dell ECS, etc.).

In the Trino project, we received error reports (Missing required header for this request: Content-Md5) from several users and had to disable the check temporarily. We recommend disabling it in Iceberg as well. I ran into this issue when upgrading the Iceberg library to 1.8.0 in Trino.

Relates to trinodb/trino#24954 & trinodb/trino#24713

@github-actions github-actions bot added the AWS label Feb 14, 2025
@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch from e629e83 to 83185e7 Compare February 14, 2025 05:02
// TODO Remove me once all of the S3-compatible storages support strong integrity checks
@SuppressWarnings("deprecation")
static AwsRequestOverrideConfiguration disableStrongIntegrityChecksums() {
  return AwsRequestOverrideConfiguration.builder().signer(AwsS3V4Signer.create()).build();
}
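For context, the override built above is a per-request configuration. A minimal sketch of how it could be attached to a single request, assuming the standard AWS SDK v2 request builders (the method and bucket/key names are illustrative, not part of this PR):

```java
import software.amazon.awssdk.auth.signer.AwsS3V4Signer;
import software.amazon.awssdk.awscore.AwsRequestOverrideConfiguration;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

class LegacySignerExample {
  // AwsS3V4Signer is deprecated in recent SDK versions, hence the suppression.
  @SuppressWarnings("deprecation")
  static PutObjectRequest withLegacySigner(String bucket, String key) {
    AwsRequestOverrideConfiguration override =
        AwsRequestOverrideConfiguration.builder()
            .signer(AwsS3V4Signer.create())
            .build();
    // Forcing the old SigV4 signer restores the pre-2025 Content-MD5 behavior
    // for this one request.
    return PutObjectRequest.builder()
        .bucket(bucket)
        .key(key)
        .overrideConfiguration(override)
        .build();
  }
}
```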

Oh I see: is the compatibility issue that older versions of Minio and other third-party object storage solutions expect Content-MD5, and the new SDK no longer sends it, so the service rejects the request? It still feels like there should be a different way to force setting MD5.

@ebyhr (Contributor Author) commented Feb 14, 2025

Setting WHEN_REQUIRED for checksum calculation/validation doesn't resolve it, as far as I tested. Calculating MD5 for PutObjectRequest looks feasible, but I'm not sure how to do it for DeleteObjectsRequest.

@wendigo Do you happen to know any other approaches?
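For reference, the WHEN_REQUIRED setting mentioned above can be sketched as follows, assuming an AWS SDK version (2.30+) whose S3 client builder exposes these options; per the comment above, this alone did not fix the incompatibility in testing:

```java
import software.amazon.awssdk.core.checksums.RequestChecksumCalculation;
import software.amazon.awssdk.core.checksums.ResponseChecksumValidation;
import software.amazon.awssdk.services.s3.S3Client;

class WhenRequiredExample {
  static S3Client build() {
    // Only compute/validate checksums when an operation requires them,
    // instead of the new default of always sending CRC-based checksums.
    return S3Client.builder()
        .requestChecksumCalculation(RequestChecksumCalculation.WHEN_REQUIRED)
        .responseChecksumValidation(ResponseChecksumValidation.WHEN_REQUIRED)
        .build();
  }
}
```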


We have the same issue in PyIceberg: apache/iceberg-python#1546. How about making this configurable?


@ebyhr no. The only workaround I have found is to force the previous signature; it works against Dell ECS, old Minio versions, and Ozone. I haven't tested Vast since I don't have access to it.

@mmgaggle commented Feb 25, 2025

We added support in Ceph. There are a lot of places this can pop up: PutObject, UploadPart, CompleteMultipartUpload, AWSv2 vs AWSv4 semantics, etc.

We noticed the breakage when watsonx.data rev'd their version of the Java SDK.

@@ -149,4 +151,10 @@ static void configurePermission(
Function<ObjectCannedACL, S3Request.Builder> aclSetter) {
aclSetter.apply(s3FileIOProperties.acl());
}

// TODO Remove me once all of the S3-compatible storage support strong integrity checks
@amogh-jahagirdar (Contributor) commented Feb 14, 2025

I understand the intent, but this doesn't feel like a practical way of going about it, because there are many stores out there and "all" seems like a moving goalpost.

@@ -29,15 +29,18 @@
import software.amazon.awssdk.services.s3.S3ClientBuilder;

public class MinioUtil {
public static final String LATEST_TAG = "latest";
// This version doesn't support strong integrity checks
static final String LEGACY_TAG = "RELEASE.2024-12-18T13-15-44Z";
nastra (Contributor)

Can we try to roll back all of the changes in this PR and then only update the tag to RELEASE.2025-01-20T14-49-07Z (see also minio/minio#20845 (comment))? I've tried it locally and it fixed the issue described in #12237 when running those tests with Docker Desktop.


@nastra it does, but it still won't work with some other S3-compatible stores like Vast and Dell ECS, so upgrading Minio to a compatible version doesn't solve the issue.

@ebyhr (Contributor Author) commented Feb 19, 2025

@nastra We intentionally specified pre-2025 here. Using a newer tag hides the actual problem.

nastra (Contributor) commented Feb 19, 2025

I think we should revert the AWS SDK version (#12339) for 1.8.1 and then properly fix it for 1.9.0. @ebyhr does that make sense?

wendigo commented Feb 19, 2025

@nastra that will work :)

@RussellSpitzer (Member)

@ebyhr How hard would it be for us to get some integration tests with one of these systems into the Iceberg project? Seems like we should have some coverage for these other S3-Compat systems. I'd also be ok with a separate project that we just use as a canary before release.

nk1506 (Contributor) commented Feb 21, 2025

> How hard would it be for us to get some integration tests with one of these systems into the Iceberg project? Seems like we should have some coverage for these other S3-Compat systems. I'd also be ok with a separate project that we just use as a canary before release.

Hi @RussellSpitzer,
@jbonofre had a similar vision, which led to the creation of Project Iceland with the goal of providing integration test coverage for various catalogs. Initially, the plan was to cover all catalog types, but given the recent focus on REST and Polaris, we are currently prioritizing REST-based catalogs.

@mmgaggle

> How hard would it be for us to get some integration tests with one of these systems into the Iceberg project? Seems like we should have some coverage for these other S3-Compat systems. I'd also be ok with a separate project that we just use as a canary before release.

> Hi @RussellSpitzer, @jbonofre had a similar vision, which led to the creation of Project Iceland with the goal of providing integration test coverage for various catalogs. Initially, the plan was to cover all catalog types, but given the recent focus on REST and Polaris, we are currently prioritizing REST-based catalogs.

This exists, but it's mostly python / boto based.

https://github.com/ceph/s3-tests

Various vendors create test runners to validate S3 compatibility, including Snowflake, Splunk (uses an older branch of s3-tests), Teradata, etc.

Hadoop common has one as well:

https://github.com/apache/hadoop/tree/trunk/hadoop-tools/hadoop-aws/src/test

If there were an Iceberg one, it would be something we'd validate our Ceph releases against.

@steveloughran (Contributor)

@mmgaggle I'm actually setting up the s3a tests to run through Iceberg and Parquet, so we can validate features and performance optimisations through our code.

Initially, apache/hadoop#7316 has gone in for the bulk delete API of #10233 (please can someone review/merge this!). It will then act as a regression test of the s3a connector, as well as making it easy to test local Iceberg/Parquet builds against arbitrary stores through our test harness. That test harness uses the Hadoop IOStatistics API to make assertions about the actual number of remote S3 calls made; this lets you identify regressions in the actual amount of network IO that takes place. Everyone cares about this.

Even with this, you should have a test harness which

  • can be targeted at production S3 stores
  • contains a good set of operations, both low level FileIO and higher level API calls
  • has many of those tests abstracted up to work with all FileIO implementations.
  • provides really good diagnostics on test failures.

If someone starts that, I'd be happy to help. What I'm not going to do is say "here are the tests you need". I did try to do that with Spark and the spark-hadoop-cloud module, but there was no interest in full integration tests. I'd only do it for Iceberg as part of a collaborative effort with others.

@ajantha-bhat ajantha-bhat added this to the Iceberg 1.9.0 milestone Mar 17, 2025
@ajantha-bhat (Member)

@ebyhr: Thanks for pinging on the mailing list. I have added the 1.9.0 milestone for this; if we can get it merged in a week, we can include it.

So, the only blocker is that we want integration tests? I think running tests in Trino and sharing a report should be enough for this release, as the fix seems to impact users. But I agree that we need a strong framework, in Iceberg or as a separate project, to validate these before every release.

steveloughran (Contributor) commented Mar 17, 2025

If you are really in a hurry to ship, use a 2.29.x SDK version.

@ajantha-bhat (Member)

I have opened a PR (#12649) to revert the version in master for the 1.9.0 release, as this PR didn't make good progress within the required time.

We can still continue work on this. Thanks for all the efforts.

@nastra nastra modified the milestones: Iceberg 1.9.0, Iceberg 1.10.0 Mar 26, 2025
@steveloughran (Contributor)

FWIW I'm trying to convince the AWS SDK team that third-party stores do matter, so the SDK should add an option to be compatible without downstream libraries having to implement workarounds. Let's see what happens.

The more applications that say "no, we will stay at 2.29.52 for now", the more motivation the SDK team should have for this.

stubz151 (Contributor) commented Apr 29, 2025

I see that the SDK team has merged a fix for this: aws/aws-sdk-java-v2#6055. @ebyhr are you going to update this PR? If not, I'm happy to own the SDK upgrade.

ebyhr (Contributor Author) commented Apr 29, 2025

@stubz151 Yes, I'm going to update this PR :)

bernd added a commit to Graylog2/graylog2-server that referenced this pull request Apr 30, 2025
Newer SDKs break compatibility with third-party S3-compatible services.

See: apache/iceberg#12264
bernd added a commit to Graylog2/graylog2-server that referenced this pull request Apr 30, 2025
Newer SDKs break compatibility with third-party S3-compatible services.

See: apache/iceberg#12264
bernd added a commit to Graylog2/graylog2-server that referenced this pull request Apr 30, 2025
Newer versions break compatibility with S3-compatible services.

See:
- apache/iceberg#12264
- Graylog2/graylog-plugin-enterprise#10504
@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch from 83185e7 to 5062417 Compare April 30, 2025 12:48
@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch 2 times, most recently from 9555b8c to 54cf4e5 Compare April 30, 2025 12:52
@ebyhr ebyhr changed the title S3: Disable strong integrity checksums Build, S3: Update awssdk-bom to 2.31.32 + Add LegacyMd5Plugin to S3 client builder Apr 30, 2025
@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch from 54cf4e5 to 1eb0925 Compare May 5, 2025 06:23
@ebyhr ebyhr changed the title Build, S3: Update awssdk-bom to 2.31.32 + Add LegacyMd5Plugin to S3 client builder S3: Add LegacyMd5Plugin to S3 client builder May 5, 2025
github-actions bot commented Jun 5, 2025

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch from 1eb0925 to 0740a68 Compare July 2, 2025 20:39
@stevenzwu (Contributor) left a comment

LGTM

@@ -82,11 +83,15 @@ public class AwsClientProperties implements Serializable {
/** Controls whether vended credentials should be refreshed or not. Defaults to true. */
public static final String REFRESH_CREDENTIALS_ENABLED = "client.refresh-credentials-enabled";

/** Controls whether legacy md5 plugin should be added or not. Defaults to false. */

We might want to improve the docs here and mention what this flag does exactly, as it won't be obvious to most people. I would add a description saying that the SDK's behavior changed in version X and therefore affects S3-compatible storage implementations since that version.


we might also want to include the wording from https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/LegacyMd5Plugin.html here to make it clear for users
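As a reference for that wording, a minimal sketch of what enabling the flag does under the hood, assuming an AWS SDK version (2.31.22+) that ships LegacyMd5Plugin; this mirrors the builder mutation shown in the diffs below:

```java
import software.amazon.awssdk.services.s3.LegacyMd5Plugin;
import software.amazon.awssdk.services.s3.S3Client;

class LegacyMd5Example {
  static S3Client build() {
    // The plugin restores legacy MD5 checksum behavior for operations that
    // require a checksum (e.g. DeleteObjects), which older S3-compatible
    // stores expect as a Content-MD5 header.
    return S3Client.builder()
        .addPlugin(LegacyMd5Plugin.create())
        .build();
  }
}
```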

@@ -106,6 +106,7 @@ static class DefaultAwsClientFactory implements AwsClientFactory {
@Override
public S3Client s3() {
return S3Client.builder()
.applyMutation(awsClientProperties::applyLegacyMd5Plugin)

I believe we also need to update DefaultS3FileIOAwsClientFactory


We also have other implementations of AwsClientFactory, so we should check those places as well to see whether this needs to be applied there.

@ebyhr ebyhr force-pushed the ebi/s3-integrity-check branch from f9dfaf7 to 5c9bcfe Compare July 4, 2025 01:37
@@ -46,6 +46,7 @@ public void initialize(Map<String, String> properties) {
public S3Client s3() {
return S3Client.builder()
.applyMutation(awsClientProperties::applyClientRegionConfiguration)
.applyMutation(awsClientProperties::applyLegacyMd5Plugin)
This should most likely be applied to the s3Async() case below as well.

@nastra nastra merged commit fb0af7e into apache:main Jul 4, 2025
42 checks passed