-
Notifications
You must be signed in to change notification settings - Fork 786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support native S3 conditional writes #6682
base: master
Are you sure you want to change the base?
Support native S3 conditional writes #6682
Conversation
@tustvold are you the right person to review this? I saw you implemented quite a bit of the previous conditional put/get support. |
This isn't ideal, but it seemed best for the short term to avoid breaking backcompat for anyone who might be using a version of S3 that doesn't support CAS. In the long term, I think the ideal would definitely be to have the native S3 CAS support enabled by default (i.e. unless overridden explicitly by the user). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, this looks good to me, mostly just some minor nits.
I agree with the decision to leave this off by default for a couple of reasons:
- S3 compatible stores support these mechanisms with fewer drawbacks (e.g. full conditional update, native copy_if_not_exists, etc...) and users should prefer these
- AWS may eventually bring S3 into feature parity with literally every other object store, at which point this will provide a coherent API to converge on
@@ -651,6 +651,12 @@ pub async fn put_opts(storage: &dyn ObjectStore, supports_update: bool) { | |||
assert_eq!(b.as_ref(), b"a"); | |||
|
|||
if !supports_update { | |||
let err = storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
object_store/src/aws/precondition.rs
Outdated
/// Encoded as `etag-create-only` ignoring whitespace. | ||
/// | ||
/// [announcement]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/ | ||
ETagCreateOnly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ETagCreateOnly, | |
ETagPutIfNotExists, |
Maybe? I don't feel strongly though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, works for me.
object_store/src/aws/precondition.rs
Outdated
/// WARNING: When using this mode, `copy_if_not_exists` does not copy | ||
/// tags or attributes from the source object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
I had a brief scan and couldn't see a way around this
object_store/src/aws/client.rs
Outdated
} | ||
} | ||
} | ||
|
||
pub(crate) enum PutPartPayload<'a> { | ||
Inline(PutPayload), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline(PutPayload), | |
Part(PutPayload), |
Maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no strong feels from me on the name here!
object_store/src/aws/client.rs
Outdated
PutPartPayload::Inline(payload) => request.with_payload(payload), | ||
PutPartPayload::Copy(path) => request.header( | ||
"x-amz-copy-source", | ||
&format!("{}/{}", self.config.bucket, path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to use encode_path here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, seems like it, thanks!
object_store/src/aws/mod.rs
Outdated
) | ||
.await | ||
{ | ||
Err(e @ Error::Precondition { .. }) => Err(Error::AlreadyExists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to abort the multipart upload in this case? Does this happen automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, good call. It does not happen automatically. I added code to do a best effort multipart upload, and a docs note that this isn't guaranteed and that the user should configure automatic multipart expiration via a lifecycle rule.
/// | ||
/// WARNING: When using this mode, `copy_if_not_exists` does not copy | ||
/// tags or attributes from the source object. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could possibly also link to the module docs around lifecycle rules for cleaning up incomplete multipart uploads - https://docs.rs/object_store/latest/object_store/aws/index.html#multipart-uploads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done!
CompleteMultipartRequest { source: crate::client::retry::Error }, | ||
CompleteMultipartRequest { | ||
source: crate::client::retry::Error, | ||
path: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the path also be in the displayed error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done.
Add support for `PutMode::Create` and `copy_if_not_exists` on native AWS S3, which uses the underlying conditional write primitive that Amazon launched earlier this year [0]. The conditional write primitive is simpler than what's available in other S3-like products (e.g., R2), so new modes for `s3_copy_if_not_exists` and `s3_conditional_put` are added to select the native S3-specific behavior. To maintain strict backwards compatibility (e.g. with older versions of LocalStack), the new behavior is not on by default. It must be explicitly requested by the end user. The implementation for `PutMode::Create` is straightforward. The implementation of `copy_if_not_exists` is a bit more involved, as it requires managing a multipart upload that uses the UploadPartCopy operation, which was not previously supported by this crate's S3 client. To ensure test coverage, the object store workflow now runs the AWS integration tests with conditional put both disabled and enabled. Fix apache#6285. [0]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/
dc2a634
to
075c524
Compare
075c524
to
9cf76bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks very much for the quick review, @tustvold! Addressed all your feedback, I believe.
CompleteMultipartRequest { source: crate::client::retry::Error }, | ||
CompleteMultipartRequest { | ||
source: crate::client::retry::Error, | ||
path: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done.
object_store/src/aws/client.rs
Outdated
} | ||
} | ||
} | ||
|
||
pub(crate) enum PutPartPayload<'a> { | ||
Inline(PutPayload), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no strong feels from me on the name here!
object_store/src/aws/client.rs
Outdated
PutPartPayload::Inline(payload) => request.with_payload(payload), | ||
PutPartPayload::Copy(path) => request.header( | ||
"x-amz-copy-source", | ||
&format!("{}/{}", self.config.bucket, path), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, seems like it, thanks!
/// | ||
/// WARNING: When using this mode, `copy_if_not_exists` does not copy | ||
/// tags or attributes from the source object. | ||
/// |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, done!
object_store/src/aws/precondition.rs
Outdated
/// Encoded as `etag-create-only` ignoring whitespace. | ||
/// | ||
/// [announcement]: https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/ | ||
ETagCreateOnly, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, works for me.
object_store/src/aws/mod.rs
Outdated
) | ||
.await | ||
{ | ||
Err(e @ Error::Precondition { .. }) => Err(Error::AlreadyExists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, good call. It does not happen automatically. I added code to do a best effort multipart upload, and a docs note that this isn't guaranteed and that the user should configure automatic multipart expiration via a lifecycle rule.
To a version that supports conditional writes.
@tustvold if you can give me another re-run here I think we’ll be all green on CI. I fixed a minor clippy failure and upgraded to a version of localstack that supports conditional put. |
Whew, all tests here are green! @tustvold let me know if there's anything else here you'd like to see, but from my perspective this is ready to (squash) merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small nit, then I think we're good to go. Thank you for this
let part_id = self | ||
.client | ||
.put_part(to, &upload_id, 0, PutPartPayload::Copy(from)) | ||
.await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this error should also trigger cleanup, might be worth encapsulating the steps into a function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, doesn't it? If anything in this async
block (L306-329) returns an error, the cleanup on L336 should fire.
Add support for
PutMode::Create
andcopy_if_not_exists
on native AWS S3, which uses the underlying conditional write primitive that Amazon launched earlier this year 0.The conditional write primitive is simpler than what's available in other S3-like products (e.g., R2), so new modes for
s3_copy_if_not_exists
ands3_conditional_put
are added to select the native S3-specific behavior.To maintain strict backwards compatibility (e.g. with older versions of LocalStack), the new behavior is not on by default. It must be explicitly requested by the end user.
The implementation for
PutMode::Create
is straightforward. The implementation ofcopy_if_not_exists
is a bit more involved, as it requires managing a multipart upload that uses the UploadPartCopy operation, which was not previously supported by this crate's S3 client.To ensure test coverage, the object store workflow now runs the AWS integration tests with conditional put both disabled and enabled.
Which issue does this PR close?
Fix #6285.