Use NextMarker instead of tail() to find NextMarker #414
Previously, the function erroneously assumed that the XML response always listed the `<Contents>` elements last, and simply used `tail()` to get the Key of the last element for use as NextMarker. This assumption, which relies on ordering rather than on explicit element names, fails when S3-compatible services (e.g. the CEPH S3 interface) return XML that lists metadata fields after the `<Contents>` elements. Moreover, the S3 response actually provides a field called NextMarker for this very purpose, which ought to be used instead of attempting to find the last key. Later, the code again used `tail()` in an order-dependent way, rather than simply copying the NextMarker metadata field onto the growing concatenated list.
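To illustrate the ordering problem, here is a minimal sketch that uses a hand-built list in place of a parsed response. The list `r` and all of its values are invented for illustration; only the element names (`Contents`, `Key`, `IsTruncated`, `NextMarker`) come from the S3 ListObjects response format, and the sketch assumes the parsed response is a named list, as the `tail(r, 1)` and `r$NextMarker` expressions in this PR suggest.

```r
# Hypothetical parsed listing in which metadata fields follow the <Contents>
# elements. All values are invented for illustration.
r <- list(
  Name        = "example-bucket",
  Contents    = list(Key = "a.csv"),
  Contents    = list(Key = "b.csv"),
  IsTruncated = "true",
  NextMarker  = "b.csv"
)

# Order-based lookup: the last element is NextMarker, not a Contents entry,
# so there is no "Contents" component to index into and the result is NULL.
by_position <- tail(r, 1)[["Contents"]][["Key"]]

# Name-based lookup: independent of where the field appears in the response.
by_name <- r[["NextMarker"]]

print(is.null(by_position))
print(by_name)
```

With this ordering the position-based expression silently yields `NULL` rather than raising an error, so nothing signals that the marker is missing.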
@s-u any chance this PR can be merged please? The problem still exists as described and @cboettig's solution (this PR) works on the data sets I am using from Copernicus Marine. Without this PR,
@raymondben Sure - but it would help if 1) there was some information on what the problem is (the above talks about the code, but not about how to trigger the problem or where it matters) and how it can be replicated, and 2) you could confirm that the PR works and doesn't break anything else. It seems that you may have the material for both.
Thanks @s-u - fair points, and yes I can go some way towards addressing those. Thanks, stand by ...
Here is a reproducible example:
Instead of listing the first 2000 objects, we are getting two copies of the first 1000 objects. This happens because the current But in this example, that
Instead we can use the
In our above example:
Which solves the problem in that case. But (and see the note in the doc snippet above) the
But the existing aws.s3::get_bucket code works in this case:
So I would suggest modifying this PR slightly, so that the existing https://github.com/snystrom/aws.s3/blob/master/R/get_bucket.R#L56 is changed from
to
(i.e. retain the existing behaviour when it works, otherwise use
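The symptom and the fallback idea discussed in this comment can be sketched without network access. Everything below is illustrative rather than the package's code: `fake_list_objects()` is a hypothetical stand-in for a single ListObjects request (page size reduced to 2 keys), and `next_marker()` and `list_all()` are hypothetical helpers. The sketch relies on the documented S3 behaviour that NextMarker is only returned when a delimiter is specified, and that otherwise the Key of the last `Contents` element serves as the marker for the next request.

```r
# Hypothetical stand-in for one ListObjects request against a bucket holding
# `keys`. Returns up to `max_keys` keys sorting after `marker`, followed by
# metadata fields, so the Contents entries are NOT the last elements.
fake_list_objects <- function(keys, marker = NULL, max_keys = 2,
                              with_next_marker = FALSE) {
  remaining <- if (is.null(marker)) keys else keys[keys > marker]
  page <- head(remaining, max_keys)
  truncated <- length(remaining) > length(page)
  out <- lapply(page, function(k) list(Key = k))
  names(out) <- rep("Contents", length(out))
  out <- c(out, list(IsTruncated = if (truncated) "true" else "false"))
  if (truncated && with_next_marker) {
    out <- c(out, list(NextMarker = page[length(page)]))
  }
  out
}

# Prefer NextMarker when the response carries one; otherwise fall back to the
# Key of the last Contents element, selected by name rather than by position.
next_marker <- function(r) {
  if (!is.null(r[["NextMarker"]])) return(r[["NextMarker"]])
  contents <- r[names(r) == "Contents"]
  if (length(contents) == 0) return(NULL)
  contents[[length(contents)]][["Key"]]
}

# Page through the mock bucket until the listing is no longer truncated.
list_all <- function(keys, with_next_marker = FALSE) {
  found <- character(0)
  marker <- NULL
  repeat {
    r <- fake_list_objects(keys, marker, with_next_marker = with_next_marker)
    contents <- r[names(r) == "Contents"]
    found <- c(found, vapply(contents, function(x) x[["Key"]], character(1)))
    if (!identical(r[["IsTruncated"]], "true")) break
    marker <- next_marker(r)
  }
  unname(found)
}

keys <- c("a", "b", "c", "d", "e")

# Order-based marker: NULL here, so the "next" request repeats the first page.
first <- fake_list_objects(keys)
broken_marker <- tail(first, 1)[["Contents"]][["Key"]]
second <- fake_list_objects(keys, broken_marker)

all_without <- list_all(keys)                        # no NextMarker in responses
all_with <- list_all(keys, with_next_marker = TRUE)  # NextMarker present
```

In this mock, `first` and `second` are the same page, which mirrors the duplicated listing described above, while `list_all()` walks all five keys whether or not the responses include NextMarker.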
@raymondben many thanks for providing the examples! The core issue was that
There is one more issue: Line 59 in 3e635ab
NextToken case - will look at it tomorrow
That one should be fixed now as well.
Champion! Thanks.
Previously, the function erroneously assumed that the XML response always listed the `<Contents>` elements last, and simply used `tail()` to get the Key of the last element for use as NextMarker. This assumption, which relies on ordering rather than on explicit element names, fails when S3-compatible services (Redhat's CEPH S3 interface, an open source product widely used in research data centers) return XML that lists metadata fields after the `<Contents>` elements. Moreover, the S3 response actually provides a field called NextMarker for this very purpose, which ought to be used instead of attempting to find the last key. My pull request simply updates the code to use `r$NextMarker` to find the NextMarker instead of using `tail(r, 1)[["Contents"]][["Key"]]`.
Please ensure the following before submitting a PR:
- edit files in `/R`, not `/man`, and run `devtools::document()` to update documentation
- add to `/tests` for any new functionality or bug fix
- `R CMD check` runs without error before submitting the PR