Update prefetcher to increment client read window linearly with each read #1546
Draft
mansi153 wants to merge 2 commits into awslabs:main from mansi153:pr-linear-readsize
Conversation
Commits:
- Signed-off-by: Daniel Carl Jones <[email protected]>
- …inearly with reads Signed-off-by: Daniel Carl Jones <[email protected]>
mansi153 commented on Jul 29, 2025
@@ -5,6 +5,7 @@
 * Adopt a unified memory pool to reduce overall memory usage. ([#1511](https://github.com/awslabs/mountpoint-s3/pull/1511))
 * Replace `S3Uri` with `S3Path` and consolidate related types like `Bucket` and `Prefix` into the `s3` module.
   ([#1535](https://github.com/awslabs/mountpoint-s3/pull/1535))
+* `PrefetchGetObject` now has an updated backpressure algorithm advancing the read window with each call to `PrefetchGetObject::read`, with the aim of higher sequential-read throughput. ([#1453](https://github.com/awslabs/mountpoint-s3/pull/1453))
I'll fix this and add the appropriate changelog entry and version updates
Rebased version of #1453, opened as a PR to run CI benchmarks and analyse performance and behaviour.
Mountpoint's S3 client has a backpressure mechanism to control how much data to fetch. This change updates how Mountpoint's prefetcher signals the S3 client (which uses the AWS CRT internally) to fetch more data ahead of the position a consuming application is reading from, in order to accelerate throughput.
Before this change, Mountpoint would wait for 50% of the existing window to be consumed before extending it. For example, with a window of 2 GiB, 1 GiB had to be read by the kernel before Mountpoint would inform the S3 client that it may fetch data up to 2 GiB ahead of the current position.
After this change, Mountpoint sends this signal with every read by the kernel. For example, with a read-ahead window of 2 GiB, a 128 KiB read by the kernel to fill a page in the page cache results in the CRT's window being advanced to 2 GiB past the end offset of that 128 KiB read.
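To make the two strategies concrete, here is a minimal Rust sketch of the window-advancement logic described above. This is not the actual Mountpoint implementation; the type and method names are illustrative:

```rust
/// Illustrative (hypothetical) model of the backpressure read window.
struct BackpressureWindow {
    /// Offset up to which the client is allowed to fetch.
    window_end: u64,
    /// Desired distance between the reader and the window end (e.g. 2 GiB).
    preferred_window_size: u64,
}

impl BackpressureWindow {
    /// Old behavior: only extend the window once at least half of it
    /// has been consumed, returning the new window end if extended.
    fn advance_on_half_consumed(&mut self, read_end: u64) -> Option<u64> {
        let remaining = self.window_end.saturating_sub(read_end);
        if remaining <= self.preferred_window_size / 2 {
            self.window_end = read_end + self.preferred_window_size;
            Some(self.window_end) // signal the client to fetch up to here
        } else {
            None // no signal yet; wait for more data to be consumed
        }
    }

    /// New behavior: extend the window on every read, so it always ends
    /// `preferred_window_size` bytes past the last byte read.
    fn advance_on_every_read(&mut self, read_end: u64) -> u64 {
        self.window_end = read_end + self.preferred_window_size;
        self.window_end
    }
}

fn main() {
    const GIB: u64 = 1 << 30;
    const KIB: u64 = 1 << 10;
    // Read-ahead window of 2 GiB, as in the example above.
    let mut old = BackpressureWindow { window_end: 2 * GIB, preferred_window_size: 2 * GIB };
    let mut new = BackpressureWindow { window_end: 2 * GIB, preferred_window_size: 2 * GIB };

    // A 128 KiB read near the start of the object: the old strategy stays
    // silent until 1 GiB has been consumed, while the new strategy
    // immediately moves the window end to 128 KiB + 2 GiB.
    assert_eq!(old.advance_on_half_consumed(128 * KIB), None);
    assert_eq!(new.advance_on_every_read(128 * KIB), 128 * KIB + 2 * GIB);
}
```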
We observe improved throughput for sequential reads through a single file handle with this approach. For random reads and for reads across multiple file handles, we see no observable change in throughput. We expect this change may be a prerequisite for driving higher throughput with multiple file handles, the old signalling potentially being one bottleneck among others.
Does this change impact existing behavior?
This improves the way Mountpoint signals progress to its S3 client. We expect improvements in throughput, but the end-user behavior hasn't changed in a meaningful way.
Does this change need a changelog entry? Does it require a version change?
A changelog entry has been added to note the change in algorithm, along with a minor version bump. This communicates the change in case any issue is raised around the new behavior; however, we do not expect any regressions given the benchmarking performed.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).