
[Storage] Managed download perf: runtime task per-chunk & separate download work queues (#3950)

Open
jaschrep-msft wants to merge 17 commits into Azure:main from jaschrep-msft:separate-download-work-queues

Conversation

@jaschrep-msft
Member

Major performance improvements for managed download.

  • Uses Core's runtime abstraction to spawn workers for each individual chunk of the download.
    • If the download is a one-shot, does not spawn any workers; it runs async on whatever task the caller invoked the download from.
  • Separates the work of downloading chunks from resequencing chunks to return in the overall download stream.
    • Active chunk downloads capped by existing parallel bound.
    • Buffers waiting to be re-sequenced capped at 2x parallel bound.
  • Completed chunks stored in ring buffer waiting to be returned in overall stream.
  • Download tasks tagged with index for resequencing, allowing them to be placed in the correct position in the ring buffer.
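The scheme in the bullets above — completed chunks tagged with their index and held in a bounded ring buffer until they can be emitted in order — can be sketched as a simplified, synchronous model. The `Resequencer` type and its methods below are illustrative only, not the PR's actual API:

```rust
// Hypothetical sketch of indexed resequencing: chunks may complete out of
// order; each is tagged with its index and stored in a ring buffer until the
// next in-order chunk is available to emit into the overall stream.
struct Resequencer<T> {
    slots: Vec<Option<T>>, // capacity bounds how far ahead of `next` we accept
    next: usize,           // index of the next chunk to emit in order
}

impl<T> Resequencer<T> {
    fn new(capacity: usize) -> Self {
        Self {
            slots: (0..capacity).map(|_| None).collect(),
            next: 0,
        }
    }

    // Store a completed chunk at its tagged index. Returns false if the chunk
    // is too far ahead of the emit cursor (the caller would apply backpressure
    // instead of dropping it).
    fn insert(&mut self, index: usize, chunk: T) -> bool {
        if index < self.next || index >= self.next + self.slots.len() {
            return false;
        }
        let slot = index % self.slots.len();
        self.slots[slot] = Some(chunk);
        true
    }

    // Pop the next in-order chunk, if it has arrived.
    fn pop(&mut self) -> Option<T> {
        let slot = self.next % self.slots.len();
        let chunk = self.slots[slot].take()?;
        self.next += 1;
        Some(chunk)
    }
}

fn main() {
    let mut rb = Resequencer::new(4);
    // Chunks complete out of order: 1, 0, 2.
    assert!(rb.insert(1, "b"));
    assert_eq!(rb.pop(), None); // chunk 0 has not arrived yet
    assert!(rb.insert(0, "a"));
    assert!(rb.insert(2, "c"));
    let mut out = Vec::new();
    while let Some(c) = rb.pop() {
        out.push(c);
    }
    assert_eq!(out, vec!["a", "b", "c"]);
    println!("emitted in order: {:?}", out);
}
```

Capping the ring at 2x the parallel bound, as the PR describes, lets downloads run ahead of the consumer without unbounded buffering.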

Credit to @nateprewitt's initial implementation I ported over to use the tools available in our dependency chain.

@github-actions github-actions bot added the Storage (Storage Service: Queues, Blobs, Files) label Mar 13, 2026
@jaschrep-msft
Member Author

jaschrep-msft commented Mar 13, 2026

Resolved offline.

@heaths & @LarryOsterman, major question before marking this ready.

This implementation currently panics out-of-box. To complete successfully, the caller must invoke this download from the tokio runtime, and the app must have the tokio feature flag enabled for azure_core. This is due to reqwest's tight binding to tokio: it spawns tasks directly on the tokio runtime. If Core's get_async_runtime() does not return a tokio implementation, reqwest/hyper will panic. If the caller of download isn't already in a tokio runtime, there will be a panic either from us or from reqwest/hyper, depending on feature flags.

We need the ability to spawn tasks to achieve our target speed, and it seems that spawning tasks that contain network calls requires tokio all the way down.

How do we handle this?

I'm not well-experienced in the ecosystem, but it seems the way to deal with this is to introduce a tokio feature flag in the storage SDK that enables the code in this PR; in the absence of that flag, we fall back to the previous implementation. This gets us our target perf for tokio users (most users). Other runtimes could get their own flags as needed in the future.
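A minimal sketch of the feature-gated fallback being discussed. The function is illustrative only; the Cargo.toml fragment in the comment uses Cargo's standard feature-forwarding syntax, which is one way storage's flag could be tied to core's:

```rust
// Hypothetical feature-gated dispatch: compile the parallel, task-spawning
// path only when this crate's own `tokio` feature is enabled; otherwise fall
// back to the previous sequential implementation.
//
// In Cargo.toml, the storage flag could forward to core's:
//   [features]
//   tokio = ["azure_core/tokio"]

#[cfg(feature = "tokio")]
fn download_strategy() -> &'static str {
    "parallel: spawn a task per chunk on the tokio runtime"
}

#[cfg(not(feature = "tokio"))]
fn download_strategy() -> &'static str {
    "sequential: previous single-task implementation"
}

fn main() {
    println!("{}", download_strategy());
}
```

With this shape, non-tokio callers keep a working (slower) path instead of a panic, and the flag can default on if tokio is the common case.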

Are there other preferred mechanisms? Can we tie storage's tokio flag to core's tokio flag? What do we think is the path forward here?

@jaschrep-msft jaschrep-msft force-pushed the separate-download-work-queues branch 2 times, most recently from 8c2302b to fa61608 Compare March 17, 2026 15:24
@jaschrep-msft jaschrep-msft marked this pull request as ready for review March 17, 2026 16:34
Copilot AI review requested due to automatic review settings March 17, 2026 16:34
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors the blob managed download implementation to improve throughput by spawning per-chunk download tasks via azure_core’s runtime abstraction and by separating chunk downloading from resequencing/buffering in the returned stream.

Changes:

  • Reworked partitioned_transfer::download() to schedule chunk downloads as spawned tasks and resequence outputs via an indexed ring buffer.
  • Added helper utilities for collecting streamed bytes into a pre-allocated buffer and for handling invalid initial range requests.
  • Added async-stream as a dependency to implement the new streaming logic.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

File Description
sdk/storage/azure_storage_blob/src/partitioned_transfer/download.rs Major download pipeline refactor: per-chunk task spawning, bounded resequencing buffer, new helpers.
sdk/storage/azure_storage_blob/Cargo.toml Adds async-stream dependency for the new try_stream! implementation.
Cargo.lock Locks the new async-stream dependency.
Comments suppressed due to low confidence (1)

sdk/storage/azure_storage_blob/src/partitioned_transfer/download.rs:28

  • This module relies on use super::*; to bring in key names like future/TryStreamExt (used by future::select_all and try_next) rather than importing them locally. That hidden coupling makes the file fragile if partitioned_transfer::mod.rs changes. Consider adding explicit imports here for the items you use to keep the module self-contained.
use futures::TryStream;

use crate::models::http_ranges::ContentRange;

use super::*;


@demoray demoray closed this Mar 17, 2026
@demoray demoray reopened this Mar 17, 2026
@demoray
Contributor

demoray commented Mar 17, 2026

Sorry, I clicked the wrong button here.

Member

@LarryOsterman LarryOsterman left a comment


My biggest concern is the lack of documentation explaining the algorithm for download - this is a complicated bit of code and it was challenging to understand it.

```diff
 let dst = BytesMut::with_capacity(range.len());
 let response = client.transfer_range(Some(range)).await?;
-response.into_body().collect().await
+collect_into(response.into_body(), dst).await
```
Member


When the core collect_into PR completes, this can be replaced with response.into_body().collect_into(dst).await, I believe.

Member


That PR is in, I believe this can be replaced with:

response.into_body().collect_into(dst).await?

One minor complication is that the collect_into call can fail if the provided buffer isn't sufficient to hold received chunks. It will fill up to the buffer, and return the actual amount of data received (if the stream ends before the buffer is filled).

If it cannot be replaced, let me know how I can fix the collect_into function to better meet your needs.
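The collect_into semantics described above — error when a chunk would overflow the buffer, otherwise return the total bytes received when the stream ends — can be modeled with a synchronous analogue. The signature and names below are illustrative only, not azure_core's actual API:

```rust
// Hypothetical, synchronous analogue of collect_into: append chunks into a
// bounded buffer, erroring when a chunk would exceed the stated capacity, and
// returning the total bytes written when the source runs out first.
fn collect_into(chunks: &[&[u8]], dst: &mut Vec<u8>, capacity: usize) -> Result<usize, String> {
    for chunk in chunks {
        if dst.len() + chunk.len() > capacity {
            return Err(format!(
                "buffer too small: chunk of {} bytes exceeds remaining capacity {}",
                chunk.len(),
                capacity - dst.len()
            ));
        }
        dst.extend_from_slice(chunk);
    }
    Ok(dst.len())
}

fn main() {
    let mut dst = Vec::with_capacity(8);
    // Source ends before the buffer is full: returns the bytes actually received.
    let n = collect_into(&[b"abc".as_slice(), b"de".as_slice()], &mut dst, 8).unwrap();
    assert_eq!(n, 5);
    // A chunk that would overflow the remaining capacity is an error.
    assert!(collect_into(&[b"too long!".as_slice()], &mut dst, 8).is_err());
    println!("collected {n} bytes");
}
```

For the download path this is a fit as long as each chunk's buffer is pre-sized to the exact range length, so the overflow case should not occur in practice.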

@jaschrep-msft
Member Author

@LarryOsterman yeah, after some of the generated review comments required even more logic to be added, I have been factoring out several parts of this code to simplify the actual contents of the loop.

Member

@LarryOsterman LarryOsterman left a comment


This is a significant improvement, thank you very much.

Creating a new base branch to isolate independent changes to download behavior and easily explore their performance differences.

Some changes are included because they are already known to be universally good but have not yet been merged into main:
- pre-size chunk destination buffer. AsyncResponseBody::collect() is not smart enough to do this.
- handle ranged get on empty blob. We gotta do this at some point no matter what, may as well get accurate perf readings with that additional work.
- separate functions for analyzing response headers. Moves some bulky checks out of the way of the real download logic. Also good code reuse for alternate download implementations which may be necessary.
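The pre-sizing noted in the first bullet amounts to allocating from the known byte range up front. A rough analogue using a plain `Vec` (the function name is illustrative; the PR itself uses `BytesMut::with_capacity(range.len())` for the same purpose):

```rust
// Hypothetical illustration of pre-sizing the chunk destination buffer from
// the known (inclusive) byte range, so collecting the response body into it
// never reallocates mid-stream.
fn preallocated_chunk_buffer(range_start: u64, range_end_inclusive: u64) -> Vec<u8> {
    let len = (range_end_inclusive - range_start + 1) as usize;
    Vec::with_capacity(len)
}

fn main() {
    // A ranged GET for bytes 0..=4194303 spans 4 MiB.
    let buf = preallocated_chunk_buffer(0, 4 * 1024 * 1024 - 1);
    assert!(buf.capacity() >= 4 * 1024 * 1024);
    assert!(buf.is_empty());
    println!("pre-sized capacity: {} bytes", buf.capacity());
}
```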
Tokio is now a default feature flag of core. Remove the manual specification. Doubles as validating our out-of-box experience.
@jaschrep-msft jaschrep-msft force-pushed the separate-download-work-queues branch from 5919fe4 to a5fa059 Compare March 18, 2026 20:57