Improve chunk streaming #429

@TDemeco


Quote from PR #416 to keep track of:
Just as a comment, either to implement in this PR or as a future improvement: this logic loads all chunks into memory before streaming them to the remote location, which for a big enough file could be impossible (and is not very efficient in general). My idea for an optimization would look something like this:

```rust
// Note: `futures::StreamExt` must be in scope for `.map`/`.then` on streams.
// Create a stream that yields chunk indices, then map each to a chunk fetch.
let chunks_stream = futures::stream::iter(0..total_chunks)
    .map(|chunk_idx| ChunkId::new(chunk_idx))
    .then(move |chunk_id| {
        let file_storage = self.file_storage.clone();
        async move {
            // Hold the read lock only for the duration of a single chunk fetch.
            let read_storage = file_storage.read().await;
            let result = read_storage.get_chunk(&file_key, &chunk_id);
            drop(read_storage);
            result.map_err(into_rpc_error)
        }
    });
```

This type of logic would make it so we read chunks from storage only as they are needed. It has its issues though: it is much less efficient with respect to the file storage lock, since the lock is acquired and released once per chunk, although releasing it that frequently does keep key operations from being completely blocked.
I think what we should do is strive for a middle ground: use this kind of streaming read to avoid overloading memory, but fetch batches of chunks instead of one chunk at a time. This would greatly reduce contention on the file storage lock while still avoiding memory issues.

Originally posted by @TDemeco in #416 (comment)
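
For reference, a minimal sketch of what that batched middle ground could look like. It reuses the names from the quoted snippet (`self.file_storage`, `file_key`, `total_chunks`, `ChunkId`, `get_chunk`, `into_rpc_error`) as assumptions about the surrounding method, and introduces a hypothetical `BATCH_SIZE` constant; it is an illustration of the idea, not a drop-in implementation:

```rust
use futures::{stream, StreamExt};

// Hypothetical batch size; would need tuning against real chunk sizes.
const BATCH_SIZE: u64 = 64;

let batched_chunks_stream = stream::iter((0..total_chunks).step_by(BATCH_SIZE as usize))
    .then(move |batch_start| {
        let file_storage = self.file_storage.clone();
        // Assumes `file_key` is `Copy` (as in the per-chunk sketch above);
        // otherwise it would need to be cloned per batch.
        async move {
            // Acquire the read lock once per batch instead of once per chunk.
            let read_storage = file_storage.read().await;
            let batch_end = (batch_start + BATCH_SIZE).min(total_chunks);
            let batch: Result<Vec<_>, _> = (batch_start..batch_end)
                .map(|chunk_idx| {
                    read_storage
                        .get_chunk(&file_key, &ChunkId::new(chunk_idx))
                        .map_err(into_rpc_error)
                })
                .collect();
            // Lock is released here when `read_storage` goes out of scope.
            batch
        }
    });
```

The trade-off to tune is `BATCH_SIZE`: larger batches mean fewer lock acquisitions but more chunks held in memory at once, while a batch size of 1 degenerates to the per-chunk version above.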
