Description
Quote from PR #416 to keep track of:
Just as a comment either to implement in this PR or for a future improvement, this logic loads up all chunks into memory before streaming them to the remote location, which for a big enough file could be impossible (and overall is not really efficient). My idea for an optimization would look something like this:
```rust
// Create a stream that yields chunk indices, then map each to a chunk fetch
let chunks_stream = futures::stream::iter(0..total_chunks)
    .map(|chunk_idx| ChunkId::new(chunk_idx))
    .then(move |chunk_id| {
        // Clone the handles so each iteration owns its own copies.
        let file_storage = self.file_storage.clone();
        let file_key = file_key.clone();
        async move {
            // Hold the storage lock only for the duration of a single chunk read.
            let read_storage = file_storage.read().await;
            let result = read_storage.get_chunk(&file_key, &chunk_id);
            drop(read_storage);
            result.map_err(into_rpc_error)
        }
    });
```
This type of logic would let us read chunks from storage only as they are needed. It has its issues though: it is much harder on the file storage lock, acquiring and releasing it once per chunk, even if releasing it that frequently prevents key operations from being completely blocked.
I think what we should do is strive for a middle ground: use this type of read streaming to avoid overloading memory, but fetch batches of chunks instead of a single chunk per lock acquisition (see the sketch below). This would greatly reduce the pressure on the file storage lock while still avoiding any memory issues.
Originally posted by @TDemeco in #416 (comment)
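A minimal sketch of what that batched middle ground could look like, reusing the names from the snippet above (`total_chunks`, `file_key`, `ChunkId`, `get_chunk`, `into_rpc_error`); `BATCH_SIZE` is a hypothetical tuning constant and the exact shape would need to match the real file storage API:

```rust
use futures::stream::{self, StreamExt};

// Hypothetical batch size; tune against memory budget and lock contention.
const BATCH_SIZE: u64 = 64;

// Yield one batch of chunk indices at a time, read the whole batch under a
// single read-lock acquisition, then release the lock before streaming the
// chunks out.
let chunks_stream = stream::iter((0..total_chunks).step_by(BATCH_SIZE as usize))
    .then(move |batch_start| {
        let file_storage = self.file_storage.clone();
        let file_key = file_key.clone();
        async move {
            let batch_end = (batch_start + BATCH_SIZE).min(total_chunks);
            // One lock acquisition per batch instead of per chunk.
            let read_storage = file_storage.read().await;
            let batch: Vec<_> = (batch_start..batch_end)
                .map(|chunk_idx| {
                    read_storage
                        .get_chunk(&file_key, &ChunkId::new(chunk_idx))
                        .map_err(into_rpc_error)
                })
                .collect();
            drop(read_storage);
            // Re-expose the batch as a stream of individual chunk results.
            stream::iter(batch)
        }
    })
    .flatten();
```

The batch size bounds how many chunks sit in memory at once, while the per-batch lock scope keeps the number of lock acquisitions roughly `total_chunks / BATCH_SIZE` instead of one per chunk.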