
Conversation

@maxstack (Collaborator)

Fixes for the chunk cache:

  1. The request offset and size were not included in the cache key, so the different byte ranges received by these requests were cached in the same file, causing errors in results.
  2. Fixed a threading issue where request-processing tasks could access the cache state via get_metadata while it was being written.
  3. Removed unnecessary request parameters from the key so the cache can hit more often; we include only the source, bucket, object key, offset, and size, which are the parameters used by S3 object download API calls.

…nloaded byte range under a key that doesn't account for this range.

When another request is received, we get a cache hit even though the requested byte range may differ.

Fix this by downloading and caching the entire chunk, then applying the byte range after the download or after a cache hit.

If caching is disabled, honour the byte range in the S3 client download.
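The slicing step of this approach (which a later commit reverts, as noted below) can be sketched as follows. This is a minimal sketch, not the actual implementation: `get_range` is a hypothetical helper, and the download/cache plumbing around it is omitted.

```rust
// Hypothetical helper: apply the requested byte range to a fully
// downloaded (or cache-hit) chunk, after the whole chunk is stored.
fn get_range(chunk: &[u8], offset: usize, size: usize) -> &[u8] {
    // Clamp so a range past the end of the chunk cannot panic.
    let start = offset.min(chunk.len());
    let end = (offset + size).min(chunk.len());
    &chunk[start..end]
}

fn main() {
    // The whole chunk is downloaded and cached once...
    let chunk: Vec<u8> = (0u8..16).collect();
    // ...and the requested byte range is applied afterwards.
    let range = get_range(&chunk, 4, 8);
    assert_eq!(range, &[4u8, 5, 6, 7, 8, 9, 10, 11][..]);
    println!("{}", range.len()); // 8
}
```

Because every request slices the same cached chunk, differing byte ranges no longer poison each other's cache entries; the trade-off, as the revert below explains, is holding whole chunks in memory.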
…ache chunks aren't shared between requests of differing byte range.

Revert the previous fix, which suffered from memory over-consumption when bombarded by requests for byte ranges from the same very large file; we would need a way to ensure a single download of that file before servicing concurrent requests against it.
 1) Only incorporate request fields that are themselves used in the S3 object download.
 2) Immediately turn this key into an md5 hash so we're handling a much shorter string when interacting with the cache.
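The key construction described above can be sketched like this. It is a dependency-free sketch under stated assumptions: `CacheKey` and `cache_file_name` are hypothetical names, and the standard library's `DefaultHasher` stands in for the md5 hash the PR actually uses (the `md5` crate would replace it in real code).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hypothetical key type: only the fields used by the S3 object download.
#[derive(Hash)]
struct CacheKey<'a> {
    source: &'a str,
    bucket: &'a str,
    key: &'a str,
    offset: u64,
    size: u64,
}

// The PR hashes the key with md5; DefaultHasher is used here only to
// keep the sketch dependency-free.
fn cache_file_name(k: &CacheKey) -> String {
    let mut h = DefaultHasher::new();
    k.hash(&mut h);
    format!("{:016x}", h.finish())
}

fn main() {
    let a = CacheKey { source: "s3", bucket: "b", key: "obj", offset: 0, size: 1024 };
    let b = CacheKey { source: "s3", bucket: "b", key: "obj", offset: 1024, size: 1024 };
    // Differing offsets now produce distinct cache entries.
    assert_ne!(cache_file_name(&a), cache_file_name(&b));
    println!("{}", cache_file_name(&a).len()); // 16 hex chars
}
```

Hashing the key also sidesteps any filesystem limits on file-name length that a long concatenated key could hit.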
…essing task to determine if a chunk is cached and if so its size.

There's an MPSC channel used to buffer all cache write requests to a single task responsible for cache updates, but an update could happen at the same time as request-processing tasks read the state.
Reading the state file while it's being written results in a serde parsing error.

Instead of adding some form of thread safety around the load/save of the state file, a simpler solution is to store each chunk's metadata in its own metadata file.
This mirrors the chunk get path, which ignores the state file and simply retrieves files from disk.
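The per-chunk metadata scheme can be sketched as below. The names `write_chunk`, `chunk_size`, and the `<hash>.meta` file layout are assumptions for illustration, not the actual API; error handling is kept minimal.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical layout: each chunk file <hash> gets a sibling
// "<hash>.meta" file holding its size, so readers never need to
// touch a shared state file that a writer task may be rewriting.
fn write_chunk(dir: &Path, hash: &str, data: &[u8]) -> io::Result<()> {
    fs::write(dir.join(hash), data)?;
    // Written after the chunk, so the metadata file's presence
    // implies a complete chunk.
    fs::write(dir.join(format!("{hash}.meta")), data.len().to_string())
}

// Returns Some(size) if the chunk is cached, mirroring what
// get_metadata needs: is the chunk cached, and if so, its size.
fn chunk_size(dir: &Path, hash: &str) -> Option<u64> {
    fs::read_to_string(dir.join(format!("{hash}.meta")))
        .ok()?
        .trim()
        .parse()
        .ok()
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    write_chunk(&dir, "example_chunk", b"hello")?;
    assert_eq!(chunk_size(&dir, "example_chunk"), Some(5));
    assert_eq!(chunk_size(&dir, "missing_chunk"), None);
    println!("ok");
    Ok(())
}
```

Each chunk's metadata lives and dies with the chunk itself, so readers and the single writer task never contend on one shared file.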
@maxstack maxstack self-assigned this Apr 17, 2025
@sd109 (Contributor) left a comment


Nice, seems like a sensible approach to me.

Do we also need to be worried about the pruning cycle trying to access the cache state file while the cache is being written to? I see we're using load_state in the remove method as well as in various places throughout the prune* methods.

@maxstack maxstack force-pushed the fix/chunk-cache-offset-and-size branch from 2638d03 to eed4a06 Compare January 13, 2026 18:03
@maxstack maxstack force-pushed the fix/chunk-cache-offset-and-size branch from eed4a06 to 64a4e24 Compare January 14, 2026 10:32
…chunks independently, two users requesting the same chunk will cache it independently with no sharing involved - this is the fastest way to enable authentication by default
@maxstack maxstack marked this pull request as ready for review January 14, 2026 13:48
@maxstack maxstack merged commit 3be550b into main Jan 14, 2026
8 checks passed
@maxstack maxstack deleted the fix/chunk-cache-offset-and-size branch January 14, 2026 13:54
@maxstack maxstack changed the title from "Draft Fix chunk cache key and metadata retrieval" to "Fix chunk cache key and metadata retrieval" Jan 15, 2026