Skip to content

[read-unfinalized-object] [part1] Relax file-cache checks for unfinalized objects #3401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gargnitingoogle
Copy link
Collaborator

@gargnitingoogle gargnitingoogle commented Jun 10, 2025

Description

The change is based on design doc

Changes in behavior for unfinalized objects:

  • Don't invalidate file-cache entry if size of completed download entry >= intended downloaded size
  • Don't fall back to GCS from file-cache read if size of completed download entry >= intended upper limit of offset to be read.
  • Fallback to GCS while reading from cache_handle if requested read isn't available from the cached file-info data size. If partial read is possible from the cache, then go ahead and start the download job.

Additional changes

  • Added a simple utility for easily checking if an object is unfinalized.
  • Enable zonal bucket unit tests for TestFileCacheReader, TestRandomReader and TestReadManager.
  • Add new unit tests corresponding to the above code changes.
  • Update existing unit tests corresponding to the above code changes.

Dependencies

Out of scope:

Link to the issue in case of a bug fix.

b/421294827

Testing details

  1. Manual - Tested manually.
  2. Unit tests - Tested automatically by this PR. Add more UTs for the changes in the code.
  3. Integration tests - ran with presubmit (only zonal, e2e zonal, non-zonal and perf test run)
  4. Perf tests results
+--------+------------+--------------+------------+--------------+--------------+
| Branch | File Size  |   Read BW    |  Write BW  | RandRead BW  | RandWrite BW |
+--------+------------+--------------+------------+--------------+--------------+
| Master |  0.25MiB   | 555.31MiB/s  |  1.2MiB/s  |  77.32MiB/s  |  1.17MiB/s   |
|   PR   |  0.25MiB   | 581.98MiB/s  | 1.21MiB/s  |  77.26MiB/s  |   1.2MiB/s   |
|        |            |              |            |              |              |
|        |            |              |            |              |              |
| Master | 48.828MiB  | 3900.45MiB/s | 85.61MiB/s | 1554.19MiB/s |  84.39MiB/s  |
|   PR   | 48.828MiB  | 3849.86MiB/s | 85.42MiB/s | 1499.03MiB/s |  84.85MiB/s  |
|        |            |              |            |              |              |
|        |            |              |            |              |              |
| Master | 976.562MiB | 3742.3MiB/s  | 79.22MiB/s | 763.35MiB/s  |  39.24MiB/s  |
|   PR   | 976.562MiB | 3711.34MiB/s | 77.93MiB/s | 1296.53MiB/s |  38.71MiB/s  |
|        |            |              |            |              |              |
|        |            |              |            |              |              |
+--------+------------+--------------+------------+--------------+--------------+

Any backward incompatible change? If so, please explain.

@gargnitingoogle gargnitingoogle changed the base branch from master to fetch-unfinalized-obj-attr-using-0byte-reader June 10, 2025 11:39
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch 2 times, most recently from 05ecb33 to a8c23a7 Compare June 10, 2025 11:43
@gargnitingoogle gargnitingoogle added the execute-integration-tests-on-zb To run E2E tests on zonal bucket. label Jun 10, 2025
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch from a8c23a7 to 53c22bc Compare June 10, 2025 11:47
@gargnitingoogle gargnitingoogle changed the title Gargnitin/support unfinalized read/read cached/v1 [read-unfinalized-object] Fall back to gcs-read if unfinalzed-object read from cached-file fails Jun 10, 2025
@gargnitingoogle gargnitingoogle changed the title [read-unfinalized-object] Fall back to gcs-read if unfinalzed-object read from cached-file fails [read-unfinalized-object] Fall back to gcs-read if unfinalized-object read from cached-file fails Jun 10, 2025
@anushka567 anushka567 force-pushed the fetch-unfinalized-obj-attr-using-0byte-reader branch 2 times, most recently from bed6248 to 082fa79 Compare June 10, 2025 16:30
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch from 53c22bc to 14448e8 Compare June 11, 2025 04:21
@gargnitingoogle gargnitingoogle changed the base branch from fetch-unfinalized-obj-attr-using-0byte-reader to master June 11, 2025 04:29
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch 3 times, most recently from 104ec71 to 9deb5ea Compare June 11, 2025 11:56
Copy link

codecov bot commented Jun 11, 2025

Codecov Report

Attention: Patch coverage is 84.00000% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.00%. Comparing base (94b76ce) to head (45305ac).
Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
internal/cache/file/cache_handle.go 81.81% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3401      +/-   ##
==========================================
- Coverage   79.07%   79.00%   -0.07%     
==========================================
  Files         140      140              
  Lines       18520    18539      +19     
==========================================
+ Hits        14644    14647       +3     
- Misses       3392     3404      +12     
- Partials      484      488       +4     
Flag Coverage Δ
unittests 79.00% <84.00%> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch from 9deb5ea to 170a49a Compare June 11, 2025 18:08
@gargnitingoogle gargnitingoogle marked this pull request as ready for review June 11, 2025 19:43
@gargnitingoogle gargnitingoogle requested a review from a team as a code owner June 11, 2025 19:43
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch 2 times, most recently from 4af9051 to d20e570 Compare June 12, 2025 11:02
@gargnitingoogle gargnitingoogle removed the execute-integration-tests-on-zb To run E2E tests on zonal bucket. label Jun 12, 2025
@gargnitingoogle gargnitingoogle changed the base branch from master to fetch-unfinalized-obj-attr-using-0byte-reader June 12, 2025 11:14
@gargnitingoogle gargnitingoogle changed the base branch from fetch-unfinalized-obj-attr-using-0byte-reader to master June 12, 2025 11:19
@gargnitingoogle gargnitingoogle changed the base branch from master to fetch-unfinalized-obj-attr-using-0byte-reader June 12, 2025 11:53
@gargnitingoogle gargnitingoogle requested review from abhishek10004 and anushka567 and removed request for ashmeenkaur June 12, 2025 11:54
@gargnitingoogle gargnitingoogle changed the base branch from fetch-unfinalized-obj-attr-using-0byte-reader to master June 12, 2025 13:41
@gargnitingoogle gargnitingoogle force-pushed the gargnitin/support-unfinalized-read/read-cached/v1 branch from 0d586cf to 45305ac Compare July 1, 2025 09:35
@gargnitingoogle gargnitingoogle merged commit 299c67a into master Jul 1, 2025
13 checks passed
@gargnitingoogle gargnitingoogle deleted the gargnitin/support-unfinalized-read/read-cached/v1 branch July 1, 2025 11:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
remind-reviewers Auto remind reviewers in attention set for review post 24hrs of inactivity on PR.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants