feat(parallel random reads): Changes to enable parallel random read handling in gcs reader #3619

abhishek10004 · 2025-07-31T19:15:28Z

Description

This pull request enables parallel random reads in the GCS reader. It refactors the read logic in gcs_reader.go and introduces a mutex that serializes concurrent access to the range reader, while reads served by the multi-range reader (MRD) proceed in parallel. The test suite is also updated to cover the new logic and parallel read scenarios.
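A minimal sketch of the locking split described above, not the actual gcs_reader.go code. The names `sketchReader`, `readerKind`, and `readAt` are hypothetical; the sketch assumes the range-reader path carries per-stream state that needs the mutex, while the MRD path is stateless and can run concurrently.

```go
package main

import (
	"fmt"
	"sync"
)

// readerKind is a hypothetical stand-in for the PR's reader types.
type readerKind int

const (
	rangeReaderKind readerKind = iota
	multiRangeReaderKind
)

// sketchReader is illustrative only; it is not the gcsfuse GCSReader.
type sketchReader struct {
	mu   sync.Mutex // guards the stateful range-reader path only
	data []byte     // immutable stand-in for the object contents
	pos  int64      // position of the single sequential stream; guarded by mu
}

// readAt copies bytes starting at off into p and returns the count copied.
func (r *sketchReader) readAt(kind readerKind, p []byte, off int64) int {
	if off < 0 || off >= int64(len(r.data)) {
		return 0
	}
	if kind == rangeReaderKind {
		// The range reader owns one stream with a current position,
		// so concurrent callers are serialized.
		r.mu.Lock()
		defer r.mu.Unlock()
		n := copy(p, r.data[off:])
		r.pos = off + int64(n)
		return n
	}
	// Assumed MRD behavior: each call is independent, so no lock is taken.
	return copy(p, r.data[off:])
}

func main() {
	r := &sketchReader{data: []byte("0123456789abcdef")}
	results := make([]string, 4)
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			buf := make([]byte, 4)
			n := r.readAt(multiRangeReaderKind, buf, int64(i*4))
			results[i] = string(buf[:n]) // each goroutine writes its own slot
		}(i)
	}
	wg.Wait()
	fmt.Println(results)
}
```

Only the branch that mutates shared state pays for the lock; random reads on the MRD path never contend with each other.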

Link to the issue in case of a bug fix.

b/427137532

Testing details

Manual - Done
Unit tests - Done
Integration tests - NA

Any backward incompatible change? If so, please explain.

…abhishek/random_reads_gcs

gemini-code-assist · 2025-08-01T11:23:10Z

Hi @abhishek10004, it looks like you've invoked /gemini without a specific command. If you'd like me to perform a code review, please use /gemini review. If you'd like a summary of the pull request, please use /gemini summary. You can also type /gemini help to see a list of all available commands.

abhishek10004 · 2025-08-01T11:27:14Z

/gemini review
/gemini summary

gemini-code-assist

Code Review

This pull request introduces changes to enable parallel random reads in the GCS reader. This is achieved by refactoring the read logic in gcs_reader.go, introducing a mutex to manage concurrent access, and adding logic to dynamically switch between sequential (RangeReader) and random (MultiRangeReader) read strategies. The changes also include significant updates to the test suite to cover the new logic and parallel read scenarios.

While the changes to gcs_reader.go seem well-thought-out for concurrency, a critical issue has been introduced in internal/fs/fs.go by removing a file handle lock, which leads to a data race. Additionally, some of the test modifications appear fragile due to hardcoded mock expectations.

gemini-code-assist · 2025-08-01T11:29:10Z

internal/fs/fs.go

 	fh.Inode().Lock()
-	defer fh.Unlock()


Removing the lock on the file handle fh here introduces a data race. Multiple goroutines calling ReadFile on the same file handle will concurrently access and potentially modify fh.reader and fh.readManager within fh.Read and fh.ReadWithReadManager. This can lead to unpredictable behavior and crashes.

While the goal is to enable parallel reads, the file handle's internal state (like the reader instance) must be protected during initialization and updates. A lock is still required for this. Consider moving the lock inside fh.Read and fh.ReadWithReadManager to protect just the reader management logic, which would allow the actual read I/O operations to run in parallel if the underlying reader supports it.
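The suggestion above could look roughly like the following sketch, in which the lock covers only reader initialization and the I/O runs without it. `sketchHandle`, `getReader`, and `read` are hypothetical names, and `bytes.Reader` stands in for the real reader; none of this is the gcsfuse file handle code.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// sketchHandle is a hypothetical stand-in for a file handle.
type sketchHandle struct {
	mu      sync.Mutex
	content []byte
	reader  io.ReaderAt // lazily created; guarded by mu
	inits   int         // number of times the reader was created; guarded by mu
}

// getReader holds the lock only while creating or fetching the reader.
func (h *sketchHandle) getReader() io.ReaderAt {
	h.mu.Lock()
	defer h.mu.Unlock()
	if h.reader == nil {
		h.reader = bytes.NewReader(h.content)
		h.inits++
	}
	return h.reader
}

// read performs the I/O after the lock has been released. This is safe
// here because io.ReaderAt permits parallel ReadAt calls on one source.
func (h *sketchHandle) read(p []byte, off int64) (int, error) {
	r := h.getReader()
	return r.ReadAt(p, off)
}

func main() {
	h := &sketchHandle{content: []byte("hello world")}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			buf := make([]byte, 5)
			if _, err := h.read(buf, 6); err != nil {
				fmt.Println("read error:", err)
			}
		}()
	}
	wg.Wait()
	fmt.Println("reader initializations:", h.inits)
}
```

Whether this is safe in practice depends on the underlying reader tolerating concurrent calls, which is the condition the review comment attaches to it.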

This is okay, since we now synchronize at the reader level (for range readers).

internal/gcsx/client_readers/gcs_reader_test.go

internal/gcsx/read_manager/read_manager_test.go

codecov · 2025-08-01T11:43:15Z

Codecov Report

❌ Patch coverage is 95.23810% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.36%. Comparing base (c2a7d57) to head (aa1f2ad).
⚠️ Report is 7 commits behind head on master.

Files with missing lines	Patch %	Lines
internal/gcsx/client_readers/gcs_reader.go	93.50%	4 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #3619      +/-   ##
==========================================
+ Coverage   82.34%   82.36%   +0.02%     
==========================================
  Files         146      146              
  Lines       22524    22663     +139     
==========================================
+ Hits        18548    18667     +119     
- Misses       3428     3445      +17     
- Partials      548      551       +3

Flag	Coverage Δ
unittests	`82.36% <95.23%> (+0.02%)`	⬆️

github-actions · 2025-08-01T12:31:43Z

/gemini review

github-actions · 2025-08-01T12:31:44Z

/gemini summary

internal/gcsx/client_readers/gcs_reader.go

vadlakondaswetha · 2025-08-04T10:55:52Z

internal/gcsx/client_readers/range_reader.go

-func (rr *RangeReader) readFromExistingReader(ctx context.Context, req *gcsx.GCSReaderRequest) (gcsx.ReaderResponse, error) {
+func (rr *RangeReader) readFromExistingReader(ctx context.Context, req *gcsx.GCSReaderRequest) (int, error) {
+	rr.skipBytes(req.Offset)
+	rr.invalidateReaderIfMisalignedOrTooSmall(req.Offset, req.Offset+int64(len(req.Buffer)))


Please compute `req.Offset+int64(len(req.Buffer))` once and store it in a variable so it can be reused.

Also, what is the difference between the endOffset in GCSReaderRequest and this one?

GCSReaderRequest has the computed endOffset which would be used while creating the reader. Here, we're reading from the existing reader, so I'm reading only what the user has requested.
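A small sketch of the distinction discussed in this thread, using a hypothetical `sketchRequest` type rather than the real `gcsx.GCSReaderRequest`. It assumes the computed `EndOffset` can extend beyond the caller's buffer; the thread only states that it is used when creating a reader. The helper `requestedEnd` is likewise illustrative.

```go
package main

import "fmt"

// sketchRequest is a hypothetical stand-in for gcsx.GCSReaderRequest.
type sketchRequest struct {
	Buffer    []byte
	Offset    int64
	EndOffset int64 // computed range end, used when a new reader is created
}

// requestedEnd returns the end of the range the caller actually asked for.
func requestedEnd(req *sketchRequest) int64 {
	return req.Offset + int64(len(req.Buffer))
}

func main() {
	req := &sketchRequest{
		Buffer:    make([]byte, 4096),
		Offset:    1 << 20,
		EndOffset: 2 << 20, // assumed value for illustration only
	}

	// Compute the requested end once and reuse it, as the review asks.
	end := requestedEnd(req)

	fmt.Println("range read from an existing reader:", req.Offset, end)
	fmt.Println("range used to create a new reader:", req.Offset, req.EndOffset)
}
```

Keeping the two values under separate names makes it harder to pass the reader-creation range where the user-requested range is meant.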

internal/gcsx/client_readers/gcs_reader.go

vadlakondaswetha · 2025-08-04T10:58:29Z

internal/gcsx/client_readers/gcs_reader_test.go

@@ -407,129 +410,341 @@ func (t *gcsReaderTest) Test_ReadAt_PropagatesCancellation() {
 	}
 }

-func (t *gcsReaderTest) Test_ReadInfo_WithInvalidInput() {
-	t.object.Size = 10 * MiB
+func (t *gcsReaderTest) Test_IsSeekNeeded() {


Not reviewing the tests, assuming you have written the same tests as for random_reader.

internal/gcsx/client_readers/gcs_reader.go

internal/gcsx/client_readers/range_reader.go

abhishek10004 · 2025-08-04T16:32:58Z

/gemini review

gemini-code-assist · 2025-08-04T16:42:07Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

raj-prince · 2025-08-04T17:29:45Z

internal/gcsx/client_readers/gcs_reader.go

@@ -52,6 +52,12 @@ const (
 	MultiRangeReaderType
 )

+type readInfo struct {


Should we rename it to something more specific and clear, such as RequestState?

Also, it would be good to document the purpose of this struct, since it is not very straightforward to understand.

internal/gcsx/client_readers/gcs_reader.go

abhishek10004 added 2 commits July 30, 2025 04:34

Changes to allow parallel random reads from same file handle

63a9758

Adding endOffset logic and UTs

c7023b2

abhishek10004 added the execute-perf-test and execute-integration-tests labels Jul 31, 2025

abhishek10004 added 3 commits August 1, 2025 11:12

Adding UT

2ec4e9c

Merge branch 'master' of github.com:GoogleCloudPlatform/gcsfuse into …

16e94fe

…abhishek/random_reads_gcs

Fixing test build

83a831a

gemini-code-assist bot reviewed Aug 1, 2025

Removing commented code

4478da8

abhishek10004 marked this pull request as ready for review August 1, 2025 12:31

abhishek10004 requested a review from a team as a code owner August 1, 2025 12:31

abhishek10004 requested a review from charith87 August 1, 2025 12:31

github-actions bot added the remind-reviewers label Aug 1, 2025

abhishek10004 requested review from raj-prince, Tulsishah and vadlakondaswetha August 1, 2025 12:31

kislaykishore requested review from a team and Tulsishah and removed request for raj-prince, Tulsishah, vadlakondaswetha and a team August 1, 2025 12:32

Changes in multi_range_reader

7df5586

abhishek10004 requested review from raj-prince and vadlakondaswetha August 3, 2025 19:05

vadlakondaswetha reviewed Aug 4, 2025

internal/gcsx/client_readers/gcs_reader.go

vadlakondaswetha reviewed Aug 4, 2025
