credentials/alts: Optimize reads #8204

Merged · 7 commits merged into grpc:master · Apr 7, 2025

Conversation

arjan-bal (Contributor) commented Mar 28, 2025

Background

Internal issue: b/400312873

While benchmarking Google Cloud Storage read performance, flame graphs showed significant time being spent in google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Read. These changes reduce the time spent in this function through the following optimizations:

  1. In the handshaker, allocate a buffer only once, outside the for loop.
  2. Use 32KB (previously 4KB) buffers to read from the network.
  3. When a complete ALTS frame doesn't fit into the buffer, directly expand the buffer to the required capacity instead of incrementing in steps of 4KB.
  4. If the buffer passed to Read() is large enough, use it to store the decrypted message, avoiding an extra copy into buf. The default buffer size used by gRPC is 16KB, which also happens to be the size of ALTS records from GCS, so we usually avoid the extra copy (see the sketch below).
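
A minimal, hypothetical sketch of optimization 4 follows; the identifiers conn, c.buf, c.crypto, and readRecord are illustrative, not the PR's exact names, and Decrypt is assumed to append plaintext to the slice passed as dst:

```go
// conn is a pared-down stand-in for the ALTS record connection.
type conn struct {
	buf    []byte // leftover decrypted bytes from previous reads
	crypto interface {
		Decrypt(dst, ciphertext []byte) ([]byte, error)
	}
}

// readRecord decrypts one framed record, writing the plaintext directly into
// the caller's buffer b when it is large enough to hold it.
func (c *conn) readRecord(b, framedMsg []byte) (int, error) {
	if len(b) >= len(framedMsg) {
		// Plaintext is never longer than the framed message, so b can hold
		// the whole decrypted record: skip the copy through c.buf.
		plaintext, err := c.crypto.Decrypt(b[:0], framedMsg)
		return len(plaintext), err
	}
	// Fall back to decrypting into the internal buffer and copying out.
	plaintext, err := c.crypto.Decrypt(c.buf[:0], framedMsg)
	if err != nil {
		return 0, err
	}
	n := copy(b, plaintext)
	c.buf = plaintext[n:] // leftover plaintext is served on the next Read
	return n, nil
}
```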

Benchmarks

branch: master

go test ./credentials/alts/internal/conn -timeout=2s -bench="Bench"
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
BenchmarkLargeMessage-48              13          90139719 ns/op
PASS
ok      google.golang.org/grpc/credentials/alts/internal/conn   2.316s

branch: performance (this PR)

goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
BenchmarkLargeMessage-48              14          80422425 ns/op
PASS
ok      google.golang.org/grpc/credentials/alts/internal/conn   2.139s

RELEASE NOTES:

  • credentials/alts: Improve read performance by optimizing buffer copies and allocations.

arjan-bal added the labels Type: Performance and Area: Auth on Mar 28, 2025
arjan-bal changed the title from "Optimize ALTS reads" to "credentials/alts: Optimize reads" on Mar 28, 2025
arjan-bal added this to the 1.72 Release milestone on Mar 28, 2025
codecov bot commented Mar 28, 2025

Codecov Report

Attention: Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.

Project coverage is 81.98%. Comparing base (6819ed7) to head (3756c57).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
credentials/alts/internal/conn/record.go 71.42% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8204      +/-   ##
==========================================
- Coverage   82.02%   81.98%   -0.04%     
==========================================
  Files         410      410              
  Lines       40233    40389     +156     
==========================================
+ Hits        33000    33113     +113     
- Misses       5865     5897      +32     
- Partials     1368     1379      +11     
Files with missing lines Coverage Δ
credentials/alts/internal/conn/common.go 100.00% <100.00%> (ø)
credentials/alts/internal/handshaker/handshaker.go 69.58% <100.00%> (ø)
credentials/alts/internal/conn/record.go 77.48% <71.42%> (-5.07%) ⬇️

... and 31 files with indirect coverage changes


dfawley (Member) commented Mar 28, 2025

I don't need to review - the grpc-security team's review (whoever does it) should be sufficient.

cc @gtcooke94

matthewstevenson88 requested review from gtcooke94 and rockspore and removed the request for matthewstevenson88 on Mar 31, 2025
// returns a boolean that indicates if the buffer contains sufficient bytes to
// parse the length header. If there are insufficient bytes, (0, false) is
// returned.
func ParseMessageLength(b []byte) (uint32, bool) {
Contributor:

This can be private.

arjan-bal (Contributor Author):

True, unexported it.
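
For reference, a minimal sketch of what the now-unexported helper does, assuming the little-endian length framing used by ALTS record headers (a sketch, not the exact code):

```go
import "encoding/binary"

// MsgLenFieldSize is the size in bytes of the record's length header.
const MsgLenFieldSize = 4

// parseMessageLength reads the 4-byte length header from b, returning
// (0, false) when b holds fewer than MsgLenFieldSize bytes.
func parseMessageLength(b []byte) (uint32, bool) {
	if len(b) < MsgLenFieldSize {
		return 0, false
	}
	return binary.LittleEndian.Uint32(b[:MsgLenFieldSize]), true
}
```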


// BenchmarkLargeMessage measures the performance of ALTS conns for sending and
// receiving a large message.
func BenchmarkLargeMessage(b *testing.B) {
Contributor:

@gtcooke94 Do we have any GitHub Actions that already run these benchmarks, or should we add some if not?

Contributor:

I think someone else on the Go team would be the person to ask here; I don't know much about the CI setup. @arjan-bal, do you know about the grpc-go GitHub CI and benchmarking?

arjan-bal (Contributor Author) commented Apr 3, 2025

We don't run benchmarks as part of CI. We have benchmarks here that we ask PR authors to run when reviewing PRs that affect performance. We can add a similar benchmark for ALTS or modify the existing benchmark to support it.

We have a performance dashboard for all languages, but we don't have alerts set up for regressions: https://grafana-dot-grpc-testing.appspot.com/?orgId=1

gtcooke94 (Contributor) left a comment

This looks good overall; there are just some specific questions I want to make sure we all understand.

arjan-bal assigned rockspore and gtcooke94 and unassigned arjan-bal on Apr 2, 2025
gtcooke94 (Contributor) left a comment

I think this looks good, thanks! The remaining question is how the benchmark you added interacts with the CI; I'm good with this being a follow-up if it'll take a lot of extra time.

arjan-bal assigned arjan-bal and unassigned rockspore and gtcooke94 on Apr 3, 2025
arjan-bal (Contributor Author) commented Apr 3, 2025

The CI has caught a data race where gRPC is calling conn.Read and conn.Close concurrently: https://github.com/grpc/grpc-go/actions/runs/14238675443/job/39903307175?pr=8204

There are a few solutions to fix the race:

  1. Don't add the protectedBuf back to the buffer pool. It will be garbage collected like before.
  2. Add synchronization in the http client to serialize calls to Read and Close.
  3. Add synchronization inside alts.conn to serialize Read and Close.

I'll try to see if I can make the fix for 2. If that's not possible, 1 seems fine too.

arjan-bal (Contributor Author)

> The CI has caught a data race where gRPC is calling conn.Read and conn.Close concurrently: https://github.com/grpc/grpc-go/actions/runs/14238675443/job/39903307175?pr=8204
>
> There are a few solutions to fix the race:
>
>   1. Don't add the protectedBuf back to the buffer pool. It will be garbage collected like before.
>   2. Add synchronization in the http client to serialize calls to Read and Close.
>   3. Add synchronization inside alts.conn to serialize Read, Write and Close.
>
> I'll try to see if I can make the fix for 2. If that's not possible, 1 seems fine too.

The stack traces show that the writer goroutine in the http2 client closes the net.Conn while the reader goroutine is still reading. The error returned by loopy.run() is rpc error: code = Canceled desc = grpc: the client connection is closing.

```go
go func() {
	t.loopy = newLoopyWriter(clientSide, t.framer, t.controlBuf, t.bdpEst, t.conn, t.logger, t.outgoingGoAwayHandler, t.bufferPool)
	if err := t.loopy.run(); !isIOError(err) {
		// Immediately close the connection, as the loopy writer returns
		// when there are no more active streams and we were draining (the
		// server sent a GOAWAY). For I/O errors, the reader will hit it
		// after draining any remaining incoming data.
		t.conn.Close()
	}
	close(t.writerDone)
}()
```

Race detector trace
WARNING: DATA RACE
Read at 0x00c0001724f8 by goroutine 386:
  google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Read()
      /home/runner/work/grpc-go/grpc-go/credentials/alts/internal/conn/record.go:157 +0x5e
  bufio.(*Reader).Read()
      /opt/hostedtoolcache/go/1.24.2/x64/src/bufio/bufio.go:245 +0x4b7
  io.ReadAtLeast()
      /opt/hostedtoolcache/go/1.24.2/x64/src/io/io.go:335 +0xca
  io.ReadFull()
      /opt/hostedtoolcache/go/1.24.2/x64/src/io/io.go:354 +0x99
  golang.org/x/net/http2.readFrameHeader()
      /home/runner/go/pkg/mod/golang.org/x/net@v0.35.0/http2/frame.go:237 +0x1a
  golang.org/x/net/http2.(*Framer).ReadFrame()
      /home/runner/go/pkg/mod/golang.org/x/net@v0.35.0/http2/frame.go:501 +0xec
  google.golang.org/grpc/internal/transport.(*http2Client).reader()
      /home/runner/work/grpc-go/grpc-go/internal/transport/http2_client.go:1639 +0x2db
  google.golang.org/grpc/internal/transport.NewHTTP2Client.gowrap4()
      /home/runner/work/grpc-go/grpc-go/internal/transport/http2_client.go:414 +0x44

 Previous write at 0x00c0001724f8 by goroutine 409:
  google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Close()
      /home/runner/work/grpc-go/grpc-go/credentials/alts/internal/conn/record.go:145 +0x57
  google.golang.org/grpc/internal/transport.NewHTTP2Client.func6()
      /home/runner/work/grpc-go/grpc-go/internal/transport/http2_client.go:477 +0x2bd

arjan-bal (Contributor Author) commented Apr 3, 2025

Talked to @dfawley and we decided that we don't need to use the buffer pool here, since there is only one buffer that is reused for reads. The buffer is re-allocated when a larger ALTS record needs to be processed, but it should stabilize after a while.

Doug raised a concern about the buffer only growing but not shrinking, which may lead to higher memory usage if the peers sent large ALTS records in the past and there are a large number of ALTS conns. I'm checking what other languages do.
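
As a rough illustration of that decision, a minimal sketch of a per-conn, grow-only read buffer (the names conn, protected, and ensureCapacity are hypothetical, not the PR's exact identifiers):

```go
// conn is a pared-down stand-in; protected holds the encrypted bytes read so
// far for the record currently being assembled.
type conn struct {
	protected []byte
}

// ensureCapacity grows the buffer directly to the required capacity
// (optimization 3), preserving bytes already read. The buffer is owned by
// the conn and never returned to a shared pool, so Close cannot race with a
// concurrent Read over pooled memory.
func (c *conn) ensureCapacity(required int) {
	if cap(c.protected) >= required {
		return
	}
	grown := make([]byte, len(c.protected), required)
	copy(grown, c.protected)
	c.protected = grown
}
```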

gtcooke94 (Contributor)
Is this ready for another review pass, and are the benchmarks in the description up to date with the change? Or are you still experimenting with doing it without the buffer pools?

arjan-bal (Contributor Author)

> Is this ready for another review pass, and are the benchmarks in the description up to date with the change? Or are you still experimenting with doing it without the buffer pools?

Decided to remove the buffer pool after discussion with Doug. I re-ran the benchmark, and performance is very slightly better after removing the buffer pool. The PR description is updated.

> Doug raised a concern about the buffer only growing but not shrinking, which may lead to higher memory usage if the peers sent large ALTS records in the past and there are a large number of ALTS conns. I'm checking what other languages do.

Decided not to solve the buffer-shrinking problem, since the current implementation already suffers from it and it's not something introduced in this PR.

From discussions with other language maintainers: Java uses scatter-gather to read into multiple buffers from the socket. The list of buffers is copied into a contiguous array for decryption, so it's not zero-copy. I tried implementing a similar scatter-gather style in Go, and the performance was ~10% worse than the existing implementation on the master branch.

arjan-bal merged commit b368379 into grpc:master on Apr 7, 2025 (15 checks passed)
arjan-bal deleted the performance branch on April 7, 2025
		panic(fmt.Sprintf("protected buffer length shorter than expected: %d vs %d", len(p.protected), MsgLenFieldSize))
	}
	oldProtectedBuf := p.protected
	p.protected = make([]byte, int(length)+MsgLenFieldSize)
Contributor:

I didn't get to review this very last changeset. Here on line 169 we make a buffer with a specific length, int(length)+MsgLenFieldSize, copy into it, then slice it to a different length, len(oldProtectedBuf). I suspect something is not quite right here, or it could be made a little clearer?

arjan-bal (Contributor Author) commented Apr 7, 2025

The new buffer that we allocate must be able to hold the entire encrypted record. After reading the message header, we know the record's length: the length parsed from the message header plus the size of the message length header. This is the capacity, but the length of the new buffer should be set to the number of bytes already read. So we set the length to the length of the existing buffer and copy its contents. Let me raise a PR with some commentary.
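
To spell out that length-versus-capacity distinction, here is an equivalent form of the quoted allocation (a sketch, not the code the follow-up PR adds):

```go
// Capacity must cover the whole record: the parsed payload length plus the
// MsgLenFieldSize bytes of the length header itself. The slice length stays
// at the number of bytes already read, so later reads fill in the remainder.
oldProtectedBuf := p.protected
p.protected = make([]byte, len(oldProtectedBuf), int(length)+MsgLenFieldSize)
copy(p.protected, oldProtectedBuf)
```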

arjan-bal (Contributor Author):

Created a PR to add clarity, PTAL: #8232

arjan-bal added a commit to arjan-bal/grpc-go that referenced this pull request Apr 8, 2025
arjan-bal added a commit that referenced this pull request Apr 8, 2025
* Revert "credentials/alts: Add comments to clarify buffer sizing (#8232)"

This reverts commit be25d96.

* Revert "credentials/alts: Optimize reads (#8204)"

This reverts commit b368379.