credentials/alts: Optimize reads #8204

Merged · 7 commits merged into grpc:master · Apr 7, 2025

Conversation

arjan-bal (Contributor) commented Mar 28, 2025

Background

Internal issue: b/400312873

While benchmarking Google Cloud Storage read performance, flame graphs showed significant time being spent in google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Read. These changes reduce the time spent in this function through the following optimizations:

  1. In the handshaker, allocate a buffer only once, outside the for loop.
  2. Use 32KB (previously 4KB) buffers to read from the network.
  3. When a complete ALTS frame doesn't fit into the buffer, directly expand the buffer to the required capacity instead of incrementing in steps of 4KB.
  4. If the buffer passed to Read() is large enough, use it to store the decrypted message, avoiding an extra copy into buf. The default buffer size used by gRPC is 16KB, which also happens to be the size of ALTS records from GCS, so we usually avoid the extra copy (see the sketch below).
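
A minimal, hypothetical sketch of optimization 4 follows; the identifiers conn, c.buf, c.crypto, and readRecord are illustrative, not the PR's exact names, and Decrypt is assumed to append plaintext to the slice passed as dst:

```go
// conn is a pared-down stand-in for the ALTS record connection.
type conn struct {
	buf    []byte // leftover decrypted bytes from previous reads
	crypto interface {
		Decrypt(dst, ciphertext []byte) ([]byte, error)
	}
}

// readRecord decrypts one framed record, writing the plaintext directly into
// the caller's buffer b when it is large enough to hold it.
func (c *conn) readRecord(b, framedMsg []byte) (int, error) {
	if len(b) >= len(framedMsg) {
		// Plaintext is never longer than the framed message, so b can hold
		// the whole decrypted record: skip the copy through c.buf.
		plaintext, err := c.crypto.Decrypt(b[:0], framedMsg)
		return len(plaintext), err
	}
	// Fall back to decrypting into the internal buffer and copying out.
	plaintext, err := c.crypto.Decrypt(c.buf[:0], framedMsg)
	if err != nil {
		return 0, err
	}
	n := copy(b, plaintext)
	c.buf = plaintext[n:] // leftover plaintext is served on the next Read
	return n, nil
}
```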

Benchmarks

branch: master

go test ./credentials/alts/internal/conn -timeout=2s -bench="Bench"
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
BenchmarkLargeMessage-48              13          90139719 ns/op
PASS
ok      google.golang.org/grpc/credentials/alts/internal/conn   2.316s

branch: performance (this PR)

goos: linux
goarch: amd64
pkg: google.golang.org/grpc/credentials/alts/internal/conn
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
BenchmarkLargeMessage-48              14          80422425 ns/op
PASS
ok      google.golang.org/grpc/credentials/alts/internal/conn   2.139s

RELEASE NOTES:

  • credentials/alts: Improve read performance by optimizing buffer copies and allocations.

arjan-bal added the labels Type: Performance and Area: Auth on Mar 28, 2025
arjan-bal changed the title from "Optimize ALTS reads" to "credentials/alts: Optimize reads" on Mar 28, 2025
arjan-bal added this to the 1.72 Release milestone on Mar 28, 2025
codecov bot commented Mar 28, 2025

Codecov Report

Attention: Patch coverage is 78.37838% with 8 lines in your changes missing coverage. Please review.

Project coverage is 81.98%. Comparing base (6819ed7) to head (3756c57).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
credentials/alts/internal/conn/record.go 71.42% 6 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8204      +/-   ##
==========================================
- Coverage   82.02%   81.98%   -0.04%     
==========================================
  Files         410      410              
  Lines       40233    40389     +156     
==========================================
+ Hits        33000    33113     +113     
- Misses       5865     5897      +32     
- Partials     1368     1379      +11     
Files with missing lines Coverage Δ
credentials/alts/internal/conn/common.go 100.00% <100.00%> (ø)
credentials/alts/internal/handshaker/handshaker.go 69.58% <100.00%> (ø)
credentials/alts/internal/conn/record.go 77.48% <71.42%> (-5.07%) ⬇️

... and 31 files with indirect coverage changes


dfawley (Member) commented Mar 28, 2025

I don't need to review - the grpc-security team's review (whoever does it) should be sufficient.

cc @gtcooke94

matthewstevenson88 requested review from gtcooke94 and rockspore and removed the request for matthewstevenson88 on Mar 31, 2025
// returns a boolean that indicates if the buffer contains sufficient bytes to
// parse the length header. If there are insufficient bytes, (0, false) is
// returned.
func ParseMessageLength(b []byte) (uint32, bool) {
Contributor:

This can be private.

arjan-bal (Contributor Author):

True, unexported it.
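
For reference, a minimal sketch of what the now-unexported helper does, assuming the little-endian length framing used by ALTS record headers (a sketch, not the exact code):

```go
import "encoding/binary"

// MsgLenFieldSize is the size in bytes of the record's length header.
const MsgLenFieldSize = 4

// parseMessageLength reads the 4-byte length header from b, returning
// (0, false) when b holds fewer than MsgLenFieldSize bytes.
func parseMessageLength(b []byte) (uint32, bool) {
	if len(b) < MsgLenFieldSize {
		return 0, false
	}
	return binary.LittleEndian.Uint32(b[:MsgLenFieldSize]), true
}
```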


// BenchmarkLargeMessage measures the performance of ALTS conns for sending and
// receiving a large message.
func BenchmarkLargeMessage(b *testing.B) {
Contributor:

@gtcooke94 Do we have any GitHub Actions that already run these benchmarks, or should we add some if not?

Contributor:

I think someone else on the Go team would be the person to ask here; I don't know much about the CI setup. @arjan-bal, do you know about the grpc-go GitHub CI and benchmarking?

arjan-bal (Contributor Author) commented Apr 3, 2025

We don't run benchmarks as part of CI. We have benchmarks here that we ask PR authors to run when reviewing PRs that affect performance. We can add a similar benchmark for ALTS or modify the existing benchmark to support it.

We have a performance dashboard for all languages, but we don't have alerts set up for regressions: https://grafana-dot-grpc-testing.appspot.com/?orgId=1

gtcooke94 (Contributor) left a comment

This looks good overall; there are just some specific questions I want to make sure we all understand.

arjan-bal assigned rockspore and gtcooke94 and unassigned arjan-bal on Apr 2, 2025
gtcooke94 (Contributor) left a comment

I think this looks good, thanks! The remaining question is how the benchmark you added interacts with the CI; I'm good with this being a follow-up if it'll take a lot of extra time.

arjan-bal assigned arjan-bal and unassigned rockspore and gtcooke94 on Apr 3, 2025
arjan-bal (Contributor Author) commented Apr 3, 2025

The CI has caught a data race where gRPC is calling conn.Read and conn.Close concurrently: https://github.com/grpc/grpc-go/actions/runs/14238675443/job/39903307175?pr=8204

There are a few solutions to fix the race:

  1. Don't add the protectedBuf back to the buffer pool. It will be garbage collected like before.
  2. Add synchronization in the http client to serialize calls to Read and Close.
  3. Add synchronization inside alts.conn to serialize Read and Close.

I'll try to see if I can make the fix for 2. If that's not possible, 1 seems fine too.

arjan-bal (Contributor Author)

> The CI has caught a data race where gRPC is calling conn.Read and conn.Close concurrently: https://github.com/grpc/grpc-go/actions/runs/14238675443/job/39903307175?pr=8204
>
> There are a few solutions to fix the race:
>
>   1. Don't add the protectedBuf back to the buffer pool. It will be garbage collected like before.
>   2. Add synchronization in the http client to serialize calls to Read and Close.
>   3. Add synchronization inside alts.conn to serialize Read, Write and Close.
>
> I'll try to see if I can make the fix for 2. If that's not possible, 1 seems fine too.

The stack traces show that the writer goroutine in the http2 client closes the net.Conn while the reader goroutine is still reading. The error returned by loopy.run() is rpc error: code = Canceled desc = grpc: the client connection is closing.

```go
go func() {
	t.loopy = newLoopyWriter(clientSide, t.framer, t.controlBuf, t.bdpEst, t.conn, t.logger, t.outgoingGoAwayHandler, t.bufferPool)
	if err := t.loopy.run(); !isIOError(err) {
		// Immediately close the connection, as the loopy writer returns
		// when there are no more active streams and we were draining (the
		// server sent a GOAWAY). For I/O errors, the reader will hit it
		// after draining any remaining incoming data.
		t.conn.Close()
	}
	close(t.writerDone)
}()
```

Race detector trace
WARNING: DATA RACE
Read at 0x00c0001724f8 by goroutine 386:
  google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Read()
      /home/runner/work/grpc-go/grpc-go/credentials/alts/internal/conn/record.go:157 +0x5e
  bufio.(*Reader).Read()
      /opt/hostedtoolcache/go/1.24.2/x64/src/bufio/bufio.go:245 +0x4b7
  io.ReadAtLeast()
      /opt/hostedtoolcache/go/1.24.2/x64/src/io/io.go:335 +0xca
  io.ReadFull()
      /opt/hostedtoolcache/go/1.24.2/x64/src/io/io.go:354 +0x99
  golang.org/x/net/http2.readFrameHeader()
      /home/runner/go/pkg/mod/golang.org/x/net@v0.35.0/http2/frame.go:237 +0x1a
  golang.org/x/net/http2.(*Framer).ReadFrame()
      /home/runner/go/pkg/mod/golang.org/x/net@v0.35.0/http2/frame.go:501 +0xec
  google.golang.org/grpc/internal/transport.(*http2Client).reader()
      /home/runner/work/grpc-go/grpc-go/internal/transport/http2_client.go:1639 +0x2db
  google.golang.org/grpc/internal/transport.NewHTTP2Client.gowrap4()
      /home/runner/work/grpc-go/grpc-go/internal/transport/http2_client.go:414 +0x44

 Previous write at 0x00c0001724f8 by goroutine 409:
  google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Close()
      /home/runner/work/grpc-go/grpc-go/credentials/alts/internal/conn/record.go:145 +0x57
  google.golang.org/grpc/internal/transport.NewHTTP2Client.func6()
      /home/runner/work/grpc-go/grpc-go/internal/transport/http2_client.go:477 +0x2bd

arjan-bal (Contributor Author) commented Apr 3, 2025

Talked to @dfawley and we decided that we don't need to use the buffer pool here, since there is only one buffer that is reused for reads. The buffer is re-allocated when a larger ALTS record needs to be processed, but it should stabilize after a while.

Doug raised a concern about the buffer only growing but not shrinking, which may lead to higher memory usage if the peers sent large ALTS records in the past and there are a large number of ALTS conns. I'm checking what other languages do.
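
As a rough illustration of that decision, a minimal sketch of a per-conn, grow-only read buffer (the names conn, protected, and ensureCapacity are hypothetical, not the PR's exact identifiers):

```go
// conn is a pared-down stand-in; protected holds the encrypted bytes read so
// far for the record currently being assembled.
type conn struct {
	protected []byte
}

// ensureCapacity grows the buffer directly to the required capacity
// (optimization 3), preserving bytes already read. The buffer is owned by
// the conn and never returned to a shared pool, so Close cannot race with a
// concurrent Read over pooled memory.
func (c *conn) ensureCapacity(required int) {
	if cap(c.protected) >= required {
		return
	}
	grown := make([]byte, len(c.protected), required)
	copy(grown, c.protected)
	c.protected = grown
}
```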

gtcooke94 (Contributor)
Is this ready for another review pass, and are the benchmarks in the description up to date with the change? Or are you still experimenting with doing it without the buffer pools?

arjan-bal (Contributor Author)

> Is this ready for another review pass, and are the benchmarks in the description up to date with the change? Or are you still experimenting with doing it without the buffer pools?

Decided to remove the buffer pool after discussion with Doug. I re-ran the benchmark, and performance is very slightly better after removing the buffer pool. The PR description is updated.

> Doug raised a concern about the buffer only growing but not shrinking, which may lead to higher memory usage if the peers sent large ALTS records in the past and there are a large number of ALTS conns. I'm checking what other languages do.

Decided not to solve the buffer-shrinking problem, since the current implementation already suffers from it and it's not something introduced in this PR.

From discussions with other language maintainers: Java uses scatter-gather to read into multiple buffers from the socket. The list of buffers is copied into a contiguous array for decryption, so it's not zero-copy. I tried implementing a similar scatter-gather style in Go, and the performance was ~10% worse than the existing implementation on the master branch.

arjan-bal merged commit b368379 into grpc:master on Apr 7, 2025 (15 checks passed)
arjan-bal deleted the performance branch on April 7, 2025
		panic(fmt.Sprintf("protected buffer length shorter than expected: %d vs %d", len(p.protected), MsgLenFieldSize))
	}
	oldProtectedBuf := p.protected
	p.protected = make([]byte, int(length)+MsgLenFieldSize)
Contributor:

I didn't get to review this very last changeset. Here on line 169 we make a buffer with a specific length, int(length)+MsgLenFieldSize, copy into it, then slice it to a different length, len(oldProtectedBuf). I suspect something is not quite right here, or it could be made a little clearer?

arjan-bal (Contributor Author) commented Apr 7, 2025

The new buffer that we allocate must be able to hold the entire encrypted record. After reading the message header, we know the record's length: the length parsed from the message header plus the size of the message length header. This is the capacity, but the length of the new buffer should be set to the number of bytes already read. So we set the length to the length of the existing buffer and copy its contents. Let me raise a PR with some commentary.
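
To spell out that length-versus-capacity distinction, here is an equivalent form of the quoted allocation (a sketch, not the code the follow-up PR adds):

```go
// Capacity must cover the whole record: the parsed payload length plus the
// MsgLenFieldSize bytes of the length header itself. The slice length stays
// at the number of bytes already read, so later reads fill in the remainder.
oldProtectedBuf := p.protected
p.protected = make([]byte, len(oldProtectedBuf), int(length)+MsgLenFieldSize)
copy(p.protected, oldProtectedBuf)
```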

arjan-bal (Contributor Author):

Created a PR to add clarity, PTAL: #8232

arjan-bal added a commit to arjan-bal/grpc-go that referenced this pull request Apr 8, 2025
arjan-bal added a commit that referenced this pull request Apr 8, 2025
* Revert "credentials/alts: Add comments to clarify buffer sizing (#8232)"

This reverts commit be25d96.

* Revert "credentials/alts: Optimize reads (#8204)"

This reverts commit b368379.