credentials/alts: Optimize reads #8204
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #8204 +/- ##
==========================================
- Coverage 82.02% 81.98% -0.04%
==========================================
Files 410 410
Lines 40233 40389 +156
==========================================
+ Hits 33000 33113 +113
- Misses 5865 5897 +32
- Partials 1368 1379 +11
I don't need to review - the grpc-security team's review (whoever does it) should be sufficient. cc @gtcooke94
// returns a boolean that indicates if the buffer contains sufficient bytes to
// parse the length header. If there are insufficient bytes, (0, false) is
// returned.
func ParseMessageLength(b []byte) (uint32, bool) {
This can be private.
True, unexported it.
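For context, a minimal sketch of what such a length-header parser can look like, assuming the 4-byte little-endian length prefix used by ALTS record framing; the actual unexported helper in the PR may differ in details:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// msgLenFieldSize is the size of the ALTS record length prefix in bytes.
const msgLenFieldSize = 4

// parseMessageLength returns the record length encoded in the header and a
// boolean indicating whether b contains enough bytes to parse it. If there
// are insufficient bytes, (0, false) is returned.
func parseMessageLength(b []byte) (uint32, bool) {
	if len(b) < msgLenFieldSize {
		return 0, false
	}
	return binary.LittleEndian.Uint32(b[:msgLenFieldSize]), true
}

func main() {
	header := []byte{0x10, 0x00, 0x00, 0x00, 0xde, 0xad} // 4-byte header (16) + start of payload
	if length, ok := parseMessageLength(header); ok {
		fmt.Println("record length:", length) // record length: 16
	}
}
```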
// BenchmarkLargeMessage measures the performance of ALTS conns for sending and
// receiving a large message.
func BenchmarkLargeMessage(b *testing.B) {
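Not the benchmark added in this PR, but a sketch of the general shape such a benchmark can take, assuming an in-memory net.Pipe pair; in the real benchmark both ends would be wrapped in ALTS record conns so that framing and encryption are measured, and the 1 MiB message size is an arbitrary choice:

```go
package conn_test

import (
	"io"
	"net"
	"testing"
)

// BenchmarkLargeMessage sends and receives one large message per iteration
// over an in-memory pipe.
func BenchmarkLargeMessage(b *testing.B) {
	const msgSize = 1 << 20 // 1 MiB; illustrative only
	client, server := net.Pipe()
	defer client.Close()
	defer server.Close()

	msg := make([]byte, msgSize)
	recv := make([]byte, msgSize)
	b.SetBytes(msgSize)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		go func() {
			// Writer side: net.Pipe is synchronous, so this completes only
			// as the reader consumes the bytes.
			if _, err := client.Write(msg); err != nil {
				b.Error(err)
			}
		}()
		// Reader side: read until the full message has arrived.
		if _, err := io.ReadFull(server, recv); err != nil {
			b.Fatal(err)
		}
	}
}
```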
@gtcooke94 Do we have any GitHub Actions that already run these benchmarks, or should we add some if not?
I think someone else on the Go team would be the person to ask here - I don't know much about the CI setup. @arjan-bal do you know about the grpc-go github CI and benchmarking?
We don't run benchmarks as part of CI. We have benchmarks here that we ask PR authors to run when reviewing PRs that affect performance. We can add a similar benchmark for ALTS or modify the existing benchmark to support ALTS.
We have a performance dashboard for all languages here, but we don't have alerts set up for regressions: https://grafana-dot-grpc-testing.appspot.com/?orgId=1
This looks good overall; just some specific questions I want to make sure we all understand.
I think this looks good, thanks! The question remaining is how the benchmark you added interacts with the CI - I'm good with this being a follow-up if it'll take a lot of extra time.
The CI has caught a data race where gRPC closes the conn from one goroutine while another goroutine is reading from it. There are a few solutions to fix the race.
I'll try to see if I can make the fix for 2. If that's not possible, 1 seems fine too.
The stack traces show that the writer goroutine in the http2 client closes the conn (grpc-go/internal/transport/http2_client.go, lines 470 to 480 at 51d6a43).
Race detector trace
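As a generic illustration only (not the gRPC code involved), the kind of Read/Close race the detector flags looks like the sketch below: one goroutine reads through a shared buffer field while another goroutine, closing the conn, writes to the same field. Running it under the race detector (`go run -race`) reports the race; guarding both accesses with a mutex, or making Close wait for in-flight reads, removes it.

```go
package main

// racyConn sketches the problem: Read loads c.buf without synchronization
// while Close, called from another goroutine, writes to the same field.
type racyConn struct {
	buf []byte
}

func (c *racyConn) Read(p []byte) (int, error) {
	return copy(p, c.buf), nil // racy load of c.buf
}

func (c *racyConn) Close() error {
	c.buf = nil // racy store to c.buf
	return nil
}

func main() {
	c := &racyConn{buf: make([]byte, 16)}
	done := make(chan struct{})
	go func() {
		defer close(done)
		p := make([]byte, 16)
		c.Read(p)
	}()
	c.Close()
	<-done
}
```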
Talked to @dfawley and decided that we don't need to use the buffer pool here since there is only one buffer that is re-used for reads. The buffer is re-allocated when a larger ALTS record needs to be processed, but it should become stable after a while. Doug raised a concern about the buffer only growing but not shrinking, which may lead to higher memory usage if the peers sent large ALTS records in the past and there are a large number of ALTS conns. I'm checking what other languages do.
Is this ready for another review pass, and are the benchmarks in the description up to date with the change? Or are you still experimenting with doing it without the buffer pools?
Decided to remove the buffer pool after discussion with Doug. I re-ran the benchmark and the performance seems to be very slightly better after removing the buffer pool. The PR description is updated.
Decided not to solve the buffer shrinking problem since the current implementation already suffers from this and it's not something introduced in this PR. From discussions with other language maintainers, Java uses a scatter-gather read to fill multiple buffers from the socket. The list of buffers is then copied into a contiguous array for decryption, so it's not zero-copy. I tried implementing a similar scatter-gather style read in Go and the performance was ~10% worse than the existing implementation on the master branch.
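For reference, a minimal sketch of a scatter-gather style read in Go (Linux only, using golang.org/x/sys/unix.Readv); this is not the code from that experiment, and the peer address, buffer sizes, and the gather step are assumptions made for illustration:

```go
package main

import (
	"net"

	"golang.org/x/sys/unix"
)

// readvInto fills bufs from the socket with a single readv(2) call and
// returns the total number of bytes read.
func readvInto(c *net.TCPConn, bufs [][]byte) (int, error) {
	raw, err := c.SyscallConn()
	if err != nil {
		return 0, err
	}
	var n int
	var readErr error
	if err := raw.Read(func(fd uintptr) bool {
		n, readErr = unix.Readv(int(fd), bufs)
		return readErr != unix.EAGAIN // retry when the socket is not ready yet
	}); err != nil {
		return 0, err
	}
	return n, readErr
}

// gather copies the first n bytes spread across bufs into one contiguous
// slice; this copy is why the approach is not zero-copy.
func gather(bufs [][]byte, n int) []byte {
	out := make([]byte, 0, n)
	for _, b := range bufs {
		if n <= 0 {
			break
		}
		take := n
		if take > len(b) {
			take = len(b)
		}
		out = append(out, b[:take]...)
		n -= take
	}
	return out
}

func main() {
	conn, err := net.Dial("tcp", "example.com:443") // hypothetical peer
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	bufs := [][]byte{make([]byte, 4096), make([]byte, 4096)} // scatter targets
	n, err := readvInto(conn.(*net.TCPConn), bufs)
	if err != nil {
		panic(err)
	}
	record := gather(bufs, n) // contiguous bytes, ready for decryption
	_ = record
}
```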
panic(fmt.Sprintf("protected buffer length shorter than expected: %d vs %d", len(p.protected), MsgLenFieldSize))
}
oldProtectedBuf := p.protected
p.protected = make([]byte, int(length)+MsgLenFieldSize)
I didn't get to review this very last changeset - here on line 169 we make a buffer of a specific length, int(length)+MsgLenFieldSize, copy to it, then slice it to a different length, len(oldProtectedBuf). I suspect something is not quite right here, or it can be made a little clearer?
The new buffer that we allocate must be able to hold the entire encrypted record. After reading the message header, we know the size of the record is: the length parsed from the message header + the size of the message length header. This is the capacity, but the length of the new buffer should be set to the number of bytes that are already read. So we set the length to the length of the existing buffer and copy its contents. Let me raise a PR with some commentary.
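A hedged sketch of the buffer-growing step being described; the names mirror the snippet quoted above (protected, length, MsgLenFieldSize), but it is an illustration rather than the PR's exact code:

```go
package main

import "fmt"

const msgLenFieldSize = 4

// growProtected returns a buffer whose capacity can hold the whole encrypted
// record (the length from the header plus the 4-byte length field itself) and
// whose length is the number of bytes already read, preserving those bytes.
func growProtected(protected []byte, length uint32) []byte {
	needed := int(length) + msgLenFieldSize
	if cap(protected) >= needed {
		return protected
	}
	grown := make([]byte, len(protected), needed)
	copy(grown, protected)
	return grown
}

func main() {
	// 6 bytes read so far (4-byte header + 2 payload bytes); the header says
	// the record payload is 1024 bytes long.
	partial := []byte{0x00, 0x04, 0x00, 0x00, 0xaa, 0xbb}
	grown := growProtected(partial, 1024)
	fmt.Println(len(grown), cap(grown)) // 6 1028
}
```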
Created a PR to add clarity, PTAL: #8232
This reverts commit b368379.
Background

Internal issue: b/400312873

While benchmarking Google Cloud Storage read performance, the flame graphs showed significant time being spent in google.golang.org/grpc/credentials/alts/internal/conn.(*conn).Read. These changes are done to reduce the time spent in this function. The following optimizations are done:

- Avoid reading the record piecemeal in a for loop.
- If the buffer passed to Read() is large enough, use it to store the decrypted message, avoiding an extra copy to buf (see the sketch below). The default buffer size used by gRPC is 16KB, which also happens to be the size of ALTS records from GCS. This means we usually avoid the extra copy.
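A minimal sketch of the second optimization, with a toy stand-in cipher and hypothetical type names (altsConn, recordCrypto) rather than the package's real ones: when the caller's buffer can hold the whole plaintext, decrypt directly into it; otherwise fall back to a scratch buffer and copy out what fits.

```go
package main

import "fmt"

// recordCrypto mirrors the shape of a record cipher whose Decrypt appends the
// plaintext for ciphertext to dst and returns it.
type recordCrypto interface {
	Decrypt(dst, ciphertext []byte) ([]byte, error)
}

// xorCrypto is a toy stand-in cipher used only for this sketch.
type xorCrypto struct{ key byte }

func (c xorCrypto) Decrypt(dst, ciphertext []byte) ([]byte, error) {
	for _, b := range ciphertext {
		dst = append(dst, b^c.key)
	}
	return dst, nil
}

// altsConn holds undelivered plaintext between Read calls.
type altsConn struct {
	crypto recordCrypto
	buf    []byte // leftover decrypted bytes from a previous record
}

// readRecord decrypts one record. If the caller's buffer b can hold the whole
// plaintext, decrypt directly into it and skip the extra copy through c.buf;
// otherwise decrypt into scratch space and copy out what fits.
func (c *altsConn) readRecord(b, ciphertext []byte) (int, error) {
	if len(b) >= len(ciphertext) { // plaintext is never longer than ciphertext here
		plaintext, err := c.crypto.Decrypt(b[:0], ciphertext)
		if err != nil {
			return 0, err
		}
		return len(plaintext), nil
	}
	plaintext, err := c.crypto.Decrypt(nil, ciphertext)
	if err != nil {
		return 0, err
	}
	n := copy(b, plaintext)
	c.buf = plaintext[n:] // delivered by later Read calls
	return n, nil
}

func main() {
	c := &altsConn{crypto: xorCrypto{key: 0x5a}}
	record := []byte{0x5a ^ 'h', 0x5a ^ 'i'}
	out := make([]byte, 16) // large enough: decrypted in place, no extra copy
	n, _ := c.readRecord(out, record)
	fmt.Println(string(out[:n])) // hi
}
```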
Benchmarks
branch: master
branch: perfromance (this PR)
RELEASE NOTES: