Description
What version of gRPC and what language are you using?
2.64.0 and 2.71.0, C# (Grpc.Net.Client)
What operating system (Linux, Windows,...) and version?
Windows 11 24H2, Debian Linux 12.10
What runtime / compiler are you using (e.g. .NET Core SDK version, output of dotnet --info)?
.NET SDK:
Version: 9.0.302
Commit: bb2550b9af
Workload version: 9.0.300-manifests.57e2cde6
MSBuild version: 17.14.13+65391c53b
Runtime Environment:
OS Name: Windows
OS Version: 10.0.26100
OS Platform: Windows
RID: win-x64
Base Path: C:\Program Files\dotnet\sdk\9.0.302\
What did you do?
I was analysing app logs and found some suspicious traces:
[2025-07-16 19:41:40] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4]
Subchannel id '2-1' checking socket localhost:7289.
[2025-07-16 19:41:40] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[14]
Subchannel id '2-1' socket localhost:7289 is in a bad state and can't be used.
[2025-07-16 19:41:40] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[16]
Subchannel id '2-1' socket localhost:7289 is being closed because it can't be used. Socket lifetime of 00:00:15.0227203. The socket either can't receive data or it has received unexpected data.
[2025-07-16 19:41:40] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[1]
Subchannel id '2-1' connecting socket to localhost:7289.
[2025-07-16 19:41:40] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[2]
Subchannel id '2-1' connected to socket localhost:7289.
It doesn't seem to affect actual gRPC requests, but I decided to investigate what causes this. After some digging I found that the PooledConnectionLifetime setting on SocketsHttpHandler is somehow involved. The shorter its value, the sooner the errors begin to appear; with the lifetime set to infinite (Timeout.InfiniteTimeSpan) they never appear at all. Socket recreation starts after the second PooledConnectionLifetime interval. So if PooledConnectionLifetime is 2 minutes, the connection is recreated after 2 minutes, recreated again after another 2 minutes, and after that we see the "socket is in a bad state" logs every 10-15 seconds.
I've built a simple example - https://github.com/SonicGD/GrpcSDErrors. Just run it and wait about 4 minutes before the traces show up in the logs. Some notes (a rough sketch of the relevant channel configuration follows this list):
- PooledConnectionLifetime is set to 2 minutes, just as it would be set here if we used the default handler. Set it to 15 seconds and the reconnections start earlier.
- You can remove PooledConnectionLifetime or set it to infinite and the errors go away.
- You can remove the ConfigurePrimaryHttpMessageHandler block and use the default handler and still see the errors.
- You can remove LoadBalancingConfigs and that also "fixes" the errors.
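For reference, here is a minimal sketch of the kind of channel configuration involved. It uses a direct GrpcChannel instead of the client factory with ConfigurePrimaryHttpMessageHandler that the repro uses, so names and exact values are approximate - see the GrpcSDErrors repo for the real sample.

```csharp
using System;
using System.Net.Http;
using Grpc.Net.Client;
using Grpc.Net.Client.Configuration;

// Approximate sketch of the repro configuration: the two relevant pieces are
// PooledConnectionLifetime on SocketsHttpHandler and a LoadBalancingConfigs entry.
var channel = GrpcChannel.ForAddress("https://localhost:7289", new GrpcChannelOptions
{
    HttpHandler = new SocketsHttpHandler
    {
        // The same 2 minutes mentioned in the notes above. Shorten it (e.g. to
        // 15 seconds) and the reconnect cycle starts sooner; set it to
        // Timeout.InfiniteTimeSpan and the bad-state logs never appear.
        PooledConnectionLifetime = TimeSpan.FromMinutes(2)
    },
    // Removing this load balancing config also makes the repeated socket
    // checks disappear.
    ServiceConfig = new ServiceConfig
    {
        LoadBalancingConfigs = { new RoundRobinConfig() }
    }
});
```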
Looking at the SocketConnectivitySubchannelTransport code, these logs come from OnCheckSocketConnection. If I understand the code correctly, it only checks _initialSocket, and _initialSocket is non-null only if no stream has been created since the latest TryConnectAsync call. However, it seems that PooledConnectionLifetime leads to having an unused _initialSocket and active connections at the same time. After the first error we see a constant "Transport status: InitialSocket" in the logs.
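To make that hypothesis concrete, here is a very rough standalone model of the check - this is NOT the actual gRPC implementation, just my reading of the _initialSocket / OnCheckSocketConnection interaction described above, with the class and member names borrowed for illustration:

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

// Simplified model of my understanding of the subchannel transport, not the
// real SocketConnectivitySubchannelTransport.
class SubchannelTransportModel
{
    // Held only between a successful connect and the first stream creation.
    private Socket? _initialSocket;

    public async Task TryConnectAsync(string host, int port)
    {
        var socket = new Socket(SocketType.Stream, ProtocolType.Tcp) { NoDelay = true };
        await socket.ConnectAsync(host, port);
        _initialSocket = socket; // stays set until a new stream takes ownership of it
    }

    // Periodic check: only the idle initial socket is inspected. If requests are
    // flowing over other pooled HTTP/2 connections, this socket is never consumed,
    // eventually looks unusable and gets recreated - matching the log pattern above.
    public void OnCheckSocketConnection()
    {
        var socket = _initialSocket;
        if (socket == null)
        {
            return; // a stream already took the socket; nothing to check
        }

        // Classic "is the peer gone?" probe: readable with zero bytes available
        // means the remote side closed the connection.
        var socketIsBad = socket.Poll(0, SelectMode.SelectRead) && socket.Available == 0;
        if (socketIsBad)
        {
            _initialSocket = null;
            socket.Dispose();
            // The real transport then reconnects, producing the traces shown earlier.
        }
    }
}
```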
As I said, it doesn't affect actual application requests, but I don't think it's good to create a new connection to the server every 10-15 seconds.
What did you expect to see?
No constant reconnections to the server.
What did you see instead?
Constant reconnections to the server.
Anything else we should know about your project / environment?
I tried on several machines, in Docker, and with different resolvers, with the same result.