
PooledConnectionLifetime causes periodic socket recreation #2647

@SonicGD

Description


What version of gRPC and what language are you using?

2.64.0, 2.71.0

What operating system (Linux, Windows,...) and version?

Windows 11 24H2, Debian Linux 12.10

What runtime / compiler are you using (e.g. .NET Core SDK version dotnet --info)

.NET SDK:
Version: 9.0.302
Commit: bb2550b9af
Workload version: 9.0.300-manifests.57e2cde6
MSBuild version: 17.14.13+65391c53b

Runtime Environment:
OS Name: Windows
OS Version: 10.0.26100
OS Platform: Windows
RID: win-x64
Base Path: C:\Program Files\dotnet\sdk\9.0.302\

What did you do?

I was analysing app logs and found some suspicious traces:

[2025-07-16 19:41:40] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[4]
      Subchannel id '2-1' checking socket localhost:7289.
[2025-07-16 19:41:40] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[14]
      Subchannel id '2-1' socket localhost:7289 is in a bad state and can't be used.
[2025-07-16 19:41:40] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[16]
      Subchannel id '2-1' socket localhost:7289 is being closed because it can't be used. Socket lifetime of 00:00:15.0227203. The socket either can't receive data or it has received unexpected data.
[2025-07-16 19:41:40] trce: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[1]
      Subchannel id '2-1' connecting socket to localhost:7289.
[2025-07-16 19:41:40] dbug: Grpc.Net.Client.Balancer.Internal.SocketConnectivitySubchannelTransport[2]
      Subchannel id '2-1' connected to socket localhost:7289.

It doesn't seem to affect actual gRPC requests, but I decided to investigate what causes this. After some digging I found that the PooledConnectionLifetime setting on SocketsHttpHandler is involved. The shorter its value, the sooner the errors begin to appear; with an infinite lifetime (Timeout.InfiniteTimeSpan) they never appear at all. Socket recreation starts after the second PooledConnectionLifetime interval. So if PooledConnectionLifetime is 2 minutes, the connection is recreated after 2 minutes, recreated again after another 2 minutes, and from then on the "socket is in a bad state" logs appear every 10-15 seconds.
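For context, here is a minimal sketch of the kind of channel setup that reproduces this for me. The exact code is in the repository linked below; the address scheme and values here are only an approximation of it:

    using System;
    using System.Net.Http;
    using Grpc.Core;
    using Grpc.Net.Client;
    using Grpc.Net.Client.Configuration;

    // Approximation of the repro: load balancing enabled plus a finite
    // PooledConnectionLifetime on SocketsHttpHandler.
    var channel = GrpcChannel.ForAddress("dns:///localhost:7289", new GrpcChannelOptions
    {
        Credentials = ChannelCredentials.Insecure,
        HttpHandler = new SocketsHttpHandler
        {
            // The shorter this value, the sooner the "bad state" logs start.
            PooledConnectionLifetime = TimeSpan.FromMinutes(2),
            EnableMultipleHttp2Connections = true
        },
        ServiceConfig = new ServiceConfig
        {
            LoadBalancingConfigs = { new RoundRobinConfig() }
        }
    });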

I've built a simple example: https://github.com/SonicGD/GrpcSDErrors. Just run it and wait about 4 minutes to see those traces in the logs. Some notes:

  1. PooledConnectionLifetime is set to 2 minutes, the same value the default handler gets. Set it to 15 seconds and the reconnections start earlier.
  2. You can remove PooledConnectionLifetime or set it to infinity and the errors go away (see the sketch after this list).
  3. You can remove the ConfigurePrimaryHttpMessageHandler block, use the default handler, and still see the errors.
  4. You can remove LoadBalancingConfigs and that also "fixes" the errors.
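Regarding note 2, the workaround is just the following (assuming the handler is configured explicitly; Timeout.InfiniteTimeSpan is what I mean by "infinity" above):

    using System.Net.Http;
    using System.Threading;

    // Workaround from note 2: an infinite pooled connection lifetime, after which
    // the periodic socket recreation no longer appears.
    var handler = new SocketsHttpHandler
    {
        PooledConnectionLifetime = Timeout.InfiniteTimeSpan,
        EnableMultipleHttp2Connections = true
    };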

Looking at the SocketConnectivitySubchannelTransport code, these logs come from OnCheckSocketConnection. If I understand the code correctly, it only checks _initialSocket, and _initialSocket is non-null only when no streams have been created since the latest TryConnectAsync call. However, PooledConnectionLifetime seems to lead to a state where an unused _initialSocket exists alongside active connections. After the first error we see a constant "Transport status: InitialSocket" in the logs.
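To make that concrete, here is a heavily simplified, illustrative model of the check as I understand it from the description above. It is not the actual SocketConnectivitySubchannelTransport code, just my paraphrase of the behaviour:

    using System.Net.Sockets;

    // Illustrative model only; not the real Grpc.Net.Client implementation.
    class SubchannelTransportModel
    {
        // Held only while no stream has been created since the last connect.
        private Socket? _initialSocket;

        // Runs periodically (every 10-15 seconds, judging by the logs).
        public void OnCheckSocketConnection()
        {
            if (_initialSocket is null)
            {
                return; // Active streams own the connection; nothing to check.
            }

            if (!IsSocketUsable(_initialSocket))
            {
                // The "socket ... is in a bad state and can't be used" path:
                // the idle socket is closed and a new one is connected, even though
                // SocketsHttpHandler may already have healthy pooled connections.
                _initialSocket.Dispose();
                _initialSocket = Connect();
            }
        }

        private static bool IsSocketUsable(Socket socket)
        {
            // Rough liveness check: Poll + Available detects a closed socket.
            return !(socket.Poll(0, SelectMode.SelectRead) && socket.Available == 0);
        }

        private static Socket Connect()
        {
            var socket = new Socket(SocketType.Stream, ProtocolType.Tcp);
            socket.Connect("localhost", 7289);
            return socket;
        }
    }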

As I said, it doesn't affect actual application requests, but I don't think it's good to create a new connection to the server every 10-15 seconds.

What did you expect to see?

No constant reconnections to the server

What did you see instead?

Constant reconnections to the server

Anything else we should know about your project / environment?

I tried on several machines, in Docker, and with different resolvers, all with the same result.
