
Investigate/document noisy gRPC logs #9165

Closed
gyuho opened this issue Jan 18, 2018 · 22 comments
gyuho commented Jan 18, 2018

Quite a few people have asked about these logs (related: etcd-io/etcdlabs#281 and #8822).

2017-11-10 18:02:24.981969 I | etcdserver/api/v3rpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2391->127.0.0.1:43020: read: connection reset by peer
2017-11-10 18:02:24.982017 I | etcdserver/api/v3rpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2397->127.0.0.1:37926: read: connection reset by peer
2017-11-10 18:02:26.985608 I | etcdserver/api/v3rpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2391->127.0.0.1:43088: read: connection reset by peer

In most cases, this logging comes from the gRPC side. In old versions of etcd the logging format is overridden with capnslog, which confuses a lot of people.

Check whether it still happens (e.g. a 2-member cluster with another member joining), and clearly document what it is.

xiang90 commented Jan 18, 2018

SaranBalaji90 commented Jan 18, 2018

We did some investigation around this, and this is what I have so far. (I found a grpc-go GitHub issue that moves these logs to a different level in their latest version, but I'm not sure why we are seeing multiple connections being initiated/terminated.)

Issue
etcd logs contain quite a few lines that say "http2Server.HandleStreams failed to read frame."

When does this happen?
When we initiate a client connection through https://github.com/coreos/etcd/tree/master/clientv3. This doesn't happen all the time, and only very rarely on clusters of size 1 or 2 (at least in my testing).

Environment:
etcd Version: 3.0.17
OS/Arch: linux/amd64
initial cluster size: 3
clientv3 built from the latest code (the gRPC versions used by clientv3 and the etcd server are different, I think).

Log content:
api: enabled capabilities for version 2.3
2018-01-18 21:08:42.291730 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:57382: read: connection reset by peer
2018-01-18 21:08:42.300708 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:57406: read: connection reset by peer
2018-01-18 21:08:42.338178 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:57508: read: connection reset by peer
[... ~50 similar "connection reset by peer" lines omitted ...]
2018-01-18 21:08:43.524769 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:32904: read: connection reset by peer
2018-01-18 21:08:43.629359 I | rafthttp: peer d98998e6a7a13831 became active
2018-01-18 21:08:43.629392 I | rafthttp: established a TCP streaming connection with peer d98998e6a7a13831 (stream Message reader)
2018-01-18 21:08:43.629973 I | rafthttp: established a TCP streaming connection with peer d98998e6a7a13831 (stream MsgApp v2 reader)
2018-01-18 21:08:43.631156 I | rafthttp: established a TCP streaming connection with peer d98998e6a7a13831 (stream MsgApp v2 writer)
2018-01-18 21:08:43.631357 I | rafthttp: established a TCP streaming connection with peer d98998e6a7a13831 (stream Message writer)
2018-01-18 21:08:43.680305 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:33370: read: connection reset by peer
2018-01-18 21:08:43.681609 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:33374: read: connection reset by peer
2018-01-18 21:08:43.713480 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:33478: read: connection reset by peer
[... ~40 similar "connection reset by peer" lines omitted ...]
2018-01-18 21:08:45.154819 I | v3rpc/grpc: transport: http2Server.HandleStreams failed to read frame: read tcp 127.0.0.1:2379->127.0.0.1:37866: read: connection reset by peer
membership: updated the cluster version from 2.3 to 3.0

How is the client connection initiated?
cli, err := clientv3.New(clientv3.Config{
	Endpoints:   []string{"$endPoint"},
	DialTimeout: 3 * time.Second,
})

Which API did you try?
/etcdserverpb.Cluster/MemberList (from tcpdump)

What was used to investigate?
tcpdump, strace

tcpdump content (this excerpt covers just one connection):
Cmd executed: "tcpdump -qxn -i lo port 2379 -s 0 -w /tmp/tcpdump.log"

40864 5.691315 127.0.0.1 127.0.0.1 TCP 74 33872 → 2379 [SYN] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=4294896844 TSecr=0 WS=128
40865 5.691323 127.0.0.1 127.0.0.1 TCP 74 2379 → 33872 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=4294896844 TSecr=4294896844 WS=128
40866 5.691332 127.0.0.1 127.0.0.1 TCP 66 33872 → 2379 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=4294896844 TSecr=4294896844
40870 5.691442 127.0.0.1 127.0.0.1 HTTP2 90 Magic
40871 5.691447 127.0.0.1 127.0.0.1 TCP 66 2379 → 33872 [ACK] Seq=1 Ack=25 Win=43776 Len=0 TSval=4294896844 TSecr=4294896844
40872 5.691456 127.0.0.1 127.0.0.1 HTTP2 75 SETTINGS
40873 5.691461 127.0.0.1 127.0.0.1 TCP 66 2379 → 33872 [ACK] Seq=1 Ack=34 Win=43776 Len=0 TSval=4294896844 TSecr=4294896844
40874 5.691513 127.0.0.1 127.0.0.1 HTTP2 75 SETTINGS
40875 5.691530 127.0.0.1 127.0.0.1 HTTP2 167 HEADERS, DATA
40876 5.691535 127.0.0.1 127.0.0.1 TCP 66 33872 → 2379 [ACK] Seq=135 Ack=10 Win=43776 Len=0 TSval=4294896844 TSecr=4294896844
40877 5.691572 127.0.0.1 127.0.0.1 HTTP2 75 SETTINGS
40878 5.691581 127.0.0.1 127.0.0.1 HTTP2 79 WINDOW_UPDATE
40879 5.691662 127.0.0.1 127.0.0.1 HTTP2 130 HEADERS
40880 5.691686 127.0.0.1 127.0.0.1 TCP 66 33872 → 2379 [ACK] Seq=144 Ack=87 Win=43776 Len=0 TSval=4294896844 TSecr=4294896844
40881 5.691735 127.0.0.1 127.0.0.1 HTTP2 75 SETTINGS
40885 5.691888 127.0.0.1 127.0.0.1 TCP 66 33872 → 2379 [RST, ACK] Seq=144 Ack=96 Win=43776 Len=0 TSval=4294896844 TSecr=4294896844

In Frame 40875
Header: :path: /etcdserverpb.Cluster/MemberList
Name Length: 5
Name: :path
Value Length: 32
Value: /etcdserverpb.Cluster/MemberList
Representation: Literal Header Field with Incremental Indexing - Indexed Name
Index: 5

In Frame 40879
Header: grpc-status: 14
Name Length: 11
Name: grpc-status
Value Length: 2
Value: 14
Representation: Literal Header Field with Incremental Indexing - New Name
Header: grpc-message: etcdserver: not capable
Name Length: 12
Name: grpc-message
Value Length: 23
Value: etcdserver: not capable
Representation: Literal Header Field with Incremental Indexing - New Name

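For reference, the grpc-status: 14 trailer captured above is the canonical UNAVAILABLE code, which is what makes the client treat "etcdserver: not capable" as a retryable failure. A minimal lookup sketch (the grpcCodeName map below is illustrative, not a real gRPC API):

```go
package main

import "fmt"

// grpcCodeName maps a few gRPC status codes (as carried in the
// grpc-status response trailer) to their canonical names; 14 is
// the one seen in the capture above.
var grpcCodeName = map[int]string{
	0:  "OK",
	4:  "DEADLINE_EXCEEDED",
	13: "INTERNAL",
	14: "UNAVAILABLE",
}

func main() {
	// The trailer pair in frame 40879 was:
	//   grpc-status: 14, grpc-message: "etcdserver: not capable"
	fmt.Println(grpcCodeName[14])
}
```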
Strace
Cmd executed: strace -p PID -f -ttT -s200 -o /tmp/stracktrace.log
On the client side we saw close() being called, but the server side received ECONNRESET. Not sure why this happened; we still have to dig into this.

xiang90 commented Jan 18, 2018

@SaranBalaji90

If you leave the cluster idle without any client interaction, will you still see the error logging? I am asking because the broken streams might be caused by the internal grpc-gateway.

@SaranBalaji90

We don't see these logs when the cluster is idle.

But @eswarbala pointed out this issue #7065.

I'm invoking the MemberList API in clientv3, and for all these clusters the 3.0 API is enabled only after the third node joins. Until then we will see these logs, because grpc-go might be retrying after getting status 14, which says the service is unavailable.

Does this make sense on why this is happening?

gyuho commented Jan 19, 2018

grpc-go might be retrying since it got status 14 which says service is unavailable.

What was this clientv3 version? Retry logic has changed a lot since 3.0.0.

@SaranBalaji90

I built clientv3 from whatever we have here (https://github.com/coreos/etcd/tree/master/clientv3); go get github.com/... should fetch the latest version.

gyuho commented Jan 19, 2018

Actually, for this specific API (MemberList), both 3.0 and the latest (3.3+) retry on the "etcdserver: not capable" error. So it is expected that we will see a bunch of those errors until the 3.0 API is enabled.

xiang90 commented Jan 19, 2018

But why would the retry trigger errors like read: connection reset by peer?

gyuho commented Jan 19, 2018

The simplest solution would be to fetch the /version endpoint until the server sets the cluster version:

// not ready
{"etcdserver":"3.0.0","etcdcluster":"not_decided"}

// ready
{"etcdserver":"3.0.0","etcdcluster":"3.0.0"}
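The suggestion above can be sketched as a small readiness check; versionInfo and clusterReady are hypothetical names, and in practice the body would come from a GET against the server's /version endpoint:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// versionInfo mirrors the /version response shown above.
type versionInfo struct {
	ServerVersion  string `json:"etcdserver"`
	ClusterVersion string `json:"etcdcluster"`
}

// clusterReady reports whether the server has decided a cluster
// version, i.e. the v3 API is enabled and "not capable" errors
// should stop.
func clusterReady(body []byte) bool {
	var v versionInfo
	if err := json.Unmarshal(body, &v); err != nil {
		return false
	}
	return v.ClusterVersion != "" && v.ClusterVersion != "not_decided"
}

func main() {
	fmt.Println(clusterReady([]byte(`{"etcdserver":"3.0.0","etcdcluster":"not_decided"}`))) // false
	fmt.Println(clusterReady([]byte(`{"etcdserver":"3.0.0","etcdcluster":"3.0.0"}`)))       // true
}
```

A caller would poll this in a loop (with a small sleep) before creating the clientv3 client.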

xiang90 commented Jan 19, 2018

Also, it seems the retry is pretty aggressive. For the not capable error at least, we should not retry that aggressively, since we know it happens because the cluster is not fully up. Retrying immediately will not resolve it; this is different from a network error, where an immediate retry has a high chance of just working.

gyuho commented Jan 19, 2018

But why would the retry trigger errors like read: connection reset by peer?

Probably from creating/closing a client for every unary RPC call? That error message shouldn't appear from retries on the same connection, but I haven't investigated this further.

xiang90 commented Jan 19, 2018

Closing a client should not generate that error... the TCP connection should be closed cleanly.

gyuho commented Jan 19, 2018

According to grpc/grpc-go#1062 (comment), it's expected, given that we close the gRPC connection properly.

xiang90 commented Jan 19, 2018

OK. It seems gRPC already logs that error at debug level.

From our side, we can retry not capable less aggressively, and we do not need to switch to another endpoint for the not capable error either.

gyuho commented Jan 19, 2018

Good idea. Opened another issue to track it.

GlenDC commented Jan 23, 2018

It would also be great if we could set the etcd logger without it overwriting the global gRPC logger, as this gets really confusing and doesn't make any sense. Right now we see logs from non-etcd gRPC connections tagged with our etcd database tag, all because the global gRPC logger is overwritten when setting the logger via etcd.

Why not simply allow setting the logger that is used as middleware? That way the global loggers aren't affected. Or perhaps this is already supported and I simply have to create my etcd client differently? If not, feel free to give me some pointers and I wouldn't mind adding this support to etcd.

The logging middleware I was talking about can be found here: https://godoc.org/github.com/grpc-ecosystem/go-grpc-middleware/logging. By default it only supports gRPC and Zap, but nothing stops you from allowing a user to provide any generic logger, I would think.

@SaranBalaji90

Also, one thing I haven't investigated yet is why we see a reset on the server side when we think we close the client properly (per grpc/grpc-go#1062 (comment), we should see a similar message whenever a client connection is closed, but that doesn't look like the case).

paulcaskey commented Jun 18, 2020

This is still an issue and should not be closed, I think: #10364 ("Zap logger prints connection reset by peer but capnslog doesn't").

It's going to be noticed more now that v3.4 warns if you DON'T use the "zap" logger and claims the older method is deprecated. I upgraded from 3.3 to 3.4, switched to zap as told, and now my logs are flooded with this message.

Can't this error just be logged at level=debug instead of level=warn?

@paulcaskey

The 3.4 FAQ also needs updating on this. Both of these sentences in the answer are now wrong: it's logging at "warn" level, not debug.

Only old versions of gRPC log this. etcd >=v3.2.13 by default log this with DEBUG level, thus only visible with --debug flag enabled.

gyuho commented Jun 18, 2020

@paulcaskey Could you create a new issue? Thanks! Nvm. It wasn't closed. Yes, we will take a look at this.

gyuho added this to the etcd-v3.5 milestone (Jun 18, 2020)
@dilyevsky

We're seeing this in etcd v3.4.7, more often under heavy load.

@serathius
Member

Stream logs were disabled.
