This repository was archived by the owner on May 10, 2022. It is now read-only.
bug: client can't recover from a replica-server failure. #53
Open
Labels
bug (Something isn't working)
Description
- 2019/9/11 17:00. Our SRE stopped one replica-server instance in our staging environment to reproduce the problem where the Java client can't recover.
- 2019/9/11 17:00. Some of our clients recovered right away when the replica-server restarted, but others couldn't reconnect and kept retrying with ERR_TIMEOUT errors.
2019-09-11 17:12:48,659 ERROR [nioEventLoopGroup-4-1] [com.xiaomi.message.mixin.fsm.ruleset.SaveAckRuleSet$3$1.operationComplete(SaveAckRuleSet.java:169)] - SaveB2CAckInfo fail, appId=2882303761517479657 msgId=sms-1c33f956-7dad-4327-8986-aa400cb4a8b2, ex:
com.xiaomi.infra.pegasus.client.PException: com.xiaomi.infra.pegasus.rpc.ReplicationException: ERR_TIMEOUT
at com.xiaomi.infra.pegasus.client.PegasusTable$8.onCompletion(PegasusTable.java:376)
at com.xiaomi.infra.pegasus.rpc.async.ClientRequestRound.thisRoundCompletion(ClientRequestRound.java:51)
at com.xiaomi.infra.pegasus.rpc.async.TableHandler.tryDelayCall(TableHandler.java:314)
at com.xiaomi.infra.pegasus.rpc.async.TableHandler.onRpcReply(TableHandler.java:295)
at com.xiaomi.infra.pegasus.rpc.async.TableHandler$3.run(TableHandler.java:326)
at com.xiaomi.infra.pegasus.rpc.async.ReplicaSession.tryNotifyWithSequenceID(ReplicaSession.java:226)
at com.xiaomi.infra.pegasus.rpc.async.ReplicaSession.access$300(ReplicaSession.java:32)
at com.xiaomi.infra.pegasus.rpc.async.ReplicaSession$4.run(ReplicaSession.java:270)
at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:123)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:380)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.xiaomi.infra.pegasus.rpc.ReplicationException: ERR_TIMEOUT
... 15 more
- 2019-09-11 17:12:48. The errors persisted, so we stopped the test.
Client Version
1.11.5-thrift-0.11.0-inlined-release