Improve replication performance after introducing replica ack messages #3070

@PragmaTwice

Description

@PragmaTwice

The last question from me: I'm wondering whether replication slows down noticeably because of the new blocking IO (sendString; does one round of incrementBatchLoopCB take longer now that there is IO at the end?) and the extra network traffic.

TLDR:
Good point, you are right: it has a significant impact (roughly 40%) on replication throughput.
I will follow up with a PR to make the ack behavior configurable, so users can balance WAIT latency against replication throughput themselves.

Full version:
I measured replication throughput by comparing the master and slave sequence numbers every 5s while sending approximately 310 MB/s of traffic to the master (80k QPS with 4 KB payloads). On the unstable branch the lag grows by ~84k sequences every 5s (66 MB/s); on this branch it grows by ~220k sequences every 5s (171 MB/s). Subtracting the lag growth from the incoming rate, the current unstable branch has a replication throughput of roughly 240 MB/s and this branch roughly 140 MB/s.
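For anyone who wants to re-check the arithmetic, here is a minimal sketch of the estimate. It assumes each sequence number corresponds to one 4 KiB write and that "MB" means MiB (these assumptions are mine; they happen to reproduce the 66 and 171 figures above). The function and constant names are made up for illustration.

```python
MIB = 1024 * 1024
PAYLOAD_BYTES = 4 * 1024  # assumption: one sequence == one 4 KiB write
WINDOW_S = 5              # lag was sampled every 5 seconds


def lag_growth_mib_s(seq_per_window, window_s=WINDOW_S, payload=PAYLOAD_BYTES):
    """Convert lag growth in sequences per sampling window to MiB/s."""
    return seq_per_window * payload / window_s / MIB


# Incoming write rate on the master: 80k QPS with 4 KiB payloads.
incoming = 80_000 * PAYLOAD_BYTES / MIB

# Replication throughput = incoming rate minus the rate at which lag grows.
unstable_tp = incoming - lag_growth_mib_s(84_000)
branch_tp = incoming - lag_growth_mib_s(220_000)

# Relative throughput loss of this branch compared with unstable.
drop = 1 - branch_tp / unstable_tp

print(f"incoming: {incoming:.1f} MiB/s")
print(f"unstable: {unstable_tp:.1f} MiB/s, this branch: {branch_tp:.1f} MiB/s")
print(f"throughput drop: {drop:.0%}")
```

With unrounded inputs the drop comes out slightly above 40%, which is consistent with the rough number in the TLDR.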

This overhead is beyond my expectation, and it may take me some time to figure out why the perf hit is so bad (btw, I don't think sendString is blocking, because it just adds to the buffer?). Meanwhile, I think it makes sense to make the ack behavior configurable. For our use case, we want to call WAIT on every write for lossless data replication, so we want faster acks. I understand most users don't need this, so an ack every second or every few dozen updates is good enough for them.
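To make the "configurable ack" idea concrete, here is a rough sketch of one possible policy: the replica acks once N updates have been applied or a maximum interval has elapsed, whichever comes first. This is only an illustration in Python, not the actual Kvrocks code or the planned PR; `AckPolicy`, `every_n_updates`, and `max_interval_s` are hypothetical names.

```python
import time


class AckPolicy:
    """Hypothetical policy deciding when a replica should send an ack."""

    def __init__(self, every_n_updates=1, max_interval_s=1.0, clock=time.monotonic):
        self.every_n_updates = every_n_updates
        self.max_interval_s = max_interval_s
        self._clock = clock          # injectable so the policy can be tested
        self._pending = 0            # updates applied since the last ack
        self._last_ack = clock()

    def on_update_applied(self):
        """Record one applied update; return True if an ack should be sent now."""
        self._pending += 1
        now = self._clock()
        due_by_count = self._pending >= self.every_n_updates
        due_by_time = now - self._last_ack >= self.max_interval_s
        if due_by_count or due_by_time:
            self._pending = 0
            self._last_ack = now
            return True
        return False


# every_n_updates=1 acks after every update (lowest WAIT latency);
# a larger value trades WAIT latency for replication throughput.
eager = AckPolicy(every_n_updates=1)
lazy = AckPolicy(every_n_updates=32, max_interval_s=1.0)
```

With `every_n_updates=1` this degenerates to the current ack-per-batch behavior, so WAIT-heavy users keep fast acks while everyone else can batch them.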

Originally posted by @zhixinwen in #3061 (comment)
