Statistics desired #76
Comments
+1 I also cannot find any benchmarks comparing mangos to nanomsg or any other messaging platform. |
Just a note @nkev -- we do have the local_lat, remote_lat, local_thr, and remote_thr utilities that nanomsg has. You can run them against each other -- or you can run them against nanomsg. (There are four permutations for each of these.) As far as benchmarking locally goes, you can also use the Go benchmark facility in the test subdirectory. There are throughput and latency tests there. Obviously you cannot really compare the results against anything else, but it may still be useful (for evaluating local changes to the code and their impact on performance, for example); a minimal benchmark in that style is sketched below.
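Here is a rough idea of what such a benchmark can look like (a sketch only, not the repo's actual test code; the socket type, inproc address, and message size are arbitrary choices):

```go
package mangos_bench

import (
	"testing"
	"time"

	"github.com/go-mangos/mangos/protocol/pair"
	"github.com/go-mangos/mangos/transport/inproc"
)

// BenchmarkPairThroughput pushes b.N small messages through a pair of
// inproc-connected PAIR sockets and lets `go test -bench` report ns/op.
func BenchmarkPairThroughput(b *testing.B) {
	addr := "inproc://bench_pair"

	srv, err := pair.NewSocket()
	if err != nil {
		b.Fatal(err)
	}
	defer srv.Close()
	srv.AddTransport(inproc.NewTransport())
	if err := srv.Listen(addr); err != nil {
		b.Fatal(err)
	}

	cli, err := pair.NewSocket()
	if err != nil {
		b.Fatal(err)
	}
	defer cli.Close()
	cli.AddTransport(inproc.NewTransport())
	if err := cli.Dial(addr); err != nil {
		b.Fatal(err)
	}

	// Give the pipe a moment to establish before timing starts.
	time.Sleep(10 * time.Millisecond)

	// Drain on the receiving side in the background.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < b.N; i++ {
			if _, err := srv.Recv(); err != nil {
				return
			}
		}
	}()

	msg := make([]byte, 64)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := cli.Send(msg); err != nil {
			b.Fatal(err)
		}
	}
	<-done
}
```
|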
I've added mangos to an existing message benchmark that I stumbled upon. It compares benchmarks of python and golang messaging solutions. For golang it compares zmq, redis and now mangos. You can find my code here: https://github.com/eelcocramer/two-queues I'm not sure if I made it as efficient as possible but currently mangos does not even come close to zmq and redis. Update: I removed the graph as the figures are not reliable. Any hints to improve the performance of my code? |
Looking at this briefly, you’re not making use of the mangos.Message
structure, so you’re hitting the garbage collection logic really hard.
You don’t suffer this penalty for the native zmq and redis code. I haven’t
really done much more analysis than that. The libzmq and redis versions
are written in C, and don’t make use of dynamic data at all, which gives
them excellent performance. It might also be worth playing with the
write/read QLengths.
Otherwise I need to spend more time looking at this — certainly I get the
same terrible results, and I haven’t had time yet to fully digest this.
One other thing that I suspect might be hurting us is Go’s default TCP
settings.
I would also avoid changing the runtime’s choice for GOMAXPROCS… the modern
Go runtime does a good job of selecting this automatically.
One thing I’m seeing is that you get much smaller numbers of messages
pushed through than I do using the performance test suite (local_thr and
remote_thr). It might be interesting to compare this to ZMQ — I think ZMQ
has similar test suites. That said, it looks like mangos never comes
close to achieving the redis numbers.
Probably it would be worth profiling this code.
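To make the two knobs above concrete, here is a minimal sketch (the socket type, inproc address, counts, and queue sizes are arbitrary illustrations, not recommendations):

```go
package main

import (
	"fmt"

	"github.com/go-mangos/mangos"
	"github.com/go-mangos/mangos/protocol/pair"
	"github.com/go-mangos/mangos/transport/inproc"
)

const count = 100000

func main() {
	tx, err := pair.NewSocket()
	if err != nil {
		panic(err)
	}
	defer tx.Close()
	tx.AddTransport(inproc.NewTransport())

	// Deepen the write/read queues from the default of 128.
	if err := tx.SetOption(mangos.OptionWriteQLen, 1024); err != nil {
		panic(err)
	}
	if err := tx.SetOption(mangos.OptionReadQLen, 1024); err != nil {
		panic(err)
	}
	if err := tx.Listen("inproc://qlen_demo"); err != nil {
		panic(err)
	}

	rx, err := pair.NewSocket()
	if err != nil {
		panic(err)
	}
	defer rx.Close()
	rx.AddTransport(inproc.NewTransport())
	if err := rx.Dial("inproc://qlen_demo"); err != nil {
		panic(err)
	}

	// Drain on the receiving side, returning buffers to the message pool.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < count; i++ {
			m, err := rx.RecvMsg()
			if err != nil {
				return
			}
			m.Free()
		}
	}()

	payload := []byte("hello")
	for i := 0; i < count; i++ {
		// Allocate from the mangos message pool rather than a fresh byte
		// slice each time, to keep the hot loop off the garbage collector.
		m := mangos.NewMessage(len(payload))
		m.Body = append(m.Body, payload...)
		if err := tx.SendMsg(m); err != nil { // SendMsg takes ownership of m
			panic(err)
		}
	}
	<-done
	fmt.Println("sent and received", count, "messages")
}
```

The point of mangos.NewMessage/Free here is simply buffer reuse; the rest is plain socket setup.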
|
Thank you for the response and the solution pointers. I'll dig into this and post results back. I will remove the graph because it probably does not give reliable information. |
I just did a little more investigation.
ZeroMQ (and possibly Redis) are queuing messages, and do not exert
backpressure by default. (The default high water mark in ZeroMQ is 0.)
This means that you can send as fast as you can, and you’re limited only
by the rate that you can allocate message buffers.
With mangos, we have a situation where there is backpressure involved — the
default is to only queue up to 128 messages before exerting backpressure.
Experimentally this is found to be a good limit for most cases. In the
push/pull case we see backpressure exerted, and the push routine backs off.
In zmq’s case we just spin hard. Pub/Sub is a bit harder, because that’s
best effort delivery, so we actually *drop* messages if we try to send too
fast. Benchmarking pub/sub is harder as a result, since messages will be
dropped when you push too hard. (Conversely, ZMQ doesn’t drop, but your
program’s heap usage may grow without bound. I consider this unacceptable
in a production setting, and utterly useless from a benchmarking
perspective.)
The problem is that I’m not convinced, having looked at the code, that
we’re actually measuring message throughput at all; we may just be
measuring how fast certain routines can be called in a loop. In other
words, I think the benchmark collection methods are flawed.
For a pub/sub benchmark, I’d actually use a pair of peers,
pubA->subB->pubB->subA — basically forwarding the message back. You could
use this to measure both latency and single thread msgs/sec. As you
increase the number of concurrent clients you will hit a threshold where
message drops occur; certainly at 128 (the default queue depth) the clients
should outpace the server and you should see drops. You may see them
sooner than that, depending on how fast we can pull messages from the
underlying transport. I haven’t done any concrete experimentation yet.
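For anyone who wants to see the backpressure side of this directly, here is a small sketch (the address, queue size, and send deadline are arbitrary; there is deliberately no puller connected):

```go
package main

import (
	"fmt"
	"time"

	"github.com/go-mangos/mangos"
	"github.com/go-mangos/mangos/protocol/push"
	"github.com/go-mangos/mangos/transport/tcp"
)

func main() {
	sock, err := push.NewSocket()
	if err != nil {
		panic(err)
	}
	defer sock.Close()
	sock.AddTransport(tcp.NewTransport())

	// Shrink the write queue so backpressure shows up quickly (default is 128).
	if err := sock.SetOption(mangos.OptionWriteQLen, 8); err != nil {
		panic(err)
	}
	// Turn the blocking Send into a timed one so the stall is observable.
	if err := sock.SetOption(mangos.OptionSendDeadline, time.Second); err != nil {
		panic(err)
	}
	if err := sock.Listen("tcp://127.0.0.1:40901"); err != nil {
		panic(err)
	}

	// With no puller connected, sends queue up until the write queue is
	// full, and then Send blocks (here: times out) instead of letting the
	// heap grow without bound.
	for i := 0; ; i++ {
		err := sock.Send([]byte("work item"))
		if err == mangos.ErrSendTimeout {
			fmt.Printf("backpressure after queuing ~%d messages\n", i)
			return
		}
		if err != nil {
			panic(err)
		}
	}
}
```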
|
Thanks for the extensive answer. |
So I made a test that measures pub/sub round trips. TCP is the limiter.
With inproc using mangos, I can get about 720k rtt/sec with 8 clients, and
760k rtt/sec with 16 clients. Adding more clients helps. With just a
single client using inproc I get 128k rtt/sec. This is a serial send/recv
loop, really measuring round trips. (A publishes to B, and B publishes
back to A; A and B each use two sockets.) I’m using mangos.NewMessage() to
avoid the garbage collector in Go. This is also on my 2013 iMac, using Go
1.7.1, and no adjustments to GOMAXPROCS. It seems that diminishing
returns hit somewhere around 16 concurrent clients. Note that I’m not
experiencing any losses, and I’m using no special tuning options in mangos.
With TCP, using a single client, I get 18k rtt/sec; with 16 clients it’s
about 43k rtt/sec. This is over loopback on my Mac. That corresponds to
86k *messages* per second. A bit lower than I’d like, but again this is
completely untuned. (Interestingly enough, the code stalls at around 44k
rtt/sec at 32 clients.) Bumping up GOMAXPROCS seems to help further,
getting me to about 57k rtt/sec. (114k msgs/sec.)
There are some enhancements that can be made for sure to improve the rtt
performance. For example, the messages are sent using two Write()
operations, leading to two TCP segments per message. This is really bad
for performance, and needs to be fixed. I probably should take some effort
to make sure that TCP nodelay is set. I think that’s missing at present.
I’m fairly confident that your test results are skewed due to incorrect
measurements.
The fact that the low client count results in extraordinarily high
messages/sec really does tell me quite clearly that you’re not measuring
actual transit times. Indeed, mangos performs *better* as you increase the
client count, since you wind up shoveling more data and having threads
spend less time waiting (parallelization wins). The fact that this is not
true for redis or zmq tells me that either they have horribly broken
concurrency (they don’t), or your test methodology is busted.
I’m not prepared to go into detail about the test quality, but here’s the
code I used for mangos validation with pub/sub. Feel free to adjust as you
like. You can change the addresses, and the client and loop variables.
You do have to add the various client results together, but what I did was
just take the middle results and multiply by the client count. Not
extremely precise, but I think it’s pretty close to accurate, especially for
sufficiently long runs.
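(The code itself isn't reproduced in this thread, so the following is only a reconstruction sketch of that round-trip idea for a single client against the mangos v1 API; the addresses, ports, loop count, and recv deadline are assumptions, and the per-client aggregation described above is omitted.)

```go
package main

import (
	"fmt"
	"time"

	"github.com/go-mangos/mangos"
	"github.com/go-mangos/mangos/protocol/pub"
	"github.com/go-mangos/mangos/protocol/sub"
	"github.com/go-mangos/mangos/transport/tcp"
)

const (
	addrAtoB = "tcp://127.0.0.1:40899" // A's pub -> B's sub
	addrBtoA = "tcp://127.0.0.1:40900" // B's pub -> A's sub
	loops    = 100000
)

func newPub(addr string) mangos.Socket {
	s, err := pub.NewSocket()
	if err != nil {
		panic(err)
	}
	s.AddTransport(tcp.NewTransport())
	if err := s.Listen(addr); err != nil {
		panic(err)
	}
	return s
}

func newSub(addr string) mangos.Socket {
	s, err := sub.NewSocket()
	if err != nil {
		panic(err)
	}
	s.AddTransport(tcp.NewTransport())
	// Subscribe to everything.
	if err := s.SetOption(mangos.OptionSubscribe, []byte("")); err != nil {
		panic(err)
	}
	if err := s.Dial(addr); err != nil {
		panic(err)
	}
	return s
}

// B simply republishes whatever it receives, closing the loop back to A.
func forward(in, out mangos.Socket) {
	for {
		m, err := in.RecvMsg()
		if err != nil {
			return
		}
		if err := out.SendMsg(m); err != nil {
			return
		}
	}
}

func main() {
	pubA := newPub(addrAtoB)
	pubB := newPub(addrBtoA)
	subB := newSub(addrAtoB) // B subscribes to A
	subA := newSub(addrBtoA) // A subscribes to B

	// Give the TCP connections and goroutines time to establish; this sits
	// outside the timed section so it is not billed to the benchmark.
	time.Sleep(100 * time.Millisecond)
	go forward(subB, pubB)

	// Bail out rather than hang forever if a message is dropped
	// (pub/sub is best-effort delivery).
	if err := subA.SetOption(mangos.OptionRecvDeadline, 5*time.Second); err != nil {
		panic(err)
	}

	start := time.Now()
	for i := 0; i < loops; i++ {
		m := mangos.NewMessage(4) // pooled message keeps the GC quiet
		m.Body = append(m.Body, "ping"...)
		if err := pubA.SendMsg(m); err != nil {
			panic(err)
		}
		r, err := subA.RecvMsg()
		if err != nil {
			panic(err) // mangos.ErrRecvTimeout here would indicate a drop
		}
		r.Free()
	}
	elapsed := time.Since(start)
	fmt.Printf("%d round trips in %v (%.0f rtt/sec)\n",
		loops, elapsed, float64(loops)/elapsed.Seconds())
}
```

Scaling this to N clients would mean running the timed loop in N goroutines with per-client counters and summing the results, as described above.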
|
I see that I actually added a 100 ms delay inside the initial timing, so my numbers are 100 ms worse than they should be. But for large message counts that should amortize into the noise. Still, I should fix it; a sketch of the fix is below.
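Something along these lines keeps the settle delay but leaves it out of the measured interval (doRoundTrips is just a placeholder for the real benchmark loop):

```go
package main

import (
	"fmt"
	"time"
)

// doRoundTrips stands in for the actual send/receive benchmark loop.
func doRoundTrips() {}

func main() {
	// Let the TCP connections and goroutines on both sides establish
	// before the clock starts, so the settle time is not billed to the run.
	time.Sleep(100 * time.Millisecond)

	start := time.Now()
	doRoundTrips()
	fmt.Println("elapsed:", time.Since(start))
}
```
|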
The sleep is there to ensure that the connections are established. It can take several dozen milliseconds for the TCP connections and the goroutines on both sides to establish. |
Statistics support will be a mangos v2 deliverable, probably following along with NNG in that regard. |
Did this ever make v2? If not, is there something on the roadmap? |
It did not. The problem here is too many projects and not enough time. If this is important enough to you to sponsor (or to contribute code), let me know. |
It may be nice to have methods to track statistics. There are probably a great number of potentially useful stats. We need to think about them.