Statistics desired #76
Comments
+1 I also cannot find any benchmarks comparing mangos to nanomsg or any other messaging platform. |
Just a note @nkev -- we do have the local_lat, remote_lat, local_thr, and remote_thr utilities that nanomsg has. You can run them against each other -- or you can run them against nanomsg. (There are four permutations for each of these.) As far as benchmarking locally goes, you can also use the Go benchmark facility in the test subdirectory. There are throughput and latency tests there. Obviously you cannot really compare the results against anything else, but it may still be useful (for evaluating local changes to the code and their impact on performance, for example); a minimal benchmark in that style is sketched below.
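Here is a rough idea of what such a benchmark can look like (a sketch only, not the repo's actual test code; the socket type, inproc address, and message size are arbitrary choices):

```go
package mangos_bench

import (
	"testing"
	"time"

	"github.com/go-mangos/mangos/protocol/pair"
	"github.com/go-mangos/mangos/transport/inproc"
)

// BenchmarkPairThroughput pushes b.N small messages through a pair of
// inproc-connected PAIR sockets and lets `go test -bench` report ns/op.
func BenchmarkPairThroughput(b *testing.B) {
	addr := "inproc://bench_pair"

	srv, err := pair.NewSocket()
	if err != nil {
		b.Fatal(err)
	}
	defer srv.Close()
	srv.AddTransport(inproc.NewTransport())
	if err := srv.Listen(addr); err != nil {
		b.Fatal(err)
	}

	cli, err := pair.NewSocket()
	if err != nil {
		b.Fatal(err)
	}
	defer cli.Close()
	cli.AddTransport(inproc.NewTransport())
	if err := cli.Dial(addr); err != nil {
		b.Fatal(err)
	}

	// Give the pipe a moment to establish before timing starts.
	time.Sleep(10 * time.Millisecond)

	// Drain on the receiving side in the background.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < b.N; i++ {
			if _, err := srv.Recv(); err != nil {
				return
			}
		}
	}()

	msg := make([]byte, 64)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if err := cli.Send(msg); err != nil {
			b.Fatal(err)
		}
	}
	<-done
}
```
|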
I've added mangos to an existing message benchmark that I stumbled upon. It compares benchmarks of python and golang messaging solutions. For golang it compares zmq, redis and now mangos. You can find my code here: https://github.com/eelcocramer/two-queues I'm not sure if I made it as efficient as possible but currently mangos does not even come close to zmq and redis. Update: I removed the graph as the figures are not reliable. Any hints to improve the performance of my code? |
Looking at this briefly, you’re not making use of the mangos.Message
structure, so you’re hitting the garbage collection logic really hard.
You don’t suffer this penalty for the native zmq and redis code. I haven’t
really done much more analysis than that. The libzmq and redis versions
are written in C, and don’t make use of dynamic data at all, which gives
them excellent performance. It might also be worth playing with the
write/read QLengths.
Otherwise I need to spend more time looking at this — certainly I get the
same terrible results, and I haven’t had time yet to fully digest this.
One other thing that I suspect might be hurting us is Go’s default TCP
settings.
I would also avoid changing the runtime’s choice for GOMAXPROCS… the modern
Go runtime does a good job of selecting this automatically.
One thing I’m seeing is that you get much smaller numbers of messages
pushed through than I do using the performance test suite (local_thr and
remote_thr). It might be interesting to compare this to ZMQ — I think ZMQ
has similar test suites. That said, it looks like mangos never comes
close to achieving the redis numbers.
Probably it would be worth profiling this code.
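To make the two knobs above concrete, here is a minimal sketch (the socket type, inproc address, counts, and queue sizes are arbitrary illustrations, not recommendations):

```go
package main

import (
	"fmt"

	"github.com/go-mangos/mangos"
	"github.com/go-mangos/mangos/protocol/pair"
	"github.com/go-mangos/mangos/transport/inproc"
)

const count = 100000

func main() {
	tx, err := pair.NewSocket()
	if err != nil {
		panic(err)
	}
	defer tx.Close()
	tx.AddTransport(inproc.NewTransport())

	// Deepen the write/read queues from the default of 128.
	if err := tx.SetOption(mangos.OptionWriteQLen, 1024); err != nil {
		panic(err)
	}
	if err := tx.SetOption(mangos.OptionReadQLen, 1024); err != nil {
		panic(err)
	}
	if err := tx.Listen("inproc://qlen_demo"); err != nil {
		panic(err)
	}

	rx, err := pair.NewSocket()
	if err != nil {
		panic(err)
	}
	defer rx.Close()
	rx.AddTransport(inproc.NewTransport())
	if err := rx.Dial("inproc://qlen_demo"); err != nil {
		panic(err)
	}

	// Drain on the receiving side, returning buffers to the message pool.
	done := make(chan struct{})
	go func() {
		defer close(done)
		for i := 0; i < count; i++ {
			m, err := rx.RecvMsg()
			if err != nil {
				return
			}
			m.Free()
		}
	}()

	payload := []byte("hello")
	for i := 0; i < count; i++ {
		// Allocate from the mangos message pool rather than a fresh byte
		// slice each time, to keep the hot loop off the garbage collector.
		m := mangos.NewMessage(len(payload))
		m.Body = append(m.Body, payload...)
		if err := tx.SendMsg(m); err != nil { // SendMsg takes ownership of m
			panic(err)
		}
	}
	<-done
	fmt.Println("sent and received", count, "messages")
}
```

The point of mangos.NewMessage/Free here is simply buffer reuse; the rest is plain socket setup.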
|
Thank you for the response and the solution pointers. I'll dig into this and post results back. I will remove the graph because it probably does not give reliable information. |
I just did a little more investigation.
ZeroMQ (and possibly Redis) are queuing messages, and do not exert
backpressure by default. (The default high water mark in ZeroMQ is 0.)
This means that you can send as fast as you can, and you’re limited only
by the rate that you can allocate message buffers.
With mangos, we have a situation where there is backpressure involved — the
default is to only queue up to 128 messages before exerting backpressure.
Experimentally this is found to be a good limit for most cases. In the
push/pull case we see backpressure exerted, and the push routine backs off.
In zmq’s case we just spin hard. Pub/Sub is a bit harder, because that’s
best effort delivery, so we actually *drop* messages if we try to send too
fast. Benchmarking pub/sub is harder as a result, since messages will be
dropped when you push too hard. (Conversely, ZMQ doesn’t drop, but your
program’s heap usage may grow without bound. I consider this unacceptable
in a production setting, and utterly useless from a benchmarking
perspective.)
The problem is that I’m not convinced, having looked at the code, that
we’re actually measuring message throughput at all; we may just be
measuring how fast certain routines can be called in a loop. In other
words, I think the benchmark collection methods are flawed.
For a pub/sub benchmark, I’d actually use a pair of peers,
pubA->subB->pubB->subA — basically forwarding the message back. You could
use this to measure both latency and single thread msgs/sec. As you
increase the number of concurrent clients you will hit a threshold where
message drops occur; certainly at 128 (the default queue depth) the clients
should outpace the server and you should see drops. You may see them
sooner than that, depending on how fast we can pull messages from the
underlying transport. I haven’t done any concrete experimentation yet.
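For anyone who wants to see the backpressure side of this directly, here is a small sketch (the address, queue size, and send deadline are arbitrary; there is deliberately no puller connected):

```go
package main

import (
	"fmt"
	"time"

	"github.com/go-mangos/mangos"
	"github.com/go-mangos/mangos/protocol/push"
	"github.com/go-mangos/mangos/transport/tcp"
)

func main() {
	sock, err := push.NewSocket()
	if err != nil {
		panic(err)
	}
	defer sock.Close()
	sock.AddTransport(tcp.NewTransport())

	// Shrink the write queue so backpressure shows up quickly (default is 128).
	if err := sock.SetOption(mangos.OptionWriteQLen, 8); err != nil {
		panic(err)
	}
	// Turn the blocking Send into a timed one so the stall is observable.
	if err := sock.SetOption(mangos.OptionSendDeadline, time.Second); err != nil {
		panic(err)
	}
	if err := sock.Listen("tcp://127.0.0.1:40901"); err != nil {
		panic(err)
	}

	// With no puller connected, sends queue up until the write queue is
	// full, and then Send blocks (here: times out) instead of letting the
	// heap grow without bound.
	for i := 0; ; i++ {
		err := sock.Send([]byte("work item"))
		if err == mangos.ErrSendTimeout {
			fmt.Printf("backpressure after queuing ~%d messages\n", i)
			return
		}
		if err != nil {
			panic(err)
		}
	}
}
```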
|
Thanks for the extensive answer. |
So I made a test that measures pub/sub round trips. TCP is the limiter.
With inproc using mangos, I can get about 720k rtt/sec with 8 clients, and
760k rtt/sec with 16 clients. Adding more clients helps. With just a
single client using inproc I get 128k rtt/sec. This is a serial send/recv
loop, really measuring round trips. (A publishes to B, and B publishes
back to A; A and B each use two sockets.) I’m using mangos.NewMessage() to
avoid the garbage collector in Go. This is also on my 2013 iMac, using Go
1.7.1, and no adjustments to GOMAXPROCS. It seems that diminishing
returns hit somewhere around 16 concurrent clients. Note that I’m not
experiencing any losses, and I’m using no special tuning options in mangos.
With TCP, using a single client, I get 18k rtt/sec; with 16 clients it’s
about 43k rtt/sec. This is over loopback on my Mac. That corresponds to
86k *messages* per second. A bit lower than I’d like, but again this is
completely untuned. (Interestingly enough, the code stalls at around 44k
rtt/sec at 32 clients.) Bumping up GOMAXPROCS seems to help further,
getting me to about 57k rtt/sec. (114k msgs/sec.)
There are some enhancements that can be made for sure to improve the rtt
performance. For example, the messages are sent using two Write()
operations, leading to two TCP segments per message. This is really bad
for performance, and needs to be fixed. I probably should take some effort
to make sure that TCP nodelay is set. I think that’s missing at present.
I’m fairly confident that your test results are skewed due to incorrect
measurements.
The fact that the low client count results in extraordinarily high
messages/sec really does tell me quite clearly that you’re not measuring
actual transit times. Indeed, mangos performs *better* as you increase the
client count, since you wind up shoveling more data and having threads
spend less time waiting (parallelization wins). The fact that this is not
true for redis or zmq tells me that either they have horribly broken
concurrency (they don’t), or your test methodology is busted.
I’m not prepared to go into detail about the test quality, but here’s the
code I used for mangos validation with pub/sub. Feel free to adjust as you
like. You can change the addresses, and the client and loop variables.
You do have to add the various client results together, but what I did was
just take the middle results and multiply by the client count. Not
extremely precise, but I think it’s pretty close to accurate, especially for
sufficiently long runs.
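(The code itself isn't reproduced in this thread, so the following is only a reconstruction sketch of that round-trip idea for a single client against the mangos v1 API; the addresses, ports, loop count, and recv deadline are assumptions, and the per-client aggregation described above is omitted.)

```go
package main

import (
	"fmt"
	"time"

	"github.com/go-mangos/mangos"
	"github.com/go-mangos/mangos/protocol/pub"
	"github.com/go-mangos/mangos/protocol/sub"
	"github.com/go-mangos/mangos/transport/tcp"
)

const (
	addrAtoB = "tcp://127.0.0.1:40899" // A's pub -> B's sub
	addrBtoA = "tcp://127.0.0.1:40900" // B's pub -> A's sub
	loops    = 100000
)

func newPub(addr string) mangos.Socket {
	s, err := pub.NewSocket()
	if err != nil {
		panic(err)
	}
	s.AddTransport(tcp.NewTransport())
	if err := s.Listen(addr); err != nil {
		panic(err)
	}
	return s
}

func newSub(addr string) mangos.Socket {
	s, err := sub.NewSocket()
	if err != nil {
		panic(err)
	}
	s.AddTransport(tcp.NewTransport())
	// Subscribe to everything.
	if err := s.SetOption(mangos.OptionSubscribe, []byte("")); err != nil {
		panic(err)
	}
	if err := s.Dial(addr); err != nil {
		panic(err)
	}
	return s
}

// B simply republishes whatever it receives, closing the loop back to A.
func forward(in, out mangos.Socket) {
	for {
		m, err := in.RecvMsg()
		if err != nil {
			return
		}
		if err := out.SendMsg(m); err != nil {
			return
		}
	}
}

func main() {
	pubA := newPub(addrAtoB)
	pubB := newPub(addrBtoA)
	subB := newSub(addrAtoB) // B subscribes to A
	subA := newSub(addrBtoA) // A subscribes to B

	// Give the TCP connections and goroutines time to establish; this sits
	// outside the timed section so it is not billed to the benchmark.
	time.Sleep(100 * time.Millisecond)
	go forward(subB, pubB)

	// Bail out rather than hang forever if a message is dropped
	// (pub/sub is best-effort delivery).
	if err := subA.SetOption(mangos.OptionRecvDeadline, 5*time.Second); err != nil {
		panic(err)
	}

	start := time.Now()
	for i := 0; i < loops; i++ {
		m := mangos.NewMessage(4) // pooled message keeps the GC quiet
		m.Body = append(m.Body, "ping"...)
		if err := pubA.SendMsg(m); err != nil {
			panic(err)
		}
		r, err := subA.RecvMsg()
		if err != nil {
			panic(err) // mangos.ErrRecvTimeout here would indicate a drop
		}
		r.Free()
	}
	elapsed := time.Since(start)
	fmt.Printf("%d round trips in %v (%.0f rtt/sec)\n",
		loops, elapsed, float64(loops)/elapsed.Seconds())
}
```

Scaling this to N clients would mean running the timed loop in N goroutines with per-client counters and summing the results, as described above.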
|
I see that I actually added a 100 ms delay inside the initial timing, so my numbers are 100 ms worse than they should be. But for large message counts that should amortize into the noise. Still, I should fix it; a sketch of the fix is below.
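Something along these lines keeps the settle delay but leaves it out of the measured interval (doRoundTrips is just a placeholder for the real benchmark loop):

```go
package main

import (
	"fmt"
	"time"
)

// doRoundTrips stands in for the actual send/receive benchmark loop.
func doRoundTrips() {}

func main() {
	// Let the TCP connections and goroutines on both sides establish
	// before the clock starts, so the settle time is not billed to the run.
	time.Sleep(100 * time.Millisecond)

	start := time.Now()
	doRoundTrips()
	fmt.Println("elapsed:", time.Since(start))
}
```
|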
The sleep is there to ensure that the connections are established. It can take several dozen milliseconds for the TCP connections and the goroutines on both sides to establish. |
Statistics support will be a mangos v2 deliverable, probably following along with NNG in that regard. |
Did this ever make v2? If not, is there something on the roadmap? |
It did not. The problem here is too many projects and not enough time. If this is important enough to you to sponsor (or to contribute code), let me know. |
It may be nice to have methods to track statistics. There are probably a great number of potentially useful stats. We need to think about them.