Feature stats #252

Open

seanmcevoy wants to merge 6 commits into master
Conversation

seanmcevoy

This pull request adds some statistics gathering to the riakc client. In our applications we've found that the server-side stats didn't give us a full picture, as network latency is a significant part of the whole request time.

stats_demo_test gives a quick sanity check of the functionality, but here's an example use case, with a snippet of output data from our live system (sensitive info blanked out), to show its real value:

Assuming you have a variable PidList with the pids of all the clients in the system, set the stats level to 2 with:
lists:foreach(fun(Pid) -> riakc_pb_socket:stats_change_level(Pid, 2) end, PidList).

Then after some time grab all the collected stats with:
FullStat = lists:foldl(fun(Pid, StatAcc) -> riakc_stats:merge(StatAcc, riakc_pb_socket:stats_peek(Pid)) end, riakc_pb_socket:stats_peek(hd(PidList)), tl(PidList)).

Then format the raw stats data with:
riakc_stats:print(FullStat).

Riak Connection Stats
connections established: 200
stat monitoring period: 260 s

2 put requests on bucket: <<"bucket1">> timed out while being serviced, timeout: 500.

on 6 occasions the queue was 1 request long when new requests were added.

1 get requests on bucket: <<"bucket2">> timed out while being serviced, timeout: 500.

24454 put operations on bucket: <<"bucket1">>
with timeout: 500
average service time: 15 ms
7009 between 7 ms and 10 ms
16574 between 10 ms and 20 ms
535 between 20 ms and 40 ms
192 between 40 ms and 70 ms
39 between 70 ms and 100 ms
30 between 100 ms and 200 ms
74 between 200 ms and 400 ms
1 between 400 ms and 700 ms

47329 get operations on bucket: <<"bucket1">>
with timeout: 500
average service time: 13 ms
20718 between 7 ms and 10 ms
25595 between 10 ms and 20 ms
658 between 20 ms and 40 ms
179 between 40 ms and 70 ms
44 between 70 ms and 100 ms
48 between 100 ms and 200 ms
86 between 200 ms and 400 ms
1 between 400 ms and 700 ms

data sent: 19 MB
packets: 119449
average packet size: 165 B
94679 between 20 B and 40 B
274 between 40 B and 70 B
22 between 100 B and 200 B
39 between 200 B and 400 B
17495 between 400 B and 700 B
2009 between 700 B and 1 KB
4794 between 1 KB and 2 KB
83 between 2 KB and 4 KB
54 between 4 KB and 7 KB

data received: 69 MB
packets: 119442
average packet size: 580 B
24792 between 0 B and 1 B
196 between 100 B and 200 B
2620 between 200 B and 400 B
67329 between 400 B and 700 B
3852 between 700 B and 1 KB
20325 between 1 KB and 2 KB
215 between 2 KB and 4 KB
113 between 4 KB and 7 KB

reconnections: 3
average reconnect time: 914 us
2 between 700 us and 1 ms
1 between 1 ms and 2 ms

6 requests got queued with timeout: 1000
average queueing time: 65 ms
1 between 10 ms and 20 ms
2 between 20 ms and 40 ms
1 between 70 ms and 100 ms
2 between 100 ms and 200 ms
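
To collect and display the stats in one go, the steps above can be wrapped in a small helper module. This is only a sketch against the stats API added in this PR (riakc_pb_socket:stats_change_level/2, riakc_pb_socket:stats_peek/1, riakc_stats:merge/2 and riakc_stats:print/1); the module name riakc_stats_helper and its function names are hypothetical, not part of the PR itself.

%% Sketch: enable level-2 stats on a list of riakc client pids,
%% then merge and print everything they have collected.
-module(riakc_stats_helper).
-export([enable/1, collect_and_print/1]).

%% Set the stats level to 2 on every client pid.
enable(PidList) ->
    lists:foreach(
      fun(Pid) -> riakc_pb_socket:stats_change_level(Pid, 2) end,
      PidList).

%% Peek at each client's stats, merge them and print the result.
collect_and_print([First | Rest]) ->
    Merged = lists:foldl(
               fun(Pid, StatAcc) ->
                       riakc_stats:merge(StatAcc, riakc_pb_socket:stats_peek(Pid))
               end,
               riakc_pb_socket:stats_peek(First),
               Rest),
    riakc_stats:print(Merged),
    Merged.

Usage would be riakc_stats_helper:enable(PidList), let some traffic flow, and later riakc_stats_helper:collect_and_print(PidList).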

@seanmcevoy mentioned this pull request on Nov 20, 2015 (closed)
@lukebakken
Contributor

@seanmcevoy I just responded to your email. Rest assured, when you comment here, I am notified 😄

If you could address the CI failures, that would be great. I will try to get time scheduled to give your PRs some attention. Thanks again!

@seanmcevoy
Author

Cheers Luke, though we won't be upgrading for a while so there's no short-term pressure from my side.

@seanmcevoy
Author

CI tests are passing now after tweaking the tolerances. They're still probabilistic, so unfortunately there will be a low failure rate if they're run repeatedly. Possibly they're a little too high-level to be eunit tests, but I don't know of any better solution.

@lukebakken
Contributor

@seanmcevoy - just so you know, we haven't had time to review this yet. I have added it to a future milestone and it won't be lost.
