There is the standard stuff recommended in the docs - https://docs.riak.com/riak/kv/2.2.3/using/reference/statistics-monitoring.1.html. If you have high CPU, tracking where the CPU is being spent within Riak can be hard. You may be able to glean something by looking at reductions in etop. Looking at msg_q may also indicate where the bottlenecks are. If you have more CPU cores than vnodes on each node, I would generally not expect to see CPU maxing out. The things that can cause higher CPU are:
This isn't an exhaustive list. Most of the load testing done post-Basho, as part of releases, tends to prove disk-bound workloads, so high CPU only really occurs due to increasing I/O wait times. We saw increased sys CPU times after the Meltdown/Spectre fixes. We tend to see reduced CPU utilisation with Riak 3.0. There are some OS tuning guides in the docs, but most of the recommendations don't make a huge difference on their own, other than disabling transparent huge pages. There's a lot of "it depends" as well: hardware choices, access profiles, number and size of objects, backend choice etc. Normally, the default answer is to expand the cluster first, as that should usually provide CPU relief while you try to work out what is happening.
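For gathering trending data from Riak itself, here is a minimal sketch that polls the HTTP `/stats` endpoint and prints one CSV row per sample. It assumes the HTTP listener is on the default `127.0.0.1:8098`; the stat names in `WATCHED` are the ones I'd expect to see in a 2.x `/stats` payload, but treat them as assumptions and check them against your own output. The helper names (`pick_stats`, `fetch_stats`, `to_csv_row`) are just made up for this example.

```python
import json
import time
import urllib.request

# Assumed stat names; verify against the keys your own /stats output returns.
WATCHED = (
    "node_gets",
    "node_puts",
    "node_get_fsm_time_95",
    "node_put_fsm_time_95",
    "sys_process_count",
    "memory_total",
)


def pick_stats(stats, keys=WATCHED):
    """Return only the watched keys that are present in a /stats payload."""
    return {k: stats[k] for k in keys if k in stats}


def fetch_stats(base_url="http://127.0.0.1:8098", timeout=5):
    """Fetch the JSON stats document from one node (URL is an assumption)."""
    with urllib.request.urlopen(f"{base_url}/stats", timeout=timeout) as resp:
        return json.load(resp)


def to_csv_row(sample, keys=WATCHED, now=None):
    """Format one sample as 'timestamp,v1,v2,...'; missing keys become empty."""
    ts = int(time.time() if now is None else now)
    return ",".join([str(ts)] + [str(sample.get(k, "")) for k in keys])


if __name__ == "__main__":
    # Poll once a minute; redirect stdout to a file to build up a trend.
    while True:
        print(to_csv_row(pick_stats(fetch_stats())), flush=True)
        time.sleep(60)
```

Run it on each node (or point `base_url` at each node in turn) and feed the output into whatever graphing tool you already use. Keys that a given version doesn't expose are simply left blank rather than raising an error.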
I'm faced with Riak 2.0 nodes that have started to show high CPU usage. I realize we don't have any application monitoring or trending other than the usual CPU and memory of the VM itself.
I'm not quite sure where to start, but I guess my first step would be to figure out how to gather some kind of trending data for Riak itself.
What is a good tool/script these days to monitor cluster health?
Otherwise, do I investigate if I have some kind of problem or is high CPU on a busy cluster normal?