CPU metric doesn't show utilization of cores #35208
Description
Problem
We have a group of metrics, cpu-stats, containing the metrics average_load_<min>_minute, where <min> ::= one|five|fifteen.
Each of these shows the load averaged over the given time span and also across all the cores.
It sometimes happens that, although the validator is very busy doing something, only a few cores are loaded and all the rest are idle.
Currently, it is impossible to detect this situation with the given metric.
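To illustrate why the averaged number cannot tell these cases apart, here is a minimal sketch. The per-core utilization values are made up, the 16-core machine is an assumption, and treating the aggregate as a plain mean over cores is a simplification of how the load is actually reported.

```python
from statistics import fmean

# Hypothetical per-core utilization (percent) on an assumed 16-core machine.
skewed = [100.0, 100.0] + [0.0] * 14  # two saturated cores, fourteen idle
even = [12.5] * 16                    # the same total work spread evenly

# Both distributions collapse to the same averaged value.
print(fmean(skewed))  # 12.5
print(fmean(even))    # 12.5
```

Both cases report 12.5, so the first one (two cores pinned at 100%) is invisible in the average.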
Proposed Solution
So it seems that the current metric averages in two dimensions: cores and time.
Here, we are interested in the cores. It would be nice to have simple metrics that measure how the load is distributed among cores. Maybe the number of busy cores (for example, load > 85%) and the number of idle cores (load < 5%); I'm not quite sure about the best practice here. A histogram is complicated to track, so it would be better to have 1-2 numbers instead.
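A rough sketch of what the two numbers could look like, in Python for brevity. It assumes Linux and derives per-core utilization from two snapshots of /proc/stat; all function names here are hypothetical and this is not how cpu-stats is collected today. The 85% and 5% thresholds are just the examples from above, not tuned values.

```python
from typing import Dict, List, Tuple

BUSY_THRESHOLD = 85.0  # percent; example value from this issue (assumption)
IDLE_THRESHOLD = 5.0   # percent; example value from this issue (assumption)


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {core: (idle_ticks, total_ticks)} from /proc/stat content."""
    counters: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal
        # (guest time is already included in user, so it is not added again).
        ticks = [int(value) for value in fields[1:9]]
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
        counters[fields[0]] = (idle, sum(ticks))
    return counters


def per_core_utilization(
    before: Dict[str, Tuple[int, int]], after: Dict[str, Tuple[int, int]]
) -> List[float]:
    """Utilization in percent for each core between two snapshots."""
    loads: List[float] = []
    for core, (idle_after, total_after) in sorted(after.items()):
        if core not in before:
            continue
        idle_before, total_before = before[core]
        total_delta = total_after - total_before
        if total_delta <= 0:
            continue  # no ticks elapsed for this core, nothing to report
        idle_delta = idle_after - idle_before
        loads.append(100.0 * (1.0 - idle_delta / total_delta))
    return loads


def count_busy_idle(
    loads: List[float],
    busy: float = BUSY_THRESHOLD,
    idle: float = IDLE_THRESHOLD,
) -> Tuple[int, int]:
    """Return (number of busy cores, number of idle cores)."""
    busy_cores = sum(1 for load in loads if load > busy)
    idle_cores = sum(1 for load in loads if load < idle)
    return busy_cores, idle_cores
```

Usage would be to read /proc/stat twice, some interval apart, and report the two counts for that interval, e.g. `count_busy_idle(per_core_utilization(parse_proc_stat(first), parse_proc_stat(second)))`. A high busy count next to a high idle count would point at exactly the situation described in the Problem section.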