
Use binary search for histogram buckets #316

Open: wants to merge 1 commit into base: main


@simpl1g commented Nov 2, 2024

I noticed that, because the buckets array is always sorted, we can use binary search to find the matching bucket and improve performance.
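A minimal sketch of the idea in plain Ruby. The helper names and the sample bucket list are illustrative; this is not the gem's actual Histogram#observe code.

On an ascending array, Array#bsearch in find-minimum mode returns the same bound as a linear find. It gets there with O(log n) block calls instead of O(n). Both return nil when the value exceeds every bound, which is the case the "+Inf" bucket covers.

```ruby
# Illustrative bucket upper bounds; bsearch requires ascending order.
SAMPLE_BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10].freeze

# Linear scan: checks bounds one by one until one is >= value.
def bucket_for_linear(buckets, value)
  buckets.find { |upper_limit| upper_limit >= value }
end

# Binary search in find-minimum mode: the block is false for every bound
# below value and true from the first bound >= value onward, so bsearch
# returns the smallest bound that is >= value.
def bucket_for_bsearch(buckets, value)
  buckets.bsearch { |upper_limit| upper_limit >= value }
end

[0, 0.003, 0.25, 1, 3, 10, 11].each do |value|
  linear = bucket_for_linear(SAMPLE_BUCKETS, value)
  binary = bucket_for_bsearch(SAMPLE_BUCKETS, value)
  # nil means the value is above every bound, so it only counts toward "+Inf"
  puts "#{value}: find=#{linear.inspect} bsearch=#{binary.inspect}"
end
```

The equivalence depends entirely on the array being sorted. On an unsorted array, bsearch can silently return the wrong bucket.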

ruby 3.3.5 (2024-09-03 revision ef084cc8f4) +YJIT [arm64-darwin23]
Warming up --------------------------------------
 default_buckets old   117.753k i/100ms
default_buckets bsearch
                       226.654k i/100ms
   large_buckets old    10.103k i/100ms
large_buckets bsearch
                       130.281k i/100ms
Calculating -------------------------------------
 default_buckets old      1.396M (± 2.8%) i/s  (716.41 ns/i) -      7.065M in   5.065606s
default_buckets bsearch
                          2.414M (± 5.6%) i/s  (414.22 ns/i) -     12.239M in   5.088910s
   large_buckets old     99.015k (± 4.8%) i/s   (10.10 μs/i) -    495.047k in   5.013659s
large_buckets bsearch
                          1.486M (± 3.2%) i/s  (672.98 ns/i) -      7.426M in   5.003273s

With the default buckets, bsearch gives roughly a 1.7x improvement (2.414M vs 1.396M i/s). With a buckets array of 286 elements, it gives roughly a 15x improvement (1.486M vs 99.015k i/s).

require 'benchmark/ips'     # benchmark-ips gem
require 'prometheus/client' # prometheus-client gem

BUCKETS = [
  0.00001, 0.000015, 0.00002, 0.000025, 0.00003, 0.000035, 0.00004, 0.000045, 0.00005, 0.000055, 0.00006, 0.000065, 0.00007, 0.000075, 0.00008, 0.000085,
  0.00009, 0.000095, 0.0001, 0.000101, 0.000102, 0.000103, 0.000104, 0.000105, 0.000106, 0.000107, 0.000108, 0.000109, 0.00011, 0.000111, 0.000112, 0.000113,
  0.000114, 0.000115, 0.000116, 0.000117, 0.000118, 0.000119, 0.00012, 0.000121, 0.000122, 0.000123, 0.000124, 0.000125, 0.000126, 0.000127, 0.000128,
  0.000129, 0.00013, 0.000131, 0.000132, 0.000133, 0.000134, 0.000135, 0.000136, 0.000137, 0.000138, 0.000139, 0.00014, 0.000141, 0.000142, 0.000143, 0.000144,
  0.000145, 0.000146, 0.000147, 0.000148, 0.000149, 0.00015, 0.000151, 0.000152, 0.000153, 0.000154, 0.000155, 0.000156, 0.000157, 0.000158, 0.000159, 0.00016,
  0.000161, 0.000162, 0.000163, 0.000164, 0.000165, 0.000166, 0.000167, 0.000168, 0.000169, 0.00017, 0.000171, 0.000172, 0.000173, 0.000174, 0.000175,
  0.000176, 0.000177, 0.000178, 0.000179, 0.00018, 0.000181, 0.000182, 0.000183, 0.000184, 0.000185, 0.000186, 0.000187, 0.000188, 0.000189, 0.00019, 0.000191,
  0.000192, 0.000193, 0.000194, 0.000195, 0.000196, 0.000197, 0.000198, 0.000199, 0.0002, 0.00021, 0.00022, 0.00023, 0.00024, 0.00025, 0.00026,
  0.00027, 0.00028, 0.00029, 0.0003, 0.00031, 0.00032, 0.00033, 0.00034, 0.00035, 0.00036, 0.00037, 0.00038, 0.00039, 0.0004, 0.00041, 0.00042,
  0.00043, 0.00044, 0.00045, 0.00046, 0.00047, 0.00048, 0.00049, 0.0005, 0.00051, 0.00052, 0.00053, 0.00054, 0.00055, 0.00056, 0.00057, 0.00058,
  0.00059, 0.0006, 0.00061, 0.00062, 0.00063, 0.00064, 0.00065, 0.00066, 0.00067, 0.00068, 0.00069, 0.0007, 0.00071, 0.00072, 0.00073, 0.00074,
  0.00075, 0.00076, 0.00077, 0.00078, 0.00079, 0.0008, 0.00081, 0.00082, 0.00083, 0.00084, 0.00085, 0.00086, 0.00087, 0.00088, 0.00089, 0.0009,
  0.00091, 0.00092, 0.00093, 0.00094, 0.00095, 0.00096, 0.00097, 0.00098, 0.00099, 0.001, 0.0015, 0.002, 0.0025, 0.003, 0.0035, 0.004, 0.0045, 0.005,
  0.0055, 0.006, 0.0065, 0.007, 0.0075, 0.008, 0.0085, 0.009, 0.0095, 0.01, 0.015, 0.02, 0.025, 0.03, 0.035, 0.04, 0.045, 0.05, 0.055, 0.06, 0.065, 0.07,
  0.075, 0.08, 0.085, 0.09, 0.095, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1.0, 1.5, 2.0, 2.5,
  3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0, 15.5, 16.0, 16.5, 17.0, 17.5
].freeze

default_buckets = Prometheus::Client::Histogram.new(:default_buckets, docstring: 'Default buckets')
large_buckets = Prometheus::Client::Histogram.new(
  :large_buckets,
  docstring: 'Large buckets',
  buckets: BUCKETS
)
Benchmark.ips do |x|
  x.config(time: 5, warmup: 1)

  x.report('default_buckets old') { default_buckets.observe(1) }
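  # observe_bsearch: the bsearch-based variant under test (assumed to be a temporary method in the patched branch)
  x.report('default_buckets bsearch') { default_buckets.observe_bsearch(1) }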

  x.report('large_buckets old') { large_buckets.observe(1) }
  x.report('large_buckets bsearch') { large_buckets.observe_bsearch(1) }

  x.compare!
end

@simpl1g force-pushed the improve-histogram-performance branch from 3f052fd to 981f287 on November 2, 2024 22:55
@dmagliola (Collaborator)

Could we have the change by itself, without the reformatting of the entire file?

Signed-off-by: Konstantin Ilchenko <[email protected]>
@simpl1g force-pushed the improve-histogram-performance branch from 981f287 to 100f46c on November 3, 2024 11:48
@simpl1g (Author) commented Nov 3, 2024

> Could we have the change by itself, without the reformatting of the entire file?

@dmagliola Sorry, fixed. I'll also try to fix the excessive allocations in a separate PR.

@dmagliola (Collaborator)

RE this PR: I'm planning to do a little performance experimentation locally but I'm assuming this will get merged.

RE excessive allocations... Is that related to this change? Or something else?

@simpl1g (Author) commented Nov 4, 2024

> RE excessive allocations... Is that related to this change? Or something else?

@dmagliola No, it's not connected. This change is simple and on its own gives observe a big boost.

My concerns are:

  • a lot of places call bucket.to_s, but we could precompute those strings on init
  • plain string literals like "+Inf" are used, but frozen_string_literal is not enabled, so a new String object is allocated on each observe call
  • things like buckets + ["+Inf", "sum"] allocate new arrays (this is not in the hot path of observe, so it's not critical)
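The second point can be checked directly. This is a minimal sketch; the method names are hypothetical.

With the frozen_string_literal: true magic comment at the top of a file, a literal such as "+Inf" evaluates to the same frozen String object on every call, so nothing is allocated per call. Without the comment, each evaluation creates a new String; String.new stands in for that here.

```ruby
# frozen_string_literal: true

# With the magic comment above, this literal is a single frozen,
# deduplicated String that is reused on every call.
def inf_label
  "+Inf"
end

# String.new always allocates, standing in for the per-call allocation
# that a plain literal incurs when the magic comment is absent.
def inf_label_allocating
  String.new("+Inf")
end

puts inf_label.equal?(inf_label)                       # => true (same object each call)
puts inf_label_allocating.equal?(inf_label_allocating) # => false (new object each call)
puts inf_label.frozen?                                 # => true
```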

@dmagliola (Collaborator)

OK, so, for this change: this seems like a sensible thing to do. My only concern was whether there could be a regression under some particular circumstance.

Binary search obviously works great with large numbers of buckets, and it helps more the higher the observed value is (because a linear scan has to walk past more buckets before reaching the right one). But could there be a situation, particularly with small numbers of buckets or low observed values, where it was slower?

I almost couldn't make that happen. I made a benchmark script similar to yours, but using different numbers of buckets and observing different values. The only situation in which I could make find faster than bsearch was with large numbers of buckets and observing literally zero. When observing any tiny non-zero number, bsearch was still faster or the same.
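Counting comparisons shows where this pattern comes from, though counts are only a proxy for time; the measurements above also reflect per-call overhead. A rough sketch, with two assumptions not taken from the PR:

- the gem's default buckets are the standard Prometheus defaults listed below;
- an evenly spaced 286-bound array stands in for the large benchmark.

find makes one block call per bound up to the matching one, so observing 0 stops at the first bound. bsearch always makes about log2(n) calls.

```ruby
# Assumed default bucket bounds (the standard Prometheus defaults).
DEFAULTS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10].freeze
# Hypothetical stand-in for a large sorted bucket list: 0.1, 0.2, ..., 28.6.
LARGE = (1..286).map { |i| i / 10.0 }.freeze

# Counts how many times each strategy evaluates the comparison block.
def comparisons(buckets, value)
  linear = 0
  buckets.find { |b| linear += 1; b >= value }
  binary = 0
  buckets.bsearch { |b| binary += 1; b >= value }
  { find: linear, bsearch: binary }
end

p comparisons(DEFAULTS, 1)  # find walks 8 bounds, bsearch needs 4
p comparisons(DEFAULTS, 0)  # find stops at the first bound
p comparisons(LARGE, 25.0)  # find walks 250 bounds, bsearch at most 9
p comparisons(LARGE, 0)     # the one case where find does clearly less work
```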

Given this, I think this is safe to merge. I'll give @Sinjo a chance to object before I do, but I'm basically happy that this change is unequivocally good.

@dmagliola (Collaborator)

> • a lot of places have bucket.to_s, but we can preallocate it on init
> • plain strings used like "+Inf", but frozen_string_literals is not set, so it allocates new object on each observe call
> • things like buckets + ["+Inf", "sum"] allocate new arrays (it is not in hot pass of observe, so not critical)

These sound good. I'm not 100% sure about the first one, but I'm looking forward to your PR :)

Can you open a new one with these changes once you have them?

Some comments on these specifically:

> • a lot of places have bucket.to_s, but we can preallocate it on init

We're mostly turning the float that defines the upper bound of the bucket into a string. We can't just store the strings instead, though, because we need the floats to find the right bucket. But maybe I'm misunderstanding what you mean, or missing some obvious way to do this.
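One way to keep the floats for the search while still preallocating the strings is a parallel array of labels, looked up by index with Array#bsearch_index. This is a hypothetical sketch: the BucketLookup class and its method names are not part of the gem, and it assumes the buckets are already sorted ascending.

```ruby
# Hypothetical helper: floats drive the search, frozen strings are the labels.
class BucketLookup
  INF_LABEL = "+Inf".freeze

  # Assumes buckets are already sorted ascending (bsearch requires it).
  def initialize(buckets)
    @buckets = buckets.dup.freeze
    # Built once: index i holds the label for @buckets[i].
    @labels = @buckets.map { |b| b.to_s.freeze }.freeze
  end

  # Returns the label of the smallest bound >= value, or "+Inf" when the
  # value exceeds every bound. No new String is built per call.
  def label_for(value)
    index = @buckets.bsearch_index { |upper_limit| upper_limit >= value }
    index ? @labels[index] : INF_LABEL
  end
end

lookup = BucketLookup.new([0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10])
puts lookup.label_for(0.3) # prints 0.5
puts lookup.label_for(42)  # prints +Inf
```

Compared with a Hash keyed by the float bound, the parallel array avoids a second lookup. It relies only on the index that bsearch_index already computes.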

> • things like buckets + ["+Inf", "sum"] allocate new arrays (it is not in hot pass of observe, so not critical)

We have 2 hot paths, actually... observe is the one we have (over)optimized, but we (I, actually) neglected export, which causes problems for some of our potential users. If your change will make exporting faster (or reduce allocations), that would be extremely welcome too, not just improving observe. The key method here is Histogram#values.
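On the export side, the buckets + ["+Inf", "sum"] pattern builds a fresh array on every call. The alternative is to compute the frozen key list once at construction and reuse it.

This is a minimal sketch of that alternative, not the gem's real Histogram#values or data stores. It assumes a hypothetical simplified exporter (ExportKeysSketch) whose stored counts arrive as a plain Hash keyed by label.

```ruby
# Hypothetical simplified exporter, not the gem's Histogram class.
class ExportKeysSketch
  attr_reader :export_keys

  def initialize(buckets)
    # Allocated once: every bucket label plus the two extra series.
    @export_keys = (buckets.map { |b| b.to_s.freeze } + ["+Inf", "sum"]).freeze
  end

  # Builds the exported hash from stored counts without rebuilding the key list.
  def values(counts)
    @export_keys.each_with_object({}) { |key, out| out[key] = counts.fetch(key, 0.0) }
  end
end

sketch = ExportKeysSketch.new([0.1, 1, 10])
p sketch.values({ "0.1" => 2.0, "+Inf" => 3.0, "sum" => 1.7 })
```

The result Hash is still allocated per export, but the per-call key array and the bucket-label strings are not.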

@simpl1g (Author) commented Nov 4, 2024

> But maybe i'm misunderstanding what you mean

I was thinking about hashes, something like this:

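def initialize(buckets)
  @buckets = buckets
  # Precompute each bucket's label once, keyed by its float upper bound
  @h = buckets.map { |b| [b, b.to_s] }.to_h
  # ...
end

def observe(value)
  bucket = @buckets.bsearch { |upper_limit| upper_limit >= value }
  str = @h[bucket] || '+Inf' # bsearch returns nil when value exceeds every bound
  # ...
end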

I don't see big performance improvements, but it reduces allocations.
I can also look at Histogram#values to see what is happening there; it was not my priority.
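The allocation claim can be checked with GC.stat(:total_allocated_objects). This is a rough sketch; the allocations helper and the iteration count are illustrative, and exact numbers vary by Ruby version.

Converting a float bound with to_s allocates a new String on every call. Looking up a label precomputed in a Hash, like @h above, does not.

```ruby
# Counts objects allocated while running the block.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
labels = buckets.to_h { |b| [b, b.to_s] } # built once, like @h above
bucket = 0.25

uncached = allocations { 10_000.times { bucket.to_s } }
cached   = allocations { 10_000.times { labels[bucket] } }

puts "to_s on every call: #{uncached} objects"
puts "precomputed lookup: #{cached} objects"
```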

Why I decided to look into the code: I tried to build a simple Rack app to illustrate that Ruby can be fast enough compared to Rails and Node.js. There was a video on YouTube comparing Rails and Node (https://www.youtube.com/watch?v=Qp9SOOtgmS4), and, unsurprisingly, Rails was very slow. So I opened two PRs with improvements: antonputra/tutorials#330 and antonputra/tutorials#335.

The maximum I was able to achieve was 105k RPS. Just adding the single line use Prometheus::Middleware::Collector reduced that to 75k RPS, a penalty of roughly 30%, which is quite big. After applying the bsearch patch I get around 79k RPS. I believe that is still too much overhead, and I'm trying to understand where it comes from.
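Converting those throughput figures into time per request makes the overhead more concrete. This is back-of-the-envelope arithmetic only. It assumes the aggregate RPS is inversely proportional to per-request cost on the same hardware; with several worker processes, the CPU time each worker spends per request is proportionally larger.

```latex
\begin{aligned}
t_{\text{base}} &= \tfrac{1}{105\,000}\ \text{s} \approx 9.52\ \mu\text{s per request} \\
t_{\text{collector}} &= \tfrac{1}{75\,000}\ \text{s} \approx 13.33\ \mu\text{s} \;\Rightarrow\; \text{overhead} \approx 3.81\ \mu\text{s},\ \text{throughput down } 28.6\% \\
t_{\text{bsearch}} &= \tfrac{1}{79\,000}\ \text{s} \approx 12.66\ \mu\text{s} \;\Rightarrow\; \text{overhead} \approx 3.13\ \mu\text{s},\ \text{throughput down } 24.8\%
\end{aligned}
```

By this estimate, the bsearch patch removes roughly 0.7 µs per request, a little under a fifth of the collector middleware's overhead.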

@dmagliola (Collaborator)

Yeah, that makes sense. Let's get this one merged as is, and let's discuss those other changes in a new PR when you have time to open it.

Thank you for the contributions!

@Sinjo (Member) commented Jan 5, 2025

@dmagliola Any reason not to merge this one? Happy to do that and cut a release if you're still +1 on it.

@dmagliola (Collaborator)

Yup. Just wanted to make sure you were happy with it

@Sinjo (Member) commented Jan 5, 2025

I'll take one more look over to refresh my memory and then hit the button.
