Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions that allow to efficiently merge and downsample expo histograms #12328

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

felixbarny
Copy link

@felixbarny felixbarny commented Feb 10, 2025

While testing the performance for merging exponential histograms in lsmintervalprocessor (which we're planning to contribute) that borrows the histogram merging logic from the deltatocumulative processor, we've found the process of merging histogram buckets to perform very poorly.

The reason is that there's no low-level access to the underlying UInt64Slice, so read and especially write operations need to go through an abstraction that check whether the slice is in a mutable state.

screenshot_2025-02-07_at_1 36 38___pm

Instead of allowing direct access to the slice, which would not be safe, this PR proposes to add two methods to numeric slices that allow safe and efficient operations needed for merging and downsampling histograms.

  • Summing of two slices to help with merging histogram buckets
  • Collapsing a slice by combining n elements to help with downsampling i.e. changing the scale of a histogram

@felixbarny felixbarny requested review from bogdandrutu, dmitryax and a team as code owners February 10, 2025 08:28
Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very custom solution for maybe not so common problem:

  • I don’t see a benchmark, the pprof doesn’t tell me how much improvement is this.
  • Have you tried the to implement the new Seq iterator available in 1.23 golang? We will be able to implement that in couple of weeks.
  • What other alternatives have you considered?

@felixbarny
Copy link
Author

Thanks for the quick feedback! I definitely acknowledge that these functions are rather specific to the use case for merging histograms. Having said that, merging histograms sounds like a common enough use case that pdata should allow for doing efficiently. I didn't know about the seq iterator in go 1.23. I'll have a look and see if that's a viable alternative. From a first glance, it didn't seem obvious how to implement something like Collapse with it but I'll have a closer look and provide benchmarks for the different alternatives.

Other solutions considered are converting exponential histograms into a custom data structure or go-expohisto that allow for a more efficient merging of histograms. But that requires another conversion step and allocations. This conversion could also benefit from a more efficient read access to the slice. Adding a range function should be generically applicable and less controversial.

Another approach could be offering canonical implementation for merging histograms. It may use a similar method to the one proposed, but it wouldn't need changes in the primitive slices themselves. That would make sense if and when we see multiple processors re-implementing the same thing. With deltatocumulative and lsminterval there are at least two processors that are doing the same thing.

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you restrict the generation of these methods only to the exponential histogram buckets structs?

It strikes me that this the n in Collapse(n) is more general than ought to be needed for exponential histogram users. The n is always a power of two (or else it's being abused).

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am strongly encourage that, before we have an agreement on this being a path to not waste too much resources in implementing this in the "right" way. We can have these funcs manually written initially (not generated) do the benchamarks etc. then we do a decision then we implement correctly.

Otherwise this will take a while and you will feel that we don't make "progress".

@felixbarny
Copy link
Author

Thanks for the feedback and guidance. I’ll follow up next week when I’m back from an offsite.

@felixbarny
Copy link
Author

felixbarny commented Feb 12, 2025

I found some time in between the sessions to implement two alternative implementations (455d680) and create a benchmark

                                                                    │   sec/op    │
UInt64SliceTryIncrementFrom/TryIncrementFromDirectAccess-10           33.46n ± 1%
UInt64SliceTryIncrementFrom/tryIncrementFromWithCurrentFunctions-10   66.19n ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromTransform-10              182.2n ± 1%
  • TryIncrementFromDirectAccess - the solution originally proposed in this PR. This relies on adding new specific functions to UInt64Slice
  • tryIncrementFromWithCurrentFunctions - this is using the existing functions like UInt64Slice.At and UInt64Slice.SetAt. This doubles the execution time. I part of the reason is that SetAt needs to check whether the slice is mutable for every element. There's also more overhead for calling the different functions.
  • tryIncrementFromTransform - adds a generic UInt64Slice.Transform function that takes a callback function as an argument which gets called for all elements of the slice and allows to modify the value. This avoids having to call AssertMutable for every element. However, it ends up performing the worst, I suppose because of the overhead coming from calling the callback function for each element.

My takeaway is that if we want an implementation that's as efficient as possible, we need direct access to the slice when modifying it, without going though abstraction layers. I can't think of another way to do it rather than adding functions to UInt64Slice that are rather specific for merging histograms.

@github-actions github-actions bot requested a review from bogdandrutu February 12, 2025 14:26
@felixbarny
Copy link
Author

It strikes me that this the n in Collapse(n) is more general than ought to be needed for exponential histogram users. The n is always a power of two (or else it's being abused).

Yeah, true. But enforcing that n is a power of 2 doesn’t seem to make the function easier but even more specific to the downsampling use case.

@bogdandrutu
Copy link
Member

Somebody else implemented the Seq thing. See #12380 maybe you can add that as well (requires 1.23)

@bogdandrutu
Copy link
Member

Also, try the "brute" force to create a new slice and use FromRaw to install it, I know is an extra slice, but may use some sync.Pool on your side since we force a copy anyway.

@felixbarny
Copy link
Author

I'm not sure how you're envisioning the Seq iterator based implementation. I gave it a try by implementing a IncrementFromSeq function but unsure if that's what you meant. The two new functions perform the worst so far.

                                                                    │   sec/op    │
UInt64SliceTryIncrementFrom/TryIncrementFromDirectAccess-10           33.46n ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromWithCurrentFunctions-10   69.79n ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromTransform-10              182.7n ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromNewSlice-10               290.9n ± 3%
UInt64SliceTryIncrementFrom/incrementFromSeq-10                       326.5n ± 0%
geomean                                                               132.3n

                                                                    │      B/op      │
UInt64SliceTryIncrementFrom/TryIncrementFromDirectAccess-10             0.000 ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromWithCurrentFunctions-10     0.000 ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromTransform-10                0.000 ± 0%
UInt64SliceTryIncrementFrom/tryIncrementFromNewSlice-10               1.250Ki ± 0%
UInt64SliceTryIncrementFrom/incrementFromSeq-10                         80.00 ± 0%

@jmacd
Copy link
Contributor

jmacd commented Feb 21, 2025

@felixbarny @bogdandrutu

I wonder if we should step back and ask whether there is some way to get access to the underlying slices, for example if the Pdata packages had mutator methods that check for the capability and return a pointer to the slice. This would introduce a certain potential for unsafe behavior for the potential gain that you're after.

@sfc-gh-bdrutu
Copy link

This would introduce a certain potential for unsafe behavior for the potential gain that you're after.

I would also try to understand if this is significant enough to justify the unsafe API.

@felixbarny
Copy link
Author

Updates from some of the discussions we had internally:

  • We could add a method to go-expohisto that allows to efficiently convert a pmetric histogram into a structure.Histogram which allows for efficiently merging histograms. Downsides are that this still requires a conversion step and that go-expohisto would need to have a dependency on pdata. See also Add an method to convert from OpenTelemetry's pdata to expohisto lightstep/go-expohisto#4.
  • Another thing that just came to mind would be to essentially bring the functionality of go-expohisto directly into pdata. This would allow us to avoid a lot of conversion roundtrips. In our use case, we're creating histograms using go-expohisto in the signaltometricsconnector, which need to be converted to pdata in order to send it down the collector pipeline. Just to convert it back to a go-expohisto in lsmintervalprocessor. If pdata wasn't just a pure data transfer object but actually allows to track values and perform merge operations, a lot of overhead could be avoided. But I'm not sure how much appetite there is to bring logic into pdata, though. I'm coming from an object-oriented background, so it seems to make a lot of sense to me.
  • We've also discussed building a histogram merging logic on top of otel-arrow. This would play very nicely with the lsmintervalprocessor as we could avoid a lot of encode/decode overhead due to arrow's zero-copy architecture. @lahsivjar wanted to look into whether that's a viable approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants