Histplot errorbars #429
Comments
For sure the mplhep/src/mplhep/error_estimation.py Lines 11 to 31 in 9b47e92
sqrt(N) for reasonably large N. Nevertheless, one may prefer to use exactly sqrt(sumw2). The documentation and the code should be synchronized, and I'm sure the maintainers would be happy to receive a PR to correct this.

In the meantime, you can pass a custom error function:

    def mymethod(sumw, sumw2):
        # symmetric interval: value -/+ sqrt(sum of squared weights)
        return np.array([sumw - np.sqrt(sumw2), sumw + np.sqrt(sumw2)])
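To make the shape of such a custom error function concrete, here is a minimal NumPy-only sketch of the same symmetric interval applied to two made-up bins. The name `sqrt_w2_interval` and the bin values are illustrative assumptions; only the formula comes from the comment above.

```python
import numpy as np

def sqrt_w2_interval(sumw, sumw2):
    """Symmetric interval stacked as (lo, hi) = (sumw - sqrt(sumw2), sumw + sqrt(sumw2))."""
    sumw = np.asarray(sumw, dtype=float)
    sumw2 = np.asarray(sumw2, dtype=float)
    err = np.sqrt(sumw2)
    return np.array([sumw - err, sumw + err])

# Two hypothetical bins: an unweighted one (sumw == sumw2) and a weighted one
sumw = np.array([4.0, 0.1])
sumw2 = np.array([4.0, 0.0001])
lo, hi = sqrt_w2_interval(sumw, sumw2)
print(lo)  # lower edges of the interval, one per bin
print(hi)  # upper edges of the interval, one per bin
```

The function returns absolute interval edges rather than error sizes, matching the return value of `mymethod` above.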
I came across this issue while trying to do another task with the histplot error bars. Basically, my goal is to not draw the error bars for bins without any content. Usually this isn't a big problem: the bins with content dwarf the empty ones, so the error bars on the empty bins do not rise above the x-axis into the figure. However, when I plot on a log scale and the bin content spans many orders of magnitude, the error bars on the empty bins do show up in the plot. This problem can be reproduced in a few lines:

    import hist
    import matplotlib.pyplot as plt

    h = hist.Hist.new.Reg(10,0,1).Weight()
    h.fill(
        [0., 0.5, 0.6, 0.7, 0.8, 0.9],
        weight = [3., 1e3, 1e4, 1e5, 1e6, 1e7]
    )

Now plot it on a log scale:

    h.plot()
    plt.yscale('log')

We can mask out the error bars by zeroing the interval in bins with a NaN lower bound:

    import mplhep
    import numpy as np

    def poisson_interval_ignore_empty(sumw, sumw2):
        interval = mplhep.error_estimation.poisson_interval(sumw, sumw2)
        lo, hi = interval[0,...], interval[1,...]
        # empty bins come back with a NaN lower bound
        to_ignore = np.isnan(lo)
        lo[to_ignore] = 0.0
        hi[to_ignore] = 0.0
        return np.array([lo,hi])

    mplhep.histplot(h.values(), bins=h.axes[0].edges, w2method = poisson_interval_ignore_empty, w2 = h.variances())
    plt.yscale('log')

The process of developing this solution actually showed me that the current function does not actually set nan to zero. You should use
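A short NumPy sketch of why an equality test cannot clear NaN entries, which is one way a "set nan to zero" step can silently do nothing. The array values are made up, and whether the library code uses exactly this pattern is an assumption based on the comment above.

```python
import numpy as np

interval = np.array([[np.nan, 1.0], [1.8, 3.3]])  # made-up (lo, hi) values

# Equality against NaN never matches, because NaN != NaN under IEEE 754.
broken = interval.copy()
broken[broken == np.nan] = 0.0

# np.isnan builds the mask that was intended.
fixed = interval.copy()
fixed[np.isnan(fixed)] = 0.0

print(np.isnan(broken).any())  # True: the NaN survived
print(np.isnan(fixed).any())   # False
```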
It might make sense as an additional method, but it may not be trivial: it is debatable whether a yield of 0 events should have 0 uncertainty in Poisson statistics, cf. https://stats.stackexchange.com/questions/427019/confidence-interval-for-mean-of-poisson-with-only-zero-counts
We also stumbled upon this issue in our plots after debugging some weird-looking y errors. Is there any deeper reason why the fix would just involve changing the Lines 477 to 478 in 7fd363d
? (and changing I guess for applications like plotting data,
Thanks for reminding me of this, @riga. I think it got forgotten while hoping for a PR from the OP. The fix should be fine; the existence of the method is basically historical, as a fallback for "simple" histograms like just passing I am curious, can you share your use case? Because I would assume that the overwhelming majority of "I just want it to work" use cases want
It's actually a more general use case and maybe even rather related to We are using I would have assumed that, if a weight storage is present, variances and the w2method are set for the Right now,

Some background on where we saw an issue: we tried plotting a simple histogram with bin values around 0.1 and variances around 0.0001 and then found where two of the last three bins show exceedingly high y-errors. For some reason (maybe because the bin values are almost, but not exactly, zero there), those bins (the others maybe as well?) fall back to the asymmetric Poisson treatment with an uncertainty of e.g. ~1.8 on a value of 0, which does not match the variance values in these bins.
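For reference, the number ~1.8 is close to the upper edge of the unscaled central Poisson (Garwood) interval for zero observed counts at one-sigma coverage. The sketch below computes that edge with the standard library only. That this is the origin of the value seen in the plot is an assumption, not something the thread confirms.

```python
import math

# Central coverage of a one-sigma Gaussian interval (~0.6827)
coverage = math.erf(1.0 / math.sqrt(2.0))
alpha = 1.0 - coverage

# For n = 0 the Garwood upper limit solves P(X = 0 | mu) = exp(-mu) = alpha / 2
upper_zero = -math.log(alpha / 2.0)

print(round(upper_zero, 3))  # 1.841
```

By comparison, the symmetric error implied by a variance of 0.0001 is sqrt(0.0001) = 0.01.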
Gotcha, thanks for the clarification. This helps. So the easy issues I see are:
The less obvious ones to me are:
Btw I haven't checked, but I assume
Unfortunately, one of the reasons this keeps coming up is because Maybe I'm using it wrong?

Python Tests Showing w2method Not Being Called

    import hist
    import mplhep
    import mplhep.error_estimation
    import matplotlib.pyplot as plt
    import unittest


    class w2method_called:
        """Wraps poisson_interval and records whether it was ever called."""

        def __init__(self):
            self.called = False

        def __call__(self, w, w2):
            self.called = True
            return mplhep.error_estimation.poisson_interval(w, w2)


    def get_weighted_hist():
        return (
            hist.Hist.new
            .Reg(10,0,1)
            .Weight()
        ).fill(
            [0., 0.5, 0.6, 0.7, 0.8, 0.9],
            weight = [3., 1e3, 1e4, 1e5, 1e6, 1e7]
        )


    class TestW2Method(unittest.TestCase):
        def check_plot_method(self, f):
            w2method = w2method_called()
            f(get_weighted_hist(), w2method)
            self.assertTrue(w2method.called)
            plt.clf()

        def test_hist_Hist_plot(self):
            self.check_plot_method(
                lambda h, w2method: h.plot(w2method = w2method)
            )

        def test_histplot_Hist(self):
            self.check_plot_method(
                lambda h, w2method: mplhep.histplot(h, w2method = w2method)
            )

        def test_histplot_vals_bins_variances(self):
            self.check_plot_method(
                lambda h, w2method: mplhep.histplot(
                    h.values(), bins = h.axes[0].edges, w2 = h.variances(),
                    w2method = w2method
                )
            )


    if __name__ == '__main__':
        unittest.main()

Output:

Environment:
@riga is what you see perhaps coming from the "scaled poisson" Garwood interval?
This is coming from the assumption that if a bin is zero then its corresponding sumw2 will also be zero. Perhaps that is too strong an assumption?
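As a rough illustration of the quantities behind a scaled Poisson interval: the usual construction takes a per-bin scale `sumw2 / sumw` (the average weight), an effective count `sumw**2 / sumw2`, and multiplies the Garwood interval for the effective count by the scale. The sketch below computes only the scale and effective count for made-up bins. The variable names are illustrative, and how any particular library treats the empty bin is not shown here.

```python
import numpy as np

sumw = np.array([0.1, 0.0, 0.2])      # made-up bin contents
sumw2 = np.array([1e-4, 0.0, 4e-4])   # made-up variances

filled = sumw != 0
scale = np.full_like(sumw, np.nan)    # average weight per entry
n_eff = np.full_like(sumw, np.nan)    # effective number of entries

# Both quantities are only defined where the bin is not empty.
scale[filled] = sumw2[filled] / sumw[filled]
n_eff[filled] = sumw[filled] ** 2 / sumw2[filled]

print(scale)  # NaN marks the empty bin, where sumw2 / sumw would be 0 / 0
print(n_eff)
```

For the empty bin the ratio is 0 / 0, so a scale has to be chosen by convention, and the size of the resulting error bar depends directly on that choice.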
Ok, so
and
should be fixed here: #558. In the meantime, I will leave this issue open in case we want to continue the discussion about sensible defaults for examples such as the one brought up by @riga. Here's a short summary of the current behaviour:
Code
Sorry for the delay. Looking again at what happens when a bin is zero, I think so, yes (to both). I guess with #558 this should be better handled now, and it's indeed just a matter of what is a sensible default.
I find the documentation of the histplot function [0] slightly confusing/contradictory. I stumbled upon this when I wanted to plot a simple weighted 1D histogram with sqrt(w2) error bars (which I assumed was the default, but apparently not). I have not checked if the following issues are true for histtypes other than histtype='errorbar'.

This line about the yerr parameter, "Following modes are supported: - True, sqrt(N) errors or poissonian interval when w2 is specified" [1], makes it sound like if yerr == True, then sqrt(N) is used for the errors if w2 is NOT specified, and the poissonian interval is used if w2 IS specified. However, this is not the case: it always does the poissonian interval even if w2 is None, because the Plottable class definition always initialises with the method "poisson" [2], which therefore will always run the poissonian interval [3]. If w2 is specified and yerr != None, then it crashes because of [4].

This leads me to another confusion: which error calculation do we want for weighted histograms? The following line about the w2 parameter, "Sum of the histogram weights squared for poissonian interval error calculation" [5], makes it sound like we always want to use the poissonian error for weighted histograms, while this line, "If w2 has integer values (likely to be data) poisson interval is calculated, otherwise the resulting error is symmetric sqrt(w2)" [6], makes it sound like the poisson interval should only be used for integer values. Again, "sqrt" is never used if w2method is None because of [3], even if w2 has integer values.

Also, if you specify to use the "sqrt" method, it never uses the variances (i.e. w2) for calculating the errors [7]...

I'm happy to submit a PR if someone can explain what the intended usage is :-)
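One possible reading of the rule quoted from [6], sketched as a small helper: integer-valued w2 selects the Poisson interval, anything else selects the symmetric sqrt(w2) error. The function `pick_w2method` is hypothetical and is not part of mplhep. It only illustrates the documented intent, not the current behaviour described above.

```python
import numpy as np

def pick_w2method(w2):
    """Hypothetical helper: 'poisson' if every w2 value is an integer, else 'sqrt'."""
    w2 = np.asarray(w2, dtype=float)
    if np.allclose(w2, np.round(w2)):
        return "poisson"
    return "sqrt"

print(pick_w2method([3.0, 10.0, 0.0]))  # poisson
print(pick_w2method([0.25, 1.5, 0.0]))  # sqrt
```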