Weighted variance computation for sparse and dense arrays #6205
Codecov Report
Coverage Diff        master    #6205    +/-
Coverage             86.65%    86.66%
Files                   316       316
Lines                 67995     68021    +26
Hits                  58924     58950    +26
Misses                 9071      9071
Force-pushed from e133a23 to 1cadd90.
Review comment on Orange/statistics/util.py (outdated):
@@ -476,6 +476,45 @@ def nanmean(x, axis=None, weights=None):
    return means


def nan_mean_variance_axis(x, axis=None, weights=None):
Is there any particular reason why we wouldn't support axis=None in this case? I know sklearn's implementation doesn't support it, but why shouldn't we? Then we could call this function nan_mean_variance, and it would behave more like our other functions here.
No particular reason; it is just additional work, and it would also have to work differently. With axis=None, the input weights would have to be a matrix; otherwise I do not think the problem is clearly defined. On a related note, I do not think nanmean works properly with weights when axis is None.
That is why I avoided it in this PR. What do you think?
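To make the ambiguity concrete: along an axis, one weight per row (or per column) is enough, but with axis=None a single weight vector does not say how each element should be weighed. The helper below, weighted_nanmean, is hypothetical (it is not Orange's nanmean, whose behaviour is not reproduced here) and assumes one possible convention: row weights for axis=0, column weights for axis=1, and a full weight matrix of x's shape for axis=None, rejecting anything else.

```python
import numpy as np


def weighted_nanmean(x, axis=None, weights=None):
    """Hypothetical NaN-aware weighted mean (illustration only, not Orange's nanmean).

    Assumed weight conventions:
      axis=0    -> one weight per row    (len(weights) == x.shape[0])
      axis=1    -> one weight per column (len(weights) == x.shape[1])
      axis=None -> one weight per element (weights.shape == x.shape)
    """
    x = np.asarray(x, dtype=float)
    if weights is None:
        w = np.ones_like(x)
    else:
        weights = np.asarray(weights, dtype=float)
        if axis is None:
            # A single weight vector does not say how to weigh each element.
            if weights.shape != x.shape:
                raise ValueError("with axis=None, weights must have the same shape as x")
            w = weights
        elif axis == 0:
            w = np.broadcast_to(weights[:, None], x.shape)  # row weights
        elif axis == 1:
            w = np.broadcast_to(weights[None, :], x.shape)  # column weights
        else:
            raise ValueError("axis must be 0, 1 or None")
    mask = ~np.isnan(x)
    w = np.where(mask, w, 0.0)  # missing values get zero weight
    x = np.where(mask, x, 0.0)  # and contribute nothing to the sum
    return np.sum(w * x, axis=axis) / np.sum(w, axis=axis)
```

Under this convention, passing a per-row weight vector together with axis=None raises an error instead of silently ignoring the weights.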
> With axis=None, the input weights would have to be a matrix. [...] I do not think nanmean works properly with weights when axis is None.

I think you're right. It looks like the weights are just ignored. I would still prefer this function to be called nan_mean_variance, so that if anyone at some point decides they need this computation with axis=None, we won't have to go through a renaming cycle or end up with a second function with practically the same name. I'd be totally fine if this function raised a NotImplementedError when axis is None.
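A minimal sketch of the interface suggested here, assuming dense 2-D NumPy input and one weight per row (axis=0) or per column (axis=1). Only the name nan_mean_variance comes from the discussion; the body is illustrative, not the PR's implementation. It raises NotImplementedError for axis=None, treats NaNs as missing by giving them zero weight, and computes the population (weight-normalized) variance.

```python
import numpy as np


def nan_mean_variance(x, axis=None, weights=None):
    """Sketch: NaN-aware weighted mean and population variance of a dense 2-D array.

    Assumes one weight per row for axis=0 and one weight per column for axis=1.
    """
    if axis is None:
        raise NotImplementedError("nan_mean_variance is not implemented for axis=None")
    if axis not in (0, 1):
        raise ValueError("axis must be 0, 1 or None")
    x = np.asarray(x, dtype=float)
    w = np.ones(x.shape[axis]) if weights is None else np.asarray(weights, dtype=float)
    # Shape the weights so they broadcast along the reduced axis.
    w = w[:, None] if axis == 0 else w[None, :]
    mask = ~np.isnan(x)
    w = np.where(mask, w, 0.0)  # NaNs are treated as missing: zero weight
    xz = np.where(mask, x, 0.0)
    total = w.sum(axis=axis)
    means = (w * xz).sum(axis=axis) / total
    # Re-insert the reduced axis so the means broadcast against x.
    centred = xz - np.expand_dims(means, axis)
    variances = (w * centred ** 2).sum(axis=axis) / total
    return means, variances
```

A slice that is entirely NaN has zero total weight, so its mean and variance come out as NaN (NumPy also emits a RuntimeWarning for the 0/0 division).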
Done in the last push.
Force-pushed from 1cadd90 to 5a3ced8.
Implement computation of weighted means and variances for dense or sparse arrays, both of which may contain NaNs. The results match those obtained from Distributions.
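For reference, one common definition of a NaN-aware weighted mean and variance along axis=0, written with row weights w_i and a mask m_ij that is 1 where x_ij is not NaN and 0 otherwise. That Distributions uses exactly this population (not Bessel-corrected) variance is an assumption here; the commit message only states that the results agree.

```latex
\mu_j = \frac{\sum_i m_{ij}\, w_i\, x_{ij}}{\sum_i m_{ij}\, w_i},
\qquad
\sigma_j^2 = \frac{\sum_i m_{ij}\, w_i\, (x_{ij} - \mu_j)^2}{\sum_i m_{ij}\, w_i}
           = \frac{\sum_i m_{ij}\, w_i\, x_{ij}^2}{\sum_i m_{ij}\, w_i} - \mu_j^2
```

The second form of the variance needs only sums over the non-missing entries, which is convenient for sparse data.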
Force-pushed from 5a3ced8 to f0ee0d6.
Issue
Implements weighted variance (and mean) computation that works with both sparse and dense arrays. It also handles NaNs.
Needed in #6202.
Merge after #6204.
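For the sparse case, densifying the matrix would defeat its purpose. The sketch below shows one way to stay sparse, assuming input convertible to a scipy.sparse CSR matrix, reduction along axis=0 and one weight per row: implicit zeros are treated as observed zeros carrying their row's weight, explicitly stored NaNs are treated as missing and their weight is subtracted from the column total, and the variance is obtained as E[x^2] - mean^2. The function name sparse_nan_mean_variance_axis0 is hypothetical; this is not the PR's code.

```python
import numpy as np
import scipy.sparse as sp


def sparse_nan_mean_variance_axis0(x, weights=None):
    """Hypothetical sketch: column-wise weighted mean and population variance of a
    sparse matrix that may contain explicitly stored NaNs. One weight per row is assumed.
    """
    x = sp.csr_matrix(x, dtype=float, copy=True)  # copy: sum_duplicates works in place
    x.sum_duplicates()
    n_rows, n_cols = x.shape
    w = np.ones(n_rows) if weights is None else np.asarray(weights, dtype=float)

    rows = np.repeat(np.arange(n_rows), np.diff(x.indptr))  # row of each stored entry
    cols, data = x.indices, x.data
    is_nan = np.isnan(data)
    w_entry = w[rows]

    # Every row counts towards every column (implicit zeros are real zeros) ...
    total = np.full(n_cols, w.sum())
    # ... except where the stored value is NaN, i.e. missing.
    np.subtract.at(total, cols[is_nan], w_entry[is_nan])

    ok = ~is_nan
    wsum = np.bincount(cols[ok], weights=w_entry[ok] * data[ok], minlength=n_cols)
    wsq = np.bincount(cols[ok], weights=w_entry[ok] * data[ok] ** 2, minlength=n_cols)

    means = wsum / total
    variances = wsq / total - means ** 2
    return means, variances
```

The E[x^2] - mean^2 form needs only one pass over the stored entries, but it can lose precision when the variance is small relative to the squared mean; a two-pass formulation is the safer choice when that matters.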
Includes