Skip to content

Weighted variance computation for sparse and dense arrays #6205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 18, 2022

Conversation

markotoplak
Copy link
Member

@markotoplak markotoplak commented Nov 16, 2022

Issue

Implements weighted variance (and mean) computation that works with sparse, dense. Also handles NaNs.

Needed in #6202.

Merge after #6204.

Includes
  • Code changes
  • Tests
  • Documentation

@markotoplak markotoplak added the dask Related (discovered in or needed) to the Dask adaptation label Nov 16, 2022
@codecov
Copy link

codecov bot commented Nov 16, 2022

Codecov Report

Merging #6205 (5a3ced8) into master (af8124a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 5a3ced8 differs from pull request most recent head f0ee0d6. Consider uploading reports for the commit f0ee0d6 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6205   +/-   ##
=======================================
  Coverage   86.65%   86.66%           
=======================================
  Files         316      316           
  Lines       67995    68021   +26     
=======================================
+ Hits        58924    58950   +26     
  Misses       9071     9071           

@markotoplak markotoplak mentioned this pull request Nov 16, 2022
3 tasks
@markotoplak markotoplak force-pushed the stats-nan-mean-variance branch 2 times, most recently from e133a23 to 1cadd90 Compare November 16, 2022 21:48
@markotoplak markotoplak changed the title Weighted variance computation for sparse and dense arrays [NOMERGE] Weighted variance computation for sparse and dense arrays Nov 17, 2022
@markotoplak markotoplak marked this pull request as ready for review November 17, 2022 13:55
@@ -476,6 +476,45 @@ def nanmean(x, axis=None, weights=None):
return means


def nan_mean_variance_axis(x, axis=None, weights=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason why we wouldn't support axis=None in this case? I know sklearn's implementation doesn't support it, but why wouldn't we? Then we could call this function nan_mean_variance, and it would behave more similarly to our other functions here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no particular reason, it just additional work and should also work differently. With weights=None the input weights should be a matrix. Otherwise I do not think the problem is clearly defined. On a related note, I do not think nanmean works properly with weights when axis is None.

Therefore I avoided it in this PR. What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with weights=None the input weights should be a matrix. [...] I do not think nanmean works properly with weights when axis is None.

I think you're right. It looks like the weights are just ignored. I would still prefer this function to be nan_mean_variance, so that if anyone, at some points, decides they need this computation with axis=None, we won't have to go through a renaming cycle, or end up with a second function with the practically same name. I'd be totally fine if this function raises a NotImplementedError if axis is None.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in the last push.

@markotoplak markotoplak force-pushed the stats-nan-mean-variance branch from 1cadd90 to 5a3ced8 Compare November 18, 2022 11:18
@markotoplak markotoplak changed the title [NOMERGE] Weighted variance computation for sparse and dense arrays Weighted variance computation for sparse and dense arrays Nov 18, 2022
Implement computation of means and variance for dense or sparse arrays.
Both can include NaNs. The computation is weighted and produces
the same results as we get from Distributions.
@markotoplak markotoplak force-pushed the stats-nan-mean-variance branch from 5a3ced8 to f0ee0d6 Compare November 18, 2022 13:16
@markotoplak markotoplak merged commit 7d18600 into biolab:master Nov 18, 2022
@markotoplak markotoplak deleted the stats-nan-mean-variance branch November 21, 2022 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dask Related (discovered in or needed) to the Dask adaptation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants