Weighted variance computation for sparse and dense arrays #6205

markotoplak · 2022-11-16T20:42:58Z

Issue

Implements weighted variance (and mean) computation that works with both sparse and dense arrays and handles NaNs.
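For readers unfamiliar with the computation, here is a minimal dense-only sketch of a NaN-aware weighted mean and variance along one axis. It is not the code from this PR: the function name `nan_weighted_mean_variance` is hypothetical, sparse input is not handled, and the variance is assumed to be the population form (normalized by the sum of the weights of the non-missing values), which may differ from what the merged function does.

```python
import numpy as np


def nan_weighted_mean_variance(x, weights=None, axis=0):
    """Hypothetical sketch: weighted mean and variance along `axis`, ignoring NaNs.

    Assumption: population variance, i.e. normalized by the sum of the
    weights of the non-NaN entries. Dense 2-D input only.
    """
    x = np.asarray(x, dtype=float)
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    if weights is None:
        weights = np.ones(x.shape[axis])
    weights = np.asarray(weights, dtype=float)

    # One weight per element along the reduced axis; reshape for broadcasting.
    w = weights[:, None] if axis == 0 else weights[None, :]

    # Missing values get zero weight and a placeholder value of 0.
    valid = ~np.isnan(x)
    w = np.where(valid, w, 0.0)
    x0 = np.where(valid, x, 0.0)

    total = w.sum(axis=axis)
    # A slice with no valid values has total weight 0 and yields NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = (w * x0).sum(axis=axis) / total
        centered = x0 - np.expand_dims(means, axis)
        variances = (w * centered ** 2).sum(axis=axis) / total
    return means, variances


x = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [5.0, 8.0]])
means, variances = nan_weighted_mean_variance(x, weights=[1, 1, 2], axis=0)
```

With unit weights this reduces to `np.nanmean` and `np.nanvar` (with the default `ddof=0`), which is a convenient sanity check.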

Needed in #6202.

Merge after #6204.

Includes

Code changes
Tests
Documentation

codecov · 2022-11-16T20:59:09Z

Codecov Report

Merging #6205 (5a3ced8) into master (af8124a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 5a3ced8 differs from pull request most recent head f0ee0d6. Consider uploading reports for the commit f0ee0d6 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #6205   +/-   ##
=======================================
  Coverage   86.65%   86.66%           
=======================================
  Files         316      316           
  Lines       67995    68021   +26     
=======================================
+ Hits        58924    58950   +26     
  Misses       9071     9071

pavlin-policar · 2022-11-18T11:05:57Z

Orange/statistics/util.py

@@ -476,6 +476,45 @@ def nanmean(x, axis=None, weights=None):
    return means


+def nan_mean_variance_axis(x, axis=None, weights=None):


Is there any particular reason why we wouldn't support axis=None in this case? I know sklearn's implementation doesn't support it, but why wouldn't we? Then we could call this function nan_mean_variance, and it would behave more similarly to our other functions here.

No particular reason; it is just additional work, and it would also have to work differently. With axis=None, the input weights should be a matrix; otherwise I do not think the problem is clearly defined. On a related note, I do not think nanmean works properly with weights when axis is None.

Therefore I avoided it in this PR. What do you think?

with axis=None the input weights should be a matrix. [...] I do not think nanmean works properly with weights when axis is None.

I think you're right. It looks like the weights are just ignored. I would still prefer this function to be named nan_mean_variance, so that if anyone, at some point, decides they need this computation with axis=None, we won't have to go through a renaming cycle or end up with a second function with practically the same name. I'd be totally fine if this function raised a NotImplementedError when axis is None.

Done in the last push.
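The interface agreed on above can be sketched roughly as follows. Only the name `nan_mean_variance`, the signature, and the `NotImplementedError` for `axis=None` come from the discussion; the body is an illustrative dense-only assumption (population variance, weights given per element of the reduced axis) and not the merged implementation.

```python
import numpy as np


def nan_mean_variance(x, axis=None, weights=None):
    """Sketch of the discussed interface; the body is an assumption."""
    if axis is None:
        # With axis=None the weights would have to match the shape of x,
        # so the case is rejected explicitly instead of guessing.
        raise NotImplementedError("axis=None is not supported")
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")

    x = np.asarray(x, dtype=float)
    if weights is None:
        weights = np.ones(x.shape[axis])
    # Reshape the 1-D weights so they broadcast along the reduced axis.
    w = np.expand_dims(np.asarray(weights, dtype=float), 1 - axis)

    # NaN entries contribute neither weight nor value.
    missing = np.isnan(x)
    w = np.where(missing, 0.0, w)
    x0 = np.where(missing, 0.0, x)

    total = w.sum(axis=axis)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = (w * x0).sum(axis=axis) / total
        centered = x0 - np.expand_dims(means, axis)
        variances = (w * centered ** 2).sum(axis=axis) / total
    return means, variances


data = np.array([[1.0, np.nan, 3.0],
                 [2.0, 4.0, 6.0]])
row_means, row_vars = nan_mean_variance(data, axis=1, weights=[1, 1, 2])
```

Keeping `axis=None` as the default while rejecting it makes the signature match `nanmean`, so support can be added later without renaming anything.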


markotoplak added the dask label (Related (discovered in or needed) to the Dask adaptation) Nov 16, 2022

markotoplak mentioned this pull request Nov 16, 2022

[ENH] Faster normalization #6202

Merged

3 tasks

markotoplak force-pushed the stats-nan-mean-variance branch 2 times, most recently from e133a23 to 1cadd90 Compare November 16, 2022 21:48

markotoplak changed the title ~~Weighted variance computation for sparse and dense arrays~~ [NOMERGE] Weighted variance computation for sparse and dense arrays Nov 17, 2022

markotoplak marked this pull request as ready for review November 17, 2022 13:55

janezd assigned pavlin-policar Nov 18, 2022

pavlin-policar reviewed Nov 18, 2022

markotoplak force-pushed the stats-nan-mean-variance branch from 1cadd90 to 5a3ced8 Compare November 18, 2022 11:18

markotoplak changed the title ~~[NOMERGE] Weighted variance computation for sparse and dense arrays~~ Weighted variance computation for sparse and dense arrays Nov 18, 2022

Implements statistics.util.nan_mean_variance_axis

f0ee0d6

Implement computation of means and variance for dense or sparse arrays. Both can include NaNs. The computation is weighted and produces the same results as we get from Distributions.

markotoplak force-pushed the stats-nan-mean-variance branch from 5a3ced8 to f0ee0d6 Compare November 18, 2022 13:16

markotoplak merged commit 7d18600 into biolab:master Nov 18, 2022

markotoplak deleted the stats-nan-mean-variance branch November 21, 2022 12:47
