The wrong skew measure? #90

mcorrell · 2017-12-21T01:05:52Z

datalib currently calculates what is supposed to be Pearson's mode skewness, but does so as:
(avg - med) / std

This is non-parametric skewness, rather than:

Pearson's mode skewness which is
(avg - mode) / std

Or Pearson's median skewness, which is
3 * (avg - med) / std

So this is only giving correct values when med == mode.

Should I:

Rename the function to reflect that it's nonparametric skew.
Add a mode function, and then fix the modeskew function by calling mode.
Add a factor of 3 and rename the function to reflect that it's median skew.
Some combination of those things.
None of those things.

The text was updated successfully, but these errors were encountered:

domoritz · 2017-12-26T12:16:23Z

I'd say mode skewness should compute the correct thing so 2. But we could keep the existing function around but with a different name, which is 1. And while you're at it, why not add Pearson's median skewness (3.)?

mcorrell · 2017-12-26T13:50:38Z

mode is a little tricky, and I think the current datalib map function might be more useful in 99% of cases. For instance, mode([1,2]) should, in principle, return the unordered set {1,2}, since the distribution has two modes. But it seems silly for mode([1,1]) to return {1} instead of 1. And then there's the question of nulls and undefineds and so on. I think it's useful for somebody to know that the mode of [1,null,null] is null, but that's a judgment call. In either event, we'd have a lot of cases where we would not have a single numerical mode, and so mode skewness would be undefined.

I'm now leaning towards an option not on my list, computing skewness using the third standardized moment, rather than having to mess with a mode function. But I can submit a PR with a mode function (that would just call map and skim off the elements with the highest count) and see what people think.

jheer · 2017-12-28T18:50:35Z

I would recommend renaming the function to something more appropriate (npskew?), but leave a deprecated reference to the existing function name modeskew to support backwards compatibility. I can add appropriate warnings / caveats to the documentation. (FWIW, this method was removed from Vega in version 3 onward, though obviously persists here in datalib.)

I don't think inclusion of mode is needed, given existing functionality. A new skew method for the third standardized moment seems reasonable to add.

mcorrell · 2018-01-17T00:46:06Z

Okay, I made a PR #91 that adds sample skew. I wasn't sure how to sunset modeskew exactly so I just punted on that.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The wrong skew measure? #90

The wrong skew measure? #90

mcorrell commented Dec 21, 2017

domoritz commented Dec 26, 2017

mcorrell commented Dec 26, 2017

jheer commented Dec 28, 2017

mcorrell commented Jan 17, 2018

The wrong skew measure? #90

The wrong skew measure? #90

Comments

mcorrell commented Dec 21, 2017

domoritz commented Dec 26, 2017

mcorrell commented Dec 26, 2017

jheer commented Dec 28, 2017

mcorrell commented Jan 17, 2018