@edish-github
Contributor

This Pull Request:

Adds .Skewness() and .Kurtosis() actions to RDataFrame.

Motivation

In high-energy physics, looking beyond the mean and standard deviation is often essential.

  • Skewness helps detect asymmetry in energy distributions (which can indicate parity violation).
  • Kurtosis helps identify "heavy tails" where rare events (like new heavy particles or dark matter candidates) might hide.

Currently, computing these quantities requires users to either bin the data into a TH1 (losing precision) or write manual loops. These actions allow an exact calculation in a single pass.

Implementation Details

I used Welford's online algorithm, extended to higher-order moments, which computes the mean, variance, skewness, and kurtosis simultaneously in one pass. It is numerically stable and fits the RDataFrame parallel map-reduce pattern.

  • Implementation Note: I initially explored an explicit SIMD implementation using ROOT::RVec. However, benchmarks on my local machine (M3, Clang -O3) showed that the compiler auto-vectorizes the scalar Welford loop very effectively, so I went with the scalar implementation.
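
For readers unfamiliar with the technique, here is a minimal standalone sketch of one-pass central-moment accumulation with a pairwise merge step (the update and merge formulas usually attributed to Welford, Chan et al., and Pébay). It is not the code from this PR: the `Moments` struct and its member names are hypothetical, and the choice of population skewness and excess kurtosis as the final normalization is an assumption, since the description above does not state which convention the helpers use.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical accumulator (illustration only, not the PR's SkewnessHelper/KurtosisHelper).
// M2, M3, M4 hold the sums of (x - mean)^k over all values seen so far.
struct Moments {
   double n = 0, mean = 0, M2 = 0, M3 = 0, M4 = 0;

   // "Map" step: fold a single value into the running moments.
   void Add(double x)
   {
      const double n1 = n;
      n += 1;
      const double delta = x - mean;
      const double delta_n = delta / n;
      const double delta_n2 = delta_n * delta_n;
      const double term1 = delta * delta_n * n1;
      mean += delta_n;
      // Update order matters: M4 uses the old M3 and M2, M3 uses the old M2.
      M4 += term1 * delta_n2 * (n * n - 3 * n + 3) + 6 * delta_n2 * M2 - 4 * delta_n * M3;
      M3 += term1 * delta_n * (n - 2) - 3 * delta_n * M2;
      M2 += term1;
   }

   // "Reduce" step: combine the partial result of another thread or slot.
   void Merge(const Moments &o)
   {
      if (o.n == 0)
         return;
      if (n == 0) {
         *this = o;
         return;
      }
      const double na = n, nb = o.n, nt = na + nb;
      const double d = o.mean - mean;
      const double d2 = d * d;
      const double m4 = M4 + o.M4 + d2 * d2 * na * nb * (na * na - na * nb + nb * nb) / (nt * nt * nt) +
                        6 * d2 * (na * na * o.M2 + nb * nb * M2) / (nt * nt) + 4 * d * (na * o.M3 - nb * M3) / nt;
      const double m3 = M3 + o.M3 + d2 * d * na * nb * (na - nb) / (nt * nt) + 3 * d * (na * o.M2 - nb * M2) / nt;
      const double m2 = M2 + o.M2 + d2 * na * nb / nt;
      mean += d * nb / nt;
      M2 = m2;
      M3 = m3;
      M4 = m4;
      n = nt;
   }

   // Population skewness g1 = sqrt(n) * M3 / M2^(3/2); assumed convention.
   double Skewness() const
   {
      assert(n > 0 && M2 > 0); // undefined for empty or constant input
      return std::sqrt(n) * M3 / std::pow(M2, 1.5);
   }

   // Excess kurtosis g2 = n * M4 / M2^2 - 3; assumed convention.
   double ExcessKurtosis() const
   {
      assert(n > 0 && M2 > 0);
      return n * M4 / (M2 * M2) - 3.0;
   }
};
```

In a map-reduce setting, each worker would call `Add` on its own `Moments` instance and the partial results would be combined with `Merge` at the end. Because only five doubles are kept per worker, no binning is needed and the result does not depend on how the data is split, up to floating-point rounding.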

Changes or fixes:

  • Added SkewnessHelper and KurtosisHelper in tree/dataframe/inc/ROOT/RDF/ActionHelpers.hxx.
  • Exposed .Skewness() and .Kurtosis() in tree/dataframe/inc/ROOT/RDF/RInterface.hxx.
  • Registered the new action tags in tree/dataframe/inc/ROOT/RDF/InterfaceUtils.hxx.

Checklist:

  • tested changes locally (Verified against a reference Welford implementation via test macro)
  • updated the docs (Added Doxygen comments matching the StdDev style)

@edish-github edish-github changed the title Add Skewness and Kurtosis actions to RDataFrame using Welford's algor… [RDF] Add Skewness and Kurtosis actions to RDataFrame using Welford's algorithm Nov 25, 2025
@hahnjo
Member

hahnjo commented Nov 25, 2025

Thanks for the pointers to the numerically stable algorithm. I think I will borrow this for the global statistics of ROOT's new histograms 😉

@edish-github
Contributor Author

@hahnjo Thanks! It's really cool that the algorithm might be useful for the new histograms too. 🚀

I’m currently working on the weighted version (West's algorithm) to support weighted skewness/kurtosis in RDF. The parallel merge steps get complex for the cubic terms, but I verified the prototype against calculations, and it holds up perfectly.

If you end up needing the weighted merge derivations (adapted from West/Chan) for the histogram work, please let me know. I'm happy to share my prototype if it saves you some digging! 😊
