
add user-defined handling of NaN and inf to RDataFrame statistics #19879

@ynikitenko

Feature description

I analysed a tree in PyROOT with RDataFrame. Statistics for most columns were calculated correctly, but for one column I got NaNs. On further inspection, it turned out that one event out of 8 million contained a NaN in that column. I tried Stats and Mean, and both returned NaN. NaN is the mathematically correct answer for that data, but it is not useful in practice.
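To illustrate why a single bad entry is enough, here is a minimal plain-Python sketch (not RDataFrame code; the data are made up for illustration). Any arithmetic involving NaN yields NaN, so one NaN poisons the whole sum, while filtering non-finite values first gives a usable mean.

```python
import math

# Hypothetical toy column: five entries, one of them NaN.
values = [1.0, 2.0, 3.0, float("nan"), 5.0]

# Naive mean: the NaN propagates through the sum.
naive_mean = sum(values) / len(values)

# Mean over finite entries only (drops NaN and +/-inf).
finite = [v for v in values if math.isfinite(v)]
filtered_mean = sum(finite) / len(finite)

print(naive_mean, filtered_mean)
```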

The possible solutions could be:

  1. RDataFrame could optionally filter out irregular values (such as NaN or inf).
  2. The statistics functions of RDataFrame could optionally support this. Tree Drawer simply ignored that value and calculated the mean (already noted here). One could add an option to Stats() or Mean() to ignore NaNs.
  3. Documentation could be added explaining how users should handle this. There is a similar section, "Working with missing values in the dataset", in the RDataFrame reference. NaNs are common in statistical analysis, so they would be worth explaining too.

The approach with RDataFrame looks cleaner; at the same time, another method like "FilterNaNs" for RDataFrame looks like feature creep. In Python this could be solved with a keyword argument (option); for C++ it is less clear to me (5 overloaded functions do not look good either). Improved guidance in the documentation is always possible.
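As a rough sketch of the keyword-argument idea, here is what such an opt-in could look like in plain Python. The function `mean` and the option name `ignore_nonfinite` are hypothetical and are not part of the RDataFrame API; the sketch only shows the intended semantics.

```python
import math

def mean(values, ignore_nonfinite=False):
    """Arithmetic mean; optionally skip NaN and +/-inf (hypothetical option)."""
    values = list(values)
    if ignore_nonfinite:
        values = [v for v in values if math.isfinite(v)]
    if not values:
        # Nothing left to average: return NaN rather than raising.
        return float("nan")
    return sum(values) / len(values)

data = [1.0, float("nan"), 3.0, float("inf")]
default_result = mean(data)                         # NaN, the current behaviour
skipped_result = mean(data, ignore_nonfinite=True)  # mean of 1.0 and 3.0
```

Keeping the default unchanged means existing analyses still see NaN for such data; only users who ask for it get the filtered result.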

Alternatives considered

I could solve it in PyROOT with

    ROOT.gInterpreter.Declare("auto myisnan = static_cast<bool (*)(float)>(&isnan);")
    dr_cmx_no_cut = df.Filter("!myisnan(displacement.dr_proj_cmx_cm)")\
        .Stats("displacement.dr_proj_cmx_cm")

However, this is (a) not obvious, especially for Python users, and (b) redundant, since the same column has to be filtered with a dedicated function first (maybe there is a better syntax for that).

Additional context

Not sure whether it is connected, but there was a similar topic on the ROOT Forum.
