Description
Is your feature request related to a problem?
When you create a batch generator over data that contains NaNs, there is currently no way to skip the affected batches. For example, with an ocean data set such as a map of sea surface temperature, you may iterate through regions where the stencil is fully valid, partially valid, or completely full of NaNs. Because xbatcher can't filter for these situations, any filtering has to be applied inside the batch loop, which leads to load imbalance.
Describe the solution you'd like
I would like to see an option in BatchGenerator for a selection predicate. Basically, you would pass a function to BatchGenerator that takes slices as inputs and evaluates to either True or False. BatchGenerator would then use the result to select only the slices that returned True, thereby restoring load balance.
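To make the idea concrete, here is a minimal, library-independent sketch of predicate-based selection. It does not use xbatcher or its API: `iter_patches`, `has_no_nans`, and `select_patches` are hypothetical names, the predicate signature `predicate(field, slices)` is an assumption, and the field is a plain nested list rather than an xarray object so the example has no dependencies.

```python
import math
from itertools import product


def iter_patches(field, patch_shape):
    """Yield (row_slice, col_slice) for non-overlapping patches of a 2-D field."""
    n_rows, n_cols = len(field), len(field[0])
    p_rows, p_cols = patch_shape
    row_starts = range(0, n_rows - p_rows + 1, p_rows)
    col_starts = range(0, n_cols - p_cols + 1, p_cols)
    for r, c in product(row_starts, col_starts):
        yield slice(r, r + p_rows), slice(c, c + p_cols)


def has_no_nans(field, slices):
    """Hypothetical predicate: True only if the patch contains no NaN."""
    rows, cols = slices
    return not any(math.isnan(v) for row in field[rows] for v in row[cols])


def select_patches(field, patch_shape, predicate):
    """Keep only the patch slices for which predicate(field, slices) is True."""
    return [s for s in iter_patches(field, patch_shape) if predicate(field, s)]


nan = float("nan")
# Toy "sea surface temperature" map; NaN marks land.
sst = [
    [10.0, 11.0, nan, nan],
    [12.0, 13.0, nan, nan],
    [14.0, nan, 16.0, 17.0],
    [15.0, 15.5, 18.0, 19.0],
]

# The selection happens once, before any batch is processed.
valid = select_patches(sst, (2, 2), has_no_nans)
```

Because the filtering happens up front, every remaining patch carries real work, so the patches can be split evenly across workers instead of some workers receiving mostly empty batches.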
Describe alternatives you've considered
No response
Additional context
I think this is similar to #158