Today, NestedFrame.query checks the expression it's given and raises ValueError if it mixes nested and base columns. This is because in order to handle such expressions correctly, it would need to tease out the sub-expressions that are strictly against the nested columns (by traversing the abstract syntax tree of the input expression), apply and re-pack into an intermediate result, and then apply the base column expressions to this intermediate result.
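As a rough illustration of the detection step, the sketch below uses Python's standard `ast` module to decide whether an expression touches both levels. The helper names `column_levels` and `is_mixed` are hypothetical and not part of nested-pandas. It assumes nested columns are always written as `nest.column`, and it ignores things a real query parser must handle, such as function calls and `@variable` references.

```python
import ast


def column_levels(expr, nested_names):
    """Return which levels ("base", "nested") an expression references.

    Hypothetical helper: `nested_names` is the set of nest names, so that
    `nested.flux` counts as a nested column when "nested" is in the set.
    """
    tree = ast.parse(expr, mode="eval")
    levels = set()
    attr_roots = set()  # ids of Name nodes that are the root of `x.y`

    # First pass: attribute accesses such as `nested.flux`.
    for node in ast.walk(tree):
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            attr_roots.add(id(node.value))
            levels.add("nested" if node.value.id in nested_names else "base")

    # Second pass: bare names such as `a` are treated as base columns.
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and id(node) not in attr_roots:
            levels.add("base")

    return levels


def is_mixed(expr, nested_names):
    """True when the expression references both base and nested columns."""
    return column_levels(expr, nested_names) == {"base", "nested"}
```

With `nested_names={"nested"}`, `is_mixed("a > 2 & nested.flux > 50", ...)` is true, while `a > 2` or `nested.flux > 50` alone is not mixed. Splitting the tree into per-level sub-expressions is the harder part and is not attempted here.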
In an expression like a > 2 & nested.flux > 50, for example, the user would expect the resulting NestedFrame to have no a values <= 2 and no nested.flux values <= 50. And in an expression like a > 2 | nested.flux > 50, the user would still expect to retain rows where a <= 2, so long as each such row has some nested.flux > 50, but within those rows they wouldn't expect to see any nested.flux <= 50. For rows where a > 2, though, they'd expect to see all the nested.flux rows. In other words, as soon as there is a mixed-level expression, the nested rows sometimes need to be queried and repacked before continuing, or at least that should be the final effect.
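To make the expected semantics concrete, here is a small sketch in plain Python, with lists of dicts standing in for a NestedFrame. The functions `query_and` and `query_or` are hypothetical and hard-code the two example expressions. One assumption goes beyond the description above: in the `&` case, a base row with no surviving nested rows is dropped rather than kept with an empty nest.

```python
rows = [
    {"a": 1, "nested": [{"flux": 10}, {"flux": 60}]},
    {"a": 1, "nested": [{"flux": 10}, {"flux": 20}]},
    {"a": 3, "nested": [{"flux": 10}, {"flux": 60}]},
]


def query_and(rows):
    """Expected effect of `a > 2 & nested.flux > 50`."""
    out = []
    for row in rows:
        if row["a"] > 2:
            kept = [n for n in row["nested"] if n["flux"] > 50]
            if kept:  # assumption: rows with an emptied nest are dropped
                out.append({"a": row["a"], "nested": kept})
    return out


def query_or(rows):
    """Expected effect of `a > 2 | nested.flux > 50`."""
    out = []
    for row in rows:
        if row["a"] > 2:
            # Base condition holds: every nested row is retained.
            out.append({"a": row["a"], "nested": list(row["nested"])})
        else:
            # Base condition fails: only matching nested rows survive.
            kept = [n for n in row["nested"] if n["flux"] > 50]
            if kept:
                out.append({"a": row["a"], "nested": kept})
    return out
```

On this data, `query_or(rows)` keeps the first row with only its `flux == 60` entry, drops the second row entirely, and keeps the third row with both nested entries.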
Logically, if there were a method to unpack all nests and broadcast all base columns across them, then we would take the result of self.eval(expr) and do something like self.flatten_all().loc[result].repack_all(), but this would likely not be performant.
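The flatten, mask, repack idea can be sketched in plain Python as follows. Here `flatten_all` and `repack_all` are hypothetical stand-ins for the methods imagined above, operating on lists of dicts rather than a real NestedFrame, and the mask hard-codes `a > 2 | nested.flux > 50`. Note that the flat table holds one record per nested row, with the base value copied into each, and that a base row with an empty nest would vanish during flattening in this sketch.

```python
def flatten_all(rows):
    """Broadcast each base value across its nested rows (hypothetical)."""
    return [
        {"_idx": i, "a": row["a"], "flux": n["flux"]}
        for i, row in enumerate(rows)
        for n in row["nested"]
    ]


def repack_all(flat):
    """Group flat records back into base rows with nested lists (hypothetical)."""
    packed = {}
    for rec in flat:
        entry = packed.setdefault(rec["_idx"], {"a": rec["a"], "nested": []})
        entry["nested"].append({"flux": rec["flux"]})
    return [packed[i] for i in sorted(packed)]


rows = [
    {"a": 1, "nested": [{"flux": 10}, {"flux": 60}]},
    {"a": 1, "nested": [{"flux": 10}, {"flux": 20}]},
    {"a": 3, "nested": [{"flux": 10}, {"flux": 60}]},
]

flat = flatten_all(rows)  # one record per nested row, `a` duplicated
mask = [rec["a"] > 2 or rec["flux"] > 50 for rec in flat]
result = repack_all([rec for rec, keep in zip(flat, mask) if keep])
```

Three base rows become six flat records here. The duplication grows with the number of nested rows per base row and with the number of base columns being broadcast.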
I’m against the broadcasting approach: it may cause memory usage to explode, while one of the core ideas of nested-pandas is to never have the “joined” version of the base and nested columns. We either need to find another way to do it or not implement this feature.