Description
I think that all column selectors (other than arrays) should guarantee that the column order in the original table is preserved. One would certainly expect that to be the case for `Between`, though it's not explicitly mentioned in the docstring. It would be a bummer if you had `foo(x, y) = 2x .+ y` but `Between(:x, :y) => foo` happened to lower to `[:y, :x] => foo` instead of `[:x, :y] => foo`.
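To make the concern concrete, here is a small sketch in plain Julia (no DataFrames.jl needed) showing that `foo` is not symmetric in its arguments, so the order in which a selector lowers to column names changes the result. The vectors `x` and `y` are made-up sample data.

```julia
foo(x, y) = 2x .+ y

# Made-up sample columns, standing in for df.x and df.y
x = [1, 2]
y = [10, 20]

foo(x, y)  # [12, 24], what [:x, :y] => foo would compute
foo(y, x)  # [21, 42], what [:y, :x] => foo would compute
```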
And I think it makes sense to guarantee column order preservation for the other selectors. E.g.
df = DataFrame(a=1, b=2, c=3)
select(df, Not(:b) => foo)
should be guaranteed to lower to
select(df, [:a, :c] => foo)
rather than
select(df, [:c, :a] => foo)
I'm not totally certain of the best way to specify the column ordering properties of `Cols`, but I think this specification makes sense:
- Individual column selectors inside `Cols` are first lowered to (ordered) arrays.
- The lowering of the individual column selectors (except for arrays) follows the rule above that table column order should be preserved.
- `Cols` is then lowered as follows: `Cols(A, B, C) ==> [A, B\A, C\(A ∪ B)]` (where the arguments on the right side are splatted into the array).
Since `setdiff` on arrays preserves the order of its first argument, we get the following behavior:
df = DataFrame(a=1, b=2, c=3)
Cols([:c, :b], [:a, :b]) == [:c, :b, :a]
Cols(r"[bc]", r"[ab]") == [:b, :c, :a]
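The proposed rule can be sketched in plain Julia. This is only an illustration of the specification above, not how DataFrames.jl implements `Cols`: `lower_cols`, `match_cols`, and `table_cols` are hypothetical names, and the selectors are assumed to have already been lowered to ordered vectors of column names.

```julia
# Hypothetical helper illustrating the proposed rule; not DataFrames.jl code.
# Each argument is an already-lowered selector (an ordered vector of names).
function lower_cols(selectors::AbstractVector{Symbol}...)
    lowered = Symbol[]
    for sel in selectors
        # setdiff keeps the order of its first argument, so each selector
        # contributes its not-yet-seen columns in its own order.
        append!(lowered, setdiff(sel, lowered))
    end
    return lowered
end

# Column order of the example table DataFrame(a=1, b=2, c=3)
table_cols = [:a, :b, :c]

# Stand-in for lowering a regex selector in table column order
match_cols(re) = filter(c -> occursin(re, String(c)), table_cols)

lower_cols([:c, :b], [:a, :b])                        # [:c, :b, :a]
lower_cols(match_cols(r"[bc]"), match_cols(r"[ab]"))  # [:b, :c, :a]
```

Array selectors keep the order the user wrote, while the regex selectors follow table order, which matches the two expected results above.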