
Column selectors should guarantee column order is preserved #20

Open
@CameronBieganek

I think that all column selectors (other than arrays) should guarantee that the column order in the original table is preserved. One would certainly expect that to be the case for Between, though it's not explicitly mentioned in the docstring. It would be a bummer if you had foo(x, y) = 2x .+ y but Between(:x, :y) => foo happened to lower to [:y, :x] => foo instead of [:x, :y] => foo.

I think it also makes sense to guarantee column order preservation for the other selectors. For example:

using DataFrames
foo(x, y) = 2x .+ y
df = DataFrame(a=1, b=2, c=3)
select(df, Not(:b) => foo)

should be guaranteed to lower to

select(df, [:a, :c] => foo)

rather than

select(df, [:c, :a] => foo)
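To make the proposed guarantee concrete, here is a minimal sketch of order-preserving lowering for Not and Between over a plain vector of column names. The helpers lower_not and lower_between are hypothetical illustrations, not DataFrames.jl internals; the point is only that filtering or slicing the table's own name vector yields columns in table order.

```julia
# Hypothetical helpers (not DataFrames.jl internals): lower selectors
# to a vector of column names in table order.

# Not(:b): keep every other column; `filter` preserves the input order.
lower_not(names::Vector{Symbol}, excluded::Symbol) = filter(!=(excluded), names)

# Between(:from, :to): slice the name vector between the two positions,
# which yields the columns in table order.
function lower_between(names::Vector{Symbol}, from::Symbol, to::Symbol)
    i = findfirst(==(from), names)
    j = findfirst(==(to), names)
    (i === nothing || j === nothing) && throw(ArgumentError("column not found"))
    return names[i:j]
end

tblnames = [:a, :b, :c]
lower_not(tblnames, :b)            # [:a, :c], never [:c, :a]
lower_between([:x, :y], :x, :y)    # [:x, :y], so foo receives (x, y)
```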

I'm not totally certain of the best way to specify the column ordering properties of Cols, but I think this specification makes sense:

  • Individual column selectors inside Cols are first lowered to (ordered) arrays.
    • The lowering of the individual column selectors (except for arrays) follows the rule above that table column order should be preserved.
  • Cols is then lowered as follows: Cols(A, B, C) ==> [A, B\A, C\(A ∪ B)] (where the arguments on the right side are splatted into the array).

Since setdiff on arrays preserves the order of its first argument, we get the following behavior:

df = DataFrame(a=1, b=2, c=3)
Cols([:c, :b], [:a, :b])  # lowers to [:c, :b, :a]
Cols(r"[bc]", r"[ab]")    # lowers to [:b, :c, :a]
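The proposed Cols rule can be sketched the same way. This is a minimal sketch under stated assumptions: only Symbol, vector-of-Symbol, and Regex selectors are handled, and lower and lower_cols are hypothetical names rather than DataFrames.jl API. Each argument is lowered on its own and then appended minus whatever earlier arguments already contributed, which reproduces the two examples above.

```julia
# Hypothetical sketch of the proposed Cols lowering; not the DataFrames.jl code.

# Lower one selector to an ordered vector of column names.
lower(names::Vector{Symbol}, sel::Symbol) = [sel]
lower(names::Vector{Symbol}, sel::AbstractVector{Symbol}) = collect(sel)  # arrays keep the user's order
lower(names::Vector{Symbol}, sel::Regex) =
    filter(n -> occursin(sel, String(n)), names)  # regex matches keep table order

# Cols(A, B, C) ==> [A..., setdiff(B, A)..., setdiff(C, A ∪ B)...]
function lower_cols(names::Vector{Symbol}, sels...)
    result = Symbol[]
    for sel in sels
        # setdiff keeps the order of its first argument, so each selector's own
        # ordering survives and columns already selected are not repeated.
        append!(result, setdiff(lower(names, sel), result))
    end
    return result
end

tblnames = [:a, :b, :c]
lower_cols(tblnames, [:c, :b], [:a, :b])  # [:c, :b, :a]
lower_cols(tblnames, r"[bc]", r"[ab]")    # [:b, :c, :a]
```

Taking setdiff against the accumulated result is equivalent to the B\A, C\(A ∪ B) form, because at each step the accumulated result is exactly the union of the previously lowered arguments.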
