I have seen the discussion in #3363, where it was mentioned that adding arbitrary matching functions is part of the 1.7 milestone. What is the status of this feature request? Are there any plans to implement it soon?
In principle, I could try to cook up a PR on my own, but I am not that familiar with DataFrames internals. Also, the details of the syntax should be discussed in advance.
So, what is the principal difficulty with adding this feature? The way I see it, we just need to overload the equality comparison that is used to relate the rows from the two DataFrames. Everything else, including the validation, can stay unchanged.
Now, there is a separate matter of the transformation of the columns, which was requested in #3363. I think it makes sense to distinguish transformation from matching.
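To make the intended semantics concrete, here is a rough sketch of how arbitrary matching can be emulated today with the public API only: build all row pairs with `crossjoin` and keep those for which a predicate holds. This is not how joins are implemented internally, and it costs O(n·m) rows, so it is only an illustration. The predicate `approx_match` and the sample data are made up for this example.

```julia
using DataFrames

# Hypothetical sample data: match rows whose x and y are "close enough".
left = DataFrame(id = [1, 2, 3], x = [1.00, 2.00, 3.00])
right = DataFrame(key = ["a", "b"], y = [1.01, 2.50])

# Hypothetical matching predicate standing in for a user-supplied function.
approx_match(a, b) = isapprox(a, b; atol = 0.05)

# All pairs of rows (column names are distinct, so no renaming is needed).
candidates = crossjoin(left, right)

# Keep only the pairs accepted by the predicate.
matched = subset(candidates, [:x, :y] => ByRow(approx_match))
```

A built-in version would presumably avoid materializing the full cross product, which is where the internals come in.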
We should really only match the columns of the same type between two dataframes, because we also need to match different rows inside the same dataframe to do the validation.
And transformation should only provide the means to bring the columns to the same type: we first transform columns in one or both dataframes to the same type, and then run matching on the results of the transforms.
Transformation can already be done by the users themselves, simply by adding an extra column. Arbitrary matching, on the other hand, is what requires tinkering with internals, and treating it separately from transformation simplifies the implementation.
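As a small illustration of the "transform first, then match" approach with the existing API, assuming two made-up tables whose key columns differ in type (`Int` versus `String`):

```julia
using DataFrames

# Hypothetical sample data: the same key stored with different types.
people = DataFrame(id = [1, 2, 3], name = ["Ann", "Bob", "Cid"])
orders = DataFrame(id = ["1", "3"], item = ["pen", "ink"])

# Step 1 (transformation): bring the key column to a common type.
orders_int = transform(orders, :id => ByRow(s -> parse(Int, s)) => :id)

# Step 2 (matching): an ordinary equality join on the now-compatible column.
joined = sort(innerjoin(people, orders_int, on = :id), :id)
```

Here the matching step is still plain equality; only the first step is user-defined.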
What do you think about it?