Hi Jackson,

Thanks for the post. I agree this is a potentially useful feature, and one that doesn't currently exist. The approach you're using is probably the best option at the moment; alternatively, you can provide raw SQL.

We're actually in the middle of a major refactor of comparison levels that will make it much easier to write new ones. This work will slightly break backwards compatibility, however, so it will only become available in Splink 4 (timings TBC, but we expect to have a very early alpha release before Christmas). The rationale is that it's currently too hard to add new comparison levels: you have to touch a lot of different files to make them work across the various backends.

So I'd probably say wait until that work is complete. In future it should be relatively simple (example here) to add one. You could have a go now if you like, but the tests aren't quite working yet, so it's probably best to wait.

Hope that makes sense, and feel free to follow up with any more questions.

Robin
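For anyone landing here, the raw SQL route might look roughly like the sketch below. It is not an official Splink comparison level. It assumes the DuckDB backend (where `list_contains` is available; Spark would need `array_contains` instead), the `name` and `nickname_array` columns from the original question, Splink's `_l` / `_r` column suffixes, and a dict-style comparison level with `sql_condition` and `label_for_charts` keys. The helper name `name_in_nickname_array_level` is made up for illustration.

```python
def name_in_nickname_array_level(name_col="name", array_col="nickname_array"):
    """Build a dict-style comparison level (hypothetical helper, not part of Splink).

    The level matches when the name on one side of the pair appears in the
    nickname array on the other side. The check runs in both directions
    because either record could be the one holding the nickname.
    """
    # Assumption: DuckDB syntax. list_contains(list, element) returns true
    # when the element is present in the list.
    sql = (
        f"list_contains({array_col}_r, {name_col}_l) "
        f"OR list_contains({array_col}_l, {name_col}_r)"
    )
    return {
        "sql_condition": sql,
        "label_for_charts": f"{name_col} in {array_col}",
    }


level = name_in_nickname_array_level()
print(level["sql_condition"])
```

The resulting dict would sit as one level inside a broader name comparison, typically after the null level and any exact-match level, in place of the length-1 array intersection.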
Hello,
I'm wondering if there's currently a method to determine whether the value of a string column is a member of an array of strings? I wasn't able to find this anywhere in the repo.
In my case, for a de-duplication task, I supplemented my table with an array column of potential nicknames (nickname_array) based on the name column of the respective row. However, when running the de-duplication, there's no way to compare the name column of one row with the nickname_array column of another row. I've temporarily resorted to creating a name_array column of length 1 and then checking for an intersection > 0 with size_array_intersect_sql. I'm including this as one "level" in a broader name feature.

If this makes sense and sounds reasonable (and isn't already an available feature), I'm happy to give it a try and open a pull request.
Thanks in advance,
Jackson