In the new version of fuzzyjoin, joining data.tables returns a result that is no longer a data.table.
I just updated to R 4.x, so a lot of packages were updated as well. (I did a complete reinstall of my OS, so I don't know which versions I had before.) With these new versions, joining data.tables with fuzzyjoin suddenly became a problem whenever later code relied on data.table syntax.
Reprex:
library(data.table)
library(fuzzyjoin)
a1 <- data.table(name=c('suzy', 'suxy', 'John', 'Janni', 'Tom'))
b1 <- data.table(name=c('suzzy', 'johnn', 'Jannice', 'Tom'))
c1 <- stringdist_inner_join(a1, b1, by = 'name', method = 'lv', max_dist = 1, ignore_case = TRUE, distance_col = 'fuzzy_dist')
is.data.table(c1)  # FALSE with fuzzyjoin 0.1.6
You can easily convert the result back with:
setDT(c1)
is.data.table(c1)  # TRUE
So it's easy to fix, but it broke some matching functions I had written that relied on data.table syntax after stringdist_inner_join() was applied.
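Until the class is preserved upstream, one possible workaround is to wrap the join and convert the result back. This is a minimal sketch, not part of fuzzyjoin: the helper name `dt_fuzzy_join` is hypothetical, and it assumes that copying the result with `as.data.table()` is acceptable (`setDT()` would instead convert in place).

```r
library(data.table)
library(fuzzyjoin)

# Hypothetical helper (not a fuzzyjoin function): call any fuzzyjoin join
# function and return a data.table whenever `x` was a data.table.
dt_fuzzy_join <- function(join_fn, x, y, ...) {
  res <- join_fn(x, y, ...)
  if (is.data.table(x) && !is.data.table(res)) {
    # Copy the joined result into a new data.table
    res <- as.data.table(res)
  }
  res
}

a1 <- data.table(name = c('suzy', 'suxy', 'John', 'Janni', 'Tom'))
b1 <- data.table(name = c('suzzy', 'johnn', 'Jannice', 'Tom'))

c1 <- dt_fuzzy_join(stringdist_inner_join, a1, b1, by = 'name',
                    method = 'lv', max_dist = 1, ignore_case = TRUE,
                    distance_col = 'fuzzy_dist')
is.data.table(c1)  # TRUE
```

Because the join function is passed in as an argument, the same wrapper works for the other `stringdist_*_join()` variants, and downstream code such as `c1[fuzzy_dist == 0]` can keep using data.table syntax.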
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3
locale:
[1] LC_CTYPE=en_DK.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_DK.UTF-8 LC_COLLATE=en_DK.UTF-8
[5] LC_MONETARY=en_DK.UTF-8 LC_MESSAGES=en_DK.UTF-8
[7] LC_PAPER=en_DK.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fuzzyjoin_0.1.6 data.table_1.13.2
loaded via a namespace (and not attached):
[1] stringdist_0.9.6.3 tidyr_1.1.2 crayon_1.3.4.9000 dplyr_1.0.2
[5] R6_2.4.1 lifecycle_0.2.0 magrittr_1.5 pillar_1.4.6
[9] stringi_1.5.3 rlang_0.4.8 vctrs_0.3.4 generics_0.0.2
[13] ellipsis_0.3.1 tools_4.0.3 stringr_1.4.0 glue_1.4.2
[17] purrr_0.3.4 parallel_4.0.3 compiler_4.0.3 pkgconfig_2.0.3
[21] tidyselect_1.1.0 tibble_3.0.4