Use == in closest join #6686

@maellecoursonnais

closest() only accepts inequality operators (>, >=, <, <=), but it would seem natural to allow == (or some other operator) for joining on the closest value, irrespective of whether it is higher or lower.

Consider this minimal example (data taken from this SO question). This issue was also mentioned here. An inequality join with closest() works, but an equality join with closest() fails with an error.

library(lubridate)
library(dplyr)
df1 <- data.frame(var2 = c("Dog", "Dog", "Cat"),
                  Date = dmy(c("01-01-2022","02-01-2022" , "07-12-2022")))
#   var2       Date
# 1  Dog 2022-01-01
# 2  Dog 2022-01-02
# 3  Cat 2022-12-07

df2 <- data.frame(Date = dmy(c("07-01-2022","04-12-2022" , "10-12-2022")))
#         Date
# 1 2022-01-07
# 2 2022-12-04
# 3 2022-12-10

df1 %>% 
  inner_join(df2, join_by(closest(Date <= Date)))
#    var2    Date.x     Date.y
# 1  Dog 2022-01-01 2022-01-07
# 2  Dog 2022-01-02 2022-01-07
# 3  Cat 2022-12-07 2022-12-10

df1 %>% 
  inner_join(df2, join_by(closest(Date == Date)))

# Error in `join_by()`:
#   ! The expression used in `closest()` can't use `==`.
# ℹ Expression 1 is `closest(Date == Date)`.
# Run `rlang::last_error()` to see where the error occurred.

Instead, it'd be nice to have a simple option for either direction:

df1 %>% 
  inner_join(df2, join_by(closest(Date == Date)), multiple = "all")
#    var2    Date.x     Date.y
# 1  Dog 2022-01-01 2022-01-07
# 2  Dog 2022-01-02 2022-01-07
# 3  Cat 2022-12-07 2022-12-04
# 4  Cat 2022-12-07 2022-12-10
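
Until something like this is supported, the same result can probably be obtained with a workaround. The following is only a sketch, not part of the dplyr join API: it assumes dplyr >= 1.1.0 (for cross_join()) and inputs small enough that pairing every row of df1 with every row of df2 is affordable. The row_id helper column is a name made up for this illustration.

```r
library(dplyr)

# Same data as above, built with base as.Date() so only dplyr is needed
df1 <- data.frame(var2 = c("Dog", "Dog", "Cat"),
                  Date = as.Date(c("2022-01-01", "2022-01-02", "2022-12-07")))
df2 <- data.frame(Date = as.Date(c("2022-01-07", "2022-12-04", "2022-12-10")))

nearest <- df1 %>%
  # row_id (hypothetical helper) identifies each row of df1
  mutate(row_id = row_number()) %>%
  # pair every row of df1 with every row of df2
  cross_join(df2, suffix = c(".x", ".y")) %>%
  group_by(row_id) %>%
  # keep the pair(s) with the smallest absolute gap in days;
  # with_ties = TRUE keeps both neighbours when they are equally close
  slice_min(abs(as.numeric(Date.x - Date.y)), n = 1, with_ties = TRUE) %>%
  ungroup() %>%
  select(-row_id)

nearest
```

On this data the sketch should keep both 2022-12-04 and 2022-12-10 for the Cat row, since each is three days away from 2022-12-07. The cross join grows as nrow(df1) * nrow(df2), which is one reason a built-in option in closest() would still be preferable.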
