
Referencing other columns inside across() in mutate()/summarise() is broken in dplyr 1.0.4 #5734

@juancamilog


In dplyr 1.0.3, a function passed to across() can reference other columns of the same data frame/tibble (within the current group) by name. This functionality is broken in 1.0.4.

The following example, which works in 1.0.3, reproduces the problem:

library(dplyr)

# Divide wind and pressure by the lat column of the same row
storms %>% mutate(across(c('wind', 'pressure'), function(x) {
  return(x / lat)
}))

In 1.0.4, running the above example fails with the following error:

> rlang::last_error()
<error/dplyr:::mutate_error>
Problem with mutate() input ..1.
x object 'lat' not found

That is, other columns can no longer be referenced by name. A possible workaround is to use cur_data()$name_of_column, but this is slower, as the following benchmark demonstrates:

library(dplyr, warn.conflicts = FALSE)

# 1000 rows: two id columns plus 100 columns of random numbers
df <- as_tibble(cbind.data.frame(
  grp_1 = rep(1:250, each = 4),
  grp_2 = rep(1:4, 250),
  matrix(rnorm(1000 * 100), nrow = 1000)))

# Run under dplyr 1.0.3: direct_reference fails in 1.0.4
bench::mark(iterations = 100,
            filter_gc = FALSE,
            # Workaround: look up other columns through cur_data()
            use_cur_data = df %>% summarise(across(where(is.numeric), function(x) {
              rows <- cur_data()
              mask <- (rows$grp_1 %% 2) == 0
              return(mean(x[mask] / rows$grp_2[mask]))
            })),
            # Reference grp_1 and grp_2 directly by name
            direct_reference = df %>% summarise(across(where(is.numeric), function(x) {
              mask <- (grp_1 %% 2) == 0
              return(mean(x[mask] / grp_2[mask]))
            }))) %>%
  select(expression, min, median, `itr/sec`, `gc/sec`)

This produces the following output under dplyr 1.0.3:

# A tibble: 2 x 5
  expression            min   median `itr/sec` `gc/sec`
  <bch:expr>       <bch:tm> <bch:tm>     <dbl>    <dbl>
1 use_cur_data       17.2ms  18.29ms      51.1    10.7 
2 direct_reference   8.12ms   8.59ms     108.      9.76

TL;DR: In dplyr 1.0.3, referencing columns via cur_data()$column_name is roughly twice as slow as using the column names directly (median 18.29 ms vs. 8.59 ms above). In 1.0.4, referencing columns by name, without cur_data(), is broken.
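One way to read the "object 'lat' not found" error is in terms of base R lexical scoping: a function finds `lat` only if it exists in the function's enclosing environment. The sketch below shows that behaviour without dplyr. It is an illustration under that assumption, not dplyr's actual implementation, and the data frame `toy` is hypothetical.

```r
# Hypothetical toy data; not the storms dataset
toy <- data.frame(wind = c(10, 20), lat = c(2, 4))

# Created in the global environment: `lat` is looked up there, not in `toy`,
# so (in a fresh session) the call fails with the same kind of error
f_outside <- function(x) x / lat
res_outside <- tryCatch(f_outside(toy$wind), error = conditionMessage)
print(res_outside)

# Created inside with(): the enclosing environment holds the columns of `toy`,
# so `lat` resolves to toy$lat
f_inside <- with(toy, function(x) x / lat)
res_inside <- f_inside(toy$wind)
print(res_inside)
```

If this reading is right, whether a function passed to across() can see other columns depends on the environment in which that function is created and called.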
