@DavisVaughan commented Oct 1, 2025

Five goals:

  • Significantly speed up coalesce() and reduce its memory usage by building it on vctrs::vec_case_when()
  • Fix the error generated by coalesce(1, NULL, "x"), where the reported index was off because NULL inputs were compacted out too early
  • Give a better error when all inputs are NULL
  • Stop early-naming inputs via names_as_error_names(), which is no longer required with the C implementation of vec_case_when() (the naming falls out naturally from the rewrite in C)
  • Make the output names generated by coalesce() more consistent
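
To make the first two goals concrete, here is a minimal base-R sketch of the "first non-missing value wins" logic, with NULL inputs dropped only after their original positions are recorded. It is not the dplyr or vctrs implementation: `coalesce_sketch()` and both error messages are hypothetical, and `is.na()` stands in for `vec_detect_missing()`.

```r
# Illustrative only: a toy stand-in for coalesce(), not dplyr's code.
coalesce_sketch <- function(...) {
  args <- list(...)

  # Record original positions before dropping NULL inputs, so a message
  # about the third input still says 3 after the second (NULL) is removed.
  positions <- which(!vapply(args, is.null, logical(1)))
  args <- args[positions]
  if (length(args) == 0L) {
    stop("all inputs are NULL (hypothetical message)")
  }

  # Toy type check: every input must share the first input's type.
  for (i in seq_along(args)) {
    if (typeof(args[[i]]) != typeof(args[[1L]])) {
      stop(sprintf(
        "input %d has a different type than input %d (hypothetical message)",
        positions[[i]], positions[[1L]]
      ))
    }
  }

  n <- max(lengths(args))
  out <- rep(args[[1L]][NA_integer_], n)  # typed missing values
  filled <- rep(FALSE, n)

  for (x in args) {
    x <- rep_len(x, n)           # recycle size-1 inputs
    take <- !is.na(x) & !filled  # non-missing here and not yet filled
    out[take] <- x[take]
    filled <- filled | take
  }
  out
}

coalesce_sketch(c(1, NA, 2), 0)
tryCatch(coalesce_sketch(1, NULL, "x"), error = function(e) conditionMessage(e))
```

The second call fails on the character input and reports it as input 3, because positions are captured before the NULL is removed. Compacting first would have reported it as input 2.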

# Most common case
cross::bench_branches(\() {
  library(dplyr)
  set.seed(123)
  x <- sample(c(1, NA, 2), size = 1e7, replace = TRUE)
  bench::mark(coalesce(x, 0))
})
#> # A tibble: 2 × 7
#>   branch                    expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <chr>                     <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 feature/coalesce-on-vctrs coalesce(x, 0)   24.5ms   29.9ms     27.4      162MB     37.1
#> 2 main                      coalesce(x, 0)    227ms    235ms      4.26     611MB     12.8

# Missings in `y`
cross::bench_branches(\() {
  library(dplyr)
  set.seed(123)
  x <- sample(c(1, NA, 2), size = 1e7, replace = TRUE)
  y <- sample(c(1, NA, 2), size = 1e7, replace = TRUE)
  bench::mark(coalesce(x, y))
})
#> # A tibble: 2 × 7
#>   branch                    expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <chr>                     <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 feature/coalesce-on-vctrs coalesce(x, y)   30.4ms   34.4ms     25.6      229MB     41.4
#> 2 main                      coalesce(x, y)  209.1ms    211ms      4.63     572MB     18.5

# No missings in `y`
cross::bench_branches(\() {
  library(dplyr)
  set.seed(123)
  x <- sample(c(1, NA, 2), size = 1e7, replace = TRUE)
  y <- sample(c(1, 2), size = 1e7, replace = TRUE)
  bench::mark(coalesce(x, y))
})
#> # A tibble: 2 × 7
#>   branch                    expression          min   median `itr/sec` mem_alloc `gc/sec`
#>   <chr>                     <bch:expr>     <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 feature/coalesce-on-vctrs coalesce(x, y)   25.5ms   26.1ms     35.4      162MB    165. 
#> 2 main                      coalesce(x, y)  223.7ms  226.3ms      4.30     534MB     17.2
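
As a rough summary of the three tables, the snippet below computes the ratios between `main` and the branch from the reported medians and `mem_alloc` values. The numbers are copied by hand from the output above; the data frame and its column names are only for illustration.

```r
# Medians (ms) and mem_alloc (MB) copied from the three benchmark tables.
results <- data.frame(
  case = c(
    "coalesce(x, 0)",
    "coalesce(x, y), missings in y",
    "coalesce(x, y), no missings in y"
  ),
  median_branch_ms = c(29.9, 34.4, 26.1),
  median_main_ms   = c(235, 211, 226.3),
  mem_branch_mb    = c(162, 229, 162),
  mem_main_mb      = c(611, 572, 534)
)

# How many times faster / lighter the branch is compared to main.
results$speedup   <- results$median_main_ms / results$median_branch_ms
results$mem_ratio <- results$mem_main_mb / results$mem_branch_mb

print(results[c("case", "speedup", "mem_ratio")], digits = 3)
```

On these figures the branch is roughly 6x to 9x faster at the median and allocates about 2.5x to 3.8x less memory, depending on the case.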

I'm happy that I can explain where all of these remaining allocations come from:

# - 1 alloc for `vec_detect_missing` (40000048, lgl vec)
# - 1 alloc for `!` of the `vec_detect_missing` result (40000048, lgl vec)
# - 1 alloc for <double> output (80000048, dbl vec)
# - 1 alloc for determining where `0` goes (10000048, bool vec at C level)
profmem::profmem(coalesce(x, 0))
#> Rprofmem memory profiling of:
#> coalesce(x, 0)
#> 
#> Memory allocations:
#> Number of 'new page' entries not displayed: 3
#>        what     bytes                                                            calls
#> 3     alloc       216                                 coalesce() -> vec_ptype_common()
#> 4     alloc       216                                  coalesce() -> vec_size_common()
#> 6     alloc  40000048 coalesce() -> map() -> lapply() -> FUN() -> vec_detect_missing()
#> 7     alloc  40000048                         coalesce() -> map() -> lapply() -> FUN()
#> 8     alloc  80000048                                        coalesce() -> <Anonymous>
#> 9     alloc  10000048                                        coalesce() -> <Anonymous>
#> total       170000624
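
The reported sizes can be checked with simple arithmetic. This sketch assumes a 48-byte per-vector overhead, which is inferred from the numbers above rather than stated in the output, along with 4 bytes per R logical, 8 bytes per double, and 1 byte per C-level bool.

```r
n <- 1e7
header <- 48  # assumed per-vector overhead in bytes, inferred from the output

lgl_bytes  <- 4 * n + header  # vec_detect_missing() result, and again for `!`
dbl_bytes  <- 8 * n + header  # the <double> output
bool_bytes <- 1 * n + header  # C-level bool vector locating where `0` goes
small      <- 2 * 216         # vec_ptype_common() and vec_size_common()

total <- small + 2 * lgl_bytes + dbl_bytes + bool_bytes
total           # matches the reported total of 170000624 bytes
total / 1024^2  # the same total expressed in units of 1024^2 bytes
```

Dividing by 1024^2 gives a little over 162, which is consistent with the 162MB `mem_alloc` in the first benchmark if that figure is reported in binary megabytes.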

@DavisVaughan force-pushed the feature/coalesce-on-vctrs branch from 7f3f5ca to 8043214 on October 2, 2025 15:23
@DavisVaughan merged commit b3f2115 into main on Oct 2, 2025 (14 checks passed)
@DavisVaughan deleted the feature/coalesce-on-vctrs branch on October 2, 2025 15:34