`count()` and `add_count()` could be much faster #6806

Right now these eventually call `summarise(n = n())` or `mutate(n = n())`, which can be very slow with many groups. We already have `vctrs::vec_count()`, which should be much faster than `count()` in that case. For `add_count()`, we could add a vctrs primitive that works like a windowed count, or build on top of `vec_count()`'s result plus an additional call to `vec_match()`.
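A rough sketch of the second option, to make the idea concrete. `add_count_sketch()` and its `cols` and `name` arguments are hypothetical names used only for illustration; they are not part of dplyr or vctrs. The sketch also assumes the grouping columns already exist in the data, so it sidesteps data-masking entirely.

```r
library(vctrs)

# Hypothetical helper (not a dplyr or vctrs function): count each unique key
# once, then map every row back to its key's count.
add_count_sketch <- function(data, cols, name = "n") {
  keys <- data[cols]

  # One row per unique key, with `key` and `count` columns.
  # `sort = "none"` skips sorting, since only the lookup matters here.
  counts <- vec_count(keys, sort = "none")

  # For each row, find the position of its key among the unique keys.
  loc <- vec_match(keys, counts$key)

  data[[name]] <- counts$count[loc]
  data
}

df <- data.frame(g = c("a", "b", "a", "c", "a"), x = 1:5)
out <- add_count_sketch(df, "g")
```

This does the hashing twice (once in `vec_count()`, once in `vec_match()`), which is the overhead a dedicated windowed-count primitive would presumably avoid.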

We'd have to think through how weighted counts would work; maybe `vec_count()` needs a weight argument (a double vector).
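Until something like that exists, one way to see what a weighted count would need to compute is to group with `vec_group_loc()` and sum the weights per group. This is only a sketch of the semantics: `weighted_count_sketch()` is a hypothetical name, the per-group `sum()` is an R-level loop and so is not expected to match the speed of a native primitive, and the output is in order of first appearance rather than sorted by key the way `count()` output is.

```r
library(vctrs)

# Hypothetical helper (not a dplyr or vctrs function): weighted count as a
# per-group sum of `wt`, using vctrs for the grouping step only.
weighted_count_sketch <- function(data, cols, wt) {
  # One row per unique key; `loc` is a list of row positions for each key.
  groups <- vec_group_loc(data[cols])

  # Sum the weights at each group's positions.
  n <- vapply(groups$loc, function(i) sum(wt[i]), numeric(1))

  out <- groups$key
  out$n <- n
  out
}

df <- data.frame(
  g = c("a", "b", "a", "c", "a"),
  w = c(1, 2, 0.5, 4, 1.5)
)
res <- weighted_count_sketch(df, "g", df$w)
```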

The motivation is something like this, and `flights` isn't even that big. There are roughly 55k groups here.

``` r
library(dplyr)
library(nycflights13)

bench::mark(
  count(flights, dep_time, dep_delay),
  vctrs::vec_count(flights[c("dep_time", "dep_delay")]),
  check = FALSE
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression                                                 min  median itr/s…¹
#>   <bch:expr>                                            <bch:tm> <bch:t>   <dbl>
#> 1 count(flights, dep_time, dep_delay)                    419.6ms 441.4ms    2.27
#> 2 vctrs::vec_count(flights[c("dep_time", "dep_delay")])   17.3ms  21.5ms   42.7 
#> # … with 2 more variables: mem_alloc <bch:byt>, `gc/sec` <dbl>, and abbreviated
#> #   variable name ¹​`itr/sec`
```

We'd also need to handle the fact that `...` and `wt` are data-masking, probably with `add_computed_columns()`, as `distinct()` does.
