Update .groups message after group_by() |> summarize() #6986

@mine-cetinkaya-rundel

Currently this is the message dplyr emits for summarize() after group_by() with multiple variables.

library(dplyr)

mtcars |>
  group_by(vs, am) |>
  summarize(mean_mpg = mean(mpg))
#> `summarise()` has grouped output by 'vs'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   vs [2]
#>      vs    am mean_mpg
#>   <dbl> <dbl>    <dbl>
#> 1     0     0     15.0
#> 2     0     1     19.8
#> 3     1     0     20.7
#> 4     1     1     28.4

Created on 2024-01-29 with reprex v2.0.2
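For context, here is a minimal sketch of what the `.groups` argument mentioned in the message does to the grouping of the output. The values used ("drop_last", "drop", "keep") are documented options of summarize(); the object names are only for illustration.

```r
library(dplyr)

by_vs_am <- mtcars |> group_by(vs, am)

# "drop_last" matches what happens by default here: the last grouping
# variable (am) is dropped and the output stays grouped by vs.
res_drop_last <- by_vs_am |>
  summarize(mean_mpg = mean(mpg), .groups = "drop_last")
group_vars(res_drop_last)
#> [1] "vs"

# "drop" returns an ungrouped tibble.
res_drop <- by_vs_am |>
  summarize(mean_mpg = mean(mpg), .groups = "drop")
group_vars(res_drop)
#> character(0)

# "keep" preserves the grouping of the input.
res_keep <- by_vs_am |>
  summarize(mean_mpg = mean(mpg), .groups = "keep")
group_vars(res_keep)
#> [1] "vs" "am"
```

Supplying `.groups` explicitly also silences the message.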

I think this message is still confusing. It would be clearer if it described the grouping of the output and explicitly stated that .groups is an argument to summarize(), e.g.,

The output is grouped by `vs`. You can specify grouping structure of the output using the `.groups` argument in `summarize()`.

If we go this route, some things to keep in mind:

  • Maybe use "result" instead of "output" in both places in the message, or change the description of the .groups argument to say "Grouping structure of the output." Basically, the message and the documentation should use the same word for the "thing" the function returns.
  • It would be nice to have the US/UK spelling of the function in the message match the spelling used in the code that triggered the message.
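A rough sketch of how the spelling could be matched, in base R only. This is not dplyr's implementation: `called_as()`, `summarise_demo()`, and `summarize_demo()` are hypothetical names, and the approach assumes the function is called directly by name (it would not recover the spelling through do.call() or similar indirection).

```r
# Hypothetical helper: extract the function name from a captured call.
# tail() handles namespaced calls such as pkg::summarize_demo(), where
# the first element of the call is itself a call to `::`.
called_as <- function(call) {
  tail(as.character(call[[1]]), 1)
}

# Hypothetical stand-in for a function with a UK/US alias pair.
summarise_demo <- function() {
  call <- sys.call()  # the call as the user wrote it
  sprintf("`%s()` has grouped output by 'vs'.", called_as(call))
}
summarize_demo <- summarise_demo

# Each spelling reports itself in the message.
summarise_demo()
summarize_demo()
```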

An alternative suggestion by @DavisVaughan was

summarize() has computed your expressions grouped by (foo, bar), and has regrouped the output by (foo).

I think this is an improvement over the current message too, but I'd suggest going with something simpler, like the first suggestion above.
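To compare the two wordings side by side, here is a small sketch that builds both from the grouping variables of the input and output. It only illustrates the text; the variable names are made up, and dplyr would need to assemble the message internally in its own way.

```r
library(dplyr)

grouped <- mtcars |> group_by(vs, am)
result <- grouped |>
  summarize(mean_mpg = mean(mpg), .groups = "drop_last")

vars_in <- group_vars(grouped)   # grouping used to compute the summaries
vars_out <- group_vars(result)   # grouping left on the output

# Simpler wording: only describes the output.
simple_msg <- sprintf(
  "The output is grouped by %s. You can specify grouping structure of the output using the `.groups` argument in `summarize()`.",
  paste0("`", vars_out, "`", collapse = ", ")
)

# Alternative wording: describes both the input and output grouping.
detailed_msg <- sprintf(
  "summarize() has computed your expressions grouped by (%s), and has regrouped the output by (%s).",
  paste(vars_in, collapse = ", "),
  paste(vars_out, collapse = ", ")
)

cat(simple_msg, detailed_msg, sep = "\n")
```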
