Skip to content

Commit ea305c0

Browse files
authored
Soft-deprecate case_when() with all size 1 LHS and any size >1 RHS (#7724)
* Only consult LHS inputs to determine `case_when()` output size * Push depressing revdep results * Soft deprecate this `case_when()` behavior instead
1 parent b3f2115 commit ea305c0

File tree

9 files changed

+7626
-6563
lines changed

9 files changed

+7626
-6563
lines changed

NEWS.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,34 @@
11
# dplyr (development version)
22

3+
* In `case_when()`, supplying all size 1 LHS inputs along with a size >1 RHS input is now soft-deprecated. This is an improper usage of `case_when()` that should instead be a series of if statements, like:
4+
5+
```
6+
# Scalars!
7+
code <- 1L
8+
flavor <- "vanilla"
9+
10+
# Previously
11+
case_when(
12+
code == 1L && flavor == "chocolate" ~ x,
13+
code == 1L && flavor == "vanilla" ~ y,
14+
code == 2L && flavor == "vanilla" ~ z,
15+
.default = default
16+
)
17+
18+
# Now
19+
if (code == 1L && flavor == "chocolate") {
20+
x
21+
} else if (code == 1L && flavor == "vanilla") {
22+
y
23+
} else if (code == 2L && flavor == "vanilla") {
24+
z
25+
} else {
26+
default
27+
}
28+
```
29+
30+
The recycling behavior that allows this style of `case_when()` to work is unsafe, and can result in silent bugs that we'd like to guard against with an error in the future (#7082).
31+
332
* The following vector functions have gotten significantly faster and use much less memory due to a rewrite in C via vctrs (#7723):
433

534
* `if_else()`

R/case-when.R

Lines changed: 160 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -16,18 +16,18 @@
1616
#'
1717
#' The RHS inputs will be coerced to their common type.
1818
#'
19-
#' All inputs will be recycled to their common size. That said, we encourage
20-
#' all LHS inputs to be the same size. Recycling is mainly useful for RHS
21-
#' inputs, where you might supply a size 1 input that will be recycled to the
22-
#' size of the LHS inputs.
19+
#' For historical reasons, all LHS inputs will be recycled to their common
20+
#' size. That said, we encourage all LHS inputs to be the same size, which you
21+
#' can optionally enforce with `.size`. All RHS inputs will be recycled to the
22+
#' common size of the LHS inputs.
2323
#'
2424
#' `NULL` inputs are ignored.
2525
#'
2626
#' @param .default The value used when all of the LHS inputs return either
2727
#' `FALSE` or `NA`.
2828
#'
2929
#' `.default` must be size 1 or the same size as the common size computed
30-
#' from `...`.
30+
#' from the LHS inputs.
3131
#'
3232
#' `.default` participates in the computation of the common type with the RHS
3333
#' inputs.
@@ -45,11 +45,12 @@
4545
#' supplied, this overrides the common type of the RHS inputs.
4646
#'
4747
#' @param .size An optional size declaring the desired output size. If supplied,
48-
#' this overrides the common size computed from `...`.
48+
#' this overrides the common size computed from the LHS inputs.
4949
#'
50-
#' @return A vector with the same size as the common size computed from the
51-
#' inputs in `...` and the same type as the common type of the RHS inputs
52-
#' in `...`.
50+
#' @return A vector
51+
#'
52+
#' - The size of the vector is the common size of the LHS inputs, or `.size`.
53+
#' - The type of the vector is the common type of the RHS inputs, or `.ptype`.
5354
#'
5455
#' @seealso [case_match()]
5556
#'
@@ -162,12 +163,16 @@ case_when <- function(..., .default = NULL, .ptype = NULL, .size = NULL) {
162163
conditions <- args$lhs
163164
values <- args$rhs
164165

165-
# `case_when()`'s formula interface finds the common size of ALL of its inputs.
166-
# This is what allows `TRUE ~` to work.
167-
.size <- vec_size_common(!!!conditions, !!!values, .size = .size)
166+
.size <- case_when_size_common(
167+
conditions = conditions,
168+
values = values,
169+
size = .size
170+
)
168171

172+
# Only recycle `conditions`. Expect that `vec_case_when()` requires all
173+
# `conditions` to be the same size, but can efficiently recycle `values`
174+
# at the C level without extra allocations.
169175
conditions <- vec_recycle_common(!!!conditions, .size = .size)
170-
values <- vec_recycle_common(!!!values, .size = .size)
171176

172177
vec_case_when(
173178
conditions = conditions,
@@ -182,6 +187,148 @@ case_when <- function(..., .default = NULL, .ptype = NULL, .size = NULL) {
182187
)
183188
}
184189

190+
# Size common computation for `case_when()`
191+
#
192+
# `case_when()`'s formula interface historically finds the common size of ALL
193+
# inputs. This is not good, ideally it would force all LHS inputs to have the
194+
# same size (with no recycling), and then recycle all RHS inputs to that size
195+
# inferred from the LHS. That is how `vec_case_when()` works.
196+
#
197+
# We can't change this easily for two reasons:
198+
#
199+
# - `TRUE ~` must continue to work for legacy reasons, so at the very least all
200+
# LHS inputs must be recycled against each other. We are okay with this.
201+
#
202+
# - Many packages (60+) use `case_when()` with scalar LHSs but vector RHSs,
203+
# requiring that all inputs by recycled against each other. This usage should
204+
# be replaced with a series of if statements. This is a highly inefficient use
205+
# of `case_when()` because each scalar LHS has to be recycled to the size
206+
# determined from the RHS, which is a big waste of memory and time. This
207+
# behavior can also allow real bugs to slip through silently (#7082), which is
208+
# bad. To combat this case, we specially detect this and throw a deprecation
209+
# warning.
210+
#
211+
# There are four cases to consider:
212+
#
213+
# 1. `size_conditions == 1, size_values == 1`
214+
#
215+
# Fine, use size 1
216+
#
217+
# 2. `size_conditions == 1, size_values != 1`
218+
#
219+
# Use `size_values` for historical reasons, but warn against this. This is
220+
# people doing off-label usage of `case_when()` when they should be using a
221+
# series of if statements.
222+
#
223+
# 3. `size_conditions != 1, size_values == 1`
224+
#
225+
# Fine, use `size_conditions`
226+
#
227+
# 4. `size_conditions != 1, size_values != 1`
228+
#
229+
# If `size_conditions == size_values`, good to go, else throw an error by
230+
# recalling `vec_size_common()` with all inputs.
231+
case_when_size_common <- function(
232+
conditions,
233+
values,
234+
size,
235+
...,
236+
user_env = caller_env(2),
237+
error_call = caller_env()
238+
) {
239+
# These error if there are any size incompatibilites within LHS and RHS inputs,
240+
# but not across LHS and RHS inputs
241+
size_conditions <- vec_size_common(
242+
!!!conditions,
243+
.size = size,
244+
.call = error_call
245+
)
246+
size_values <- vec_size_common(
247+
!!!values,
248+
.size = size,
249+
.call = error_call
250+
)
251+
252+
if (size_conditions == 1L && size_values == 1L) {
253+
return(1L)
254+
}
255+
256+
if (size_conditions == 1L && size_values != 1L) {
257+
warn_case_when_scalar_lhs_vector_rhs(
258+
env = error_call,
259+
user_env = user_env
260+
)
261+
return(size_values)
262+
}
263+
264+
if (size_conditions != 1L && size_values == 1L) {
265+
return(size_conditions)
266+
}
267+
268+
if (size_conditions != 1L && size_values != 1L) {
269+
if (size_conditions == size_values) {
270+
return(size_conditions)
271+
}
272+
273+
# Errors
274+
vec_size_common(
275+
!!!conditions,
276+
!!!values,
277+
.size = size,
278+
.call = error_call
279+
)
280+
281+
abort("`vec_size_common()` should have errored.", .internal = TRUE)
282+
}
283+
284+
abort("All cases should have been covered.", .internal = TRUE)
285+
}
286+
287+
warn_case_when_scalar_lhs_vector_rhs <- function(
288+
env,
289+
user_env
290+
) {
291+
what <- I(
292+
"Calling `case_when()` with size 1 LHS inputs and size >1 RHS inputs"
293+
)
294+
295+
details <- no_cli_wrapping(paste(
296+
sep = "\n",
297+
"This `case_when()` statement can result in subtle silent bugs and is very inefficient.",
298+
"",
299+
" Please use a series of if statements instead:",
300+
"",
301+
" ```",
302+
" # Previously",
303+
" case_when(scalar_lhs1 ~ rhs1, scalar_lhs2 ~ rhs2, .default = default)",
304+
"",
305+
" # Now",
306+
" if (scalar_lhs1) {",
307+
" rhs1",
308+
" } else if (scalar_lhs2) {",
309+
" rhs2",
310+
" } else {",
311+
" default",
312+
" }",
313+
" ```"
314+
))
315+
316+
lifecycle::deprecate_soft(
317+
when = "1.2.0",
318+
what = what,
319+
details = details,
320+
env = env,
321+
user_env = user_env
322+
)
323+
}
324+
325+
# Suppress cli wrapping https://cli.r-lib.org/reference/inline-markup.html#wrapping
326+
no_cli_wrapping <- function(x) {
327+
x <- gsub(" ", "\u00a0", x, fixed = TRUE)
328+
x <- gsub("\n", "\f", x, fixed = TRUE)
329+
x
330+
}
331+
185332
case_formula_evaluate <- function(args, default_env, dots_env, error_call) {
186333
# `case_when()`'s formula interface compacts `NULL`s
187334
args <- compact_null(args)

man/case_when.Rd

Lines changed: 11 additions & 9 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

0 commit comments

Comments
 (0)