Skip to content

Commit cd8ad64

Browse files
committed
Soft deprecate this case_when() behavior instead
1 parent ec70865 commit cd8ad64

File tree

4 files changed

+327
-105
lines changed

4 files changed

+327
-105
lines changed

NEWS.md

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,33 @@
11
# dplyr (development version)
22

3-
* `case_when()` now determines the output size from only the LHS inputs, rather than from both the LHS and RHS inputs. This allows the legacy behavior of `TRUE ~` to continue to work, while improving `case_when()`'s safety by guarding against some silent confusing failure cases (#7082).
3+
* In `case_when()`, supplying all size 1 LHS inputs along with a size >1 RHS input is now soft-deprecated. This is an improper usage of `case_when()` that should instead be a series of if statements, like:
4+
5+
```
6+
# Scalars!
7+
code <- 1L
8+
flavor <- "vanilla"
9+
10+
# Previously
11+
case_when(
12+
code == 1L && flavor == "chocolate" ~ x,
13+
code == 1L && flavor == "vanilla" ~ y,
14+
code == 2L && flavor == "vanilla" ~ z,
15+
.default = default
16+
)
17+
18+
# Now
19+
if (code == 1L && flavor == "chocolate") {
20+
x
21+
} else if (code == 1L && flavor == "vanilla") {
22+
y
23+
} else if (code == 2L && flavor == "vanilla") {
24+
z
25+
} else {
26+
default
27+
}
28+
```
29+
30+
The recycling behavior that allows this style of `case_when()` to work is unsafe, and can result in silent bugs that we'd like to guard against with an error in the future (#7082).
431

532
* `if_else()` no longer allows `condition` to be a logical array. It must be a logical vector with no `dim` attribute (#7723).
633

R/case-when.R

Lines changed: 151 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -163,17 +163,15 @@ case_when <- function(..., .default = NULL, .ptype = NULL, .size = NULL) {
163163
conditions <- args$lhs
164164
values <- args$rhs
165165

166-
# Unfortunate historical behavior patch:
167-
#
168-
# `case_when()`'s formula interface finds the common size of all LHS inputs.
169-
# This is what allows `TRUE ~` to work. Unfortunately, this has caused a lot
170-
# of confusion over the years when `TRUE ~` isn't involved (#7082) and we wish
171-
# that we could enforce that all LHS inputs must be the same size (without
172-
# recycling).
173-
#
174-
# Additionally, for `case_when()`, the LHS common size is the size that each
175-
# RHS value must recycle to.
176-
.size <- vec_size_common(!!!conditions, .size = .size)
166+
.size <- case_when_size_common(
167+
conditions = conditions,
168+
values = values,
169+
size = .size
170+
)
171+
172+
# Only recycle `conditions`. Expect that `vec_case_when()` requires all
173+
# `conditions` to be the same size, but can efficiently recycle `values`
174+
# at the C level without extra allocations.
177175
conditions <- vec_recycle_common(!!!conditions, .size = .size)
178176

179177
vec_case_when(
@@ -189,6 +187,148 @@ case_when <- function(..., .default = NULL, .ptype = NULL, .size = NULL) {
189187
)
190188
}
191189

190+
# Size common computation for `case_when()`
191+
#
192+
# `case_when()`'s formula interface historically finds the common size of ALL
193+
# inputs. This is not good, ideally it would force all LHS inputs to have the
194+
# same size (with no recycling), and then recycle all RHS inputs to that size
195+
# inferred from the LHS. That is how `vec_case_when()` works.
196+
#
197+
# We can't change this easily for two reasons:
198+
#
199+
# - `TRUE ~` must continue to work for legacy reasons, so at the very least all
200+
# LHS inputs must be recycled against each other. We are okay with this.
201+
#
202+
# - Many packages (60+) use `case_when()` with scalar LHSs but vector RHSs,
203+
# requiring that all inputs by recycled against each other. This usage should
204+
# be replaced with a series of if statements. This is a highly inefficient use
205+
# of `case_when()` because each scalar LHS has to be recycled to the size
206+
# determined from the RHS, which is a big waste of memory and time. This
207+
# behavior can also allow real bugs to slip through silently (#7082), which is
208+
# bad. To combat this case, we specially detect this and throw a deprecation
209+
# warning.
210+
#
211+
# There are four cases to consider:
212+
#
213+
# 1. `size_conditions == 1, size_values == 1`
214+
#
215+
# Fine, use size 1
216+
#
217+
# 2. `size_conditions == 1, size_values != 1`
218+
#
219+
# Use `size_values` for historical reasons, but warn against this. This is
220+
# people doing off-label usage of `case_when()` when they should be using a
221+
# series of if statements.
222+
#
223+
# 3. `size_conditions != 1, size_values == 1`
224+
#
225+
# Fine, use `size_conditions`
226+
#
227+
# 4. `size_conditions != 1, size_values != 1`
228+
#
229+
# If `size_conditions == size_values`, good to go, else throw an error by
230+
# recalling `vec_size_common()` with all inputs.
231+
case_when_size_common <- function(
232+
conditions,
233+
values,
234+
size,
235+
...,
236+
user_env = caller_env(2),
237+
error_call = caller_env()
238+
) {
239+
# These error if there are any size incompatibilites within LHS and RHS inputs,
240+
# but not across LHS and RHS inputs
241+
size_conditions <- vec_size_common(
242+
!!!conditions,
243+
.size = size,
244+
.call = error_call
245+
)
246+
size_values <- vec_size_common(
247+
!!!values,
248+
.size = size,
249+
.call = error_call
250+
)
251+
252+
if (size_conditions == 1L && size_values == 1L) {
253+
return(1L)
254+
}
255+
256+
if (size_conditions == 1L && size_values != 1L) {
257+
warn_case_when_scalar_lhs_vector_rhs(
258+
env = error_call,
259+
user_env = user_env
260+
)
261+
return(size_values)
262+
}
263+
264+
if (size_conditions != 1L && size_values == 1L) {
265+
return(size_conditions)
266+
}
267+
268+
if (size_conditions != 1L && size_values != 1L) {
269+
if (size_conditions == size_values) {
270+
return(size_conditions)
271+
}
272+
273+
# Errors
274+
vec_size_common(
275+
!!!conditions,
276+
!!!values,
277+
.size = size,
278+
.call = error_call
279+
)
280+
281+
abort("`vec_size_common()` should have errored.", .internal = TRUE)
282+
}
283+
284+
abort("All cases should have been covered.", .internal = TRUE)
285+
}
286+
287+
warn_case_when_scalar_lhs_vector_rhs <- function(
288+
env,
289+
user_env
290+
) {
291+
what <- I(
292+
"Calling `case_when()` with size 1 LHS inputs and size >1 RHS inputs"
293+
)
294+
295+
details <- no_cli_wrapping(paste(
296+
sep = "\n",
297+
"This `case_when()` statement can result in subtle silent bugs and is very inefficient.",
298+
"",
299+
" Please use a series of if statements instead:",
300+
"",
301+
" ```",
302+
" # Previously",
303+
" case_when(scalar_lhs1 ~ rhs1, scalar_lhs2 ~ rhs2, .default = default)",
304+
"",
305+
" # Now",
306+
" if (scalar_lhs1) {",
307+
" rhs1",
308+
" } else if (scalar_lhs2) {",
309+
" rhs2",
310+
" } else {",
311+
" default",
312+
" }",
313+
" ```"
314+
))
315+
316+
lifecycle::deprecate_soft(
317+
when = "1.2.0",
318+
what = what,
319+
details = details,
320+
env = env,
321+
user_env = user_env
322+
)
323+
}
324+
325+
# Suppress cli wrapping https://cli.r-lib.org/reference/inline-markup.html#wrapping
326+
no_cli_wrapping <- function(x) {
327+
x <- gsub(" ", "\u00a0", x, fixed = TRUE)
328+
x <- gsub("\n", "\f", x, fixed = TRUE)
329+
x
330+
}
331+
192332
case_formula_evaluate <- function(args, default_env, dots_env, error_call) {
193333
# `case_when()`'s formula interface compacts `NULL`s
194334
args <- compact_null(args)

tests/testthat/_snaps/case-when.md

Lines changed: 81 additions & 47 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,25 @@
3535
Code
3636
case_when(1 ~ NULL)
3737
Condition
38+
Warning:
39+
Calling `case_when()` with size 1 LHS inputs and size >1 RHS inputs was deprecated in dplyr 1.2.0.
40+
i This `case_when()` statement can result in subtle silent bugs and is very inefficient.
41+
42+
Please use a series of if statements instead:
43+
44+
```
45+
# Previously
46+
case_when(scalar_lhs1 ~ rhs1, scalar_lhs2 ~ rhs2, .default = default)
47+
48+
# Now
49+
if (scalar_lhs1) {
50+
rhs1
51+
} else if (scalar_lhs2) {
52+
rhs2
53+
} else {
54+
default
55+
}
56+
```
3857
Error in `case_when()`:
3958
! `..1 (right)` must be a vector, not `NULL`.
4059

@@ -66,57 +85,14 @@
6685
Caused by error:
6786
! oh no!
6887

69-
# only LHS sizes are consulted to compute the output size (#7082)
70-
71-
Code
72-
x <- 1
73-
case_when(x == 1 ~ "a", x == 2 ~ character(), .default = "other")
74-
Condition
75-
Error in `case_when()`:
76-
! `..2 (right)` must have size 1, not size 0.
77-
78-
---
79-
80-
Code
81-
case_when(TRUE ~ 1:3, FALSE ~ 4:6)
82-
Condition
83-
Error in `case_when()`:
84-
! `..1 (right)` must have size 1, not size 3.
85-
86-
---
87-
88-
Code
89-
case_when(NA ~ 1:3, TRUE ~ 4:6)
90-
Condition
91-
Error in `case_when()`:
92-
! `..1 (right)` must have size 1, not size 3.
93-
94-
---
95-
96-
Code
97-
x <- 1:3
98-
my_scalar_condition <- is.double(x)
99-
case_when(my_scalar_condition ~ x, TRUE ~ 4:6)
100-
Condition
101-
Error in `case_when()`:
102-
! `..1 (right)` must have size 1, not size 3.
103-
104-
---
105-
106-
Code
107-
case_when(TRUE ~ integer(), FALSE ~ integer())
108-
Condition
109-
Error in `case_when()`:
110-
! `..1 (right)` must have size 1, not size 0.
111-
11288
# case_when() give meaningful errors
11389

11490
Code
11591
(expect_error(case_when(c(TRUE, FALSE) ~ 1:3, c(FALSE, TRUE) ~ 1:2)))
11692
Output
117-
<error/vctrs_error_assert_size>
93+
<error/vctrs_error_incompatible_size>
11894
Error in `case_when()`:
119-
! `..1 (right)` must have size 2, not size 3.
95+
! Can't recycle `..1 (right)` (size 3) to match `..2 (right)` (size 2).
12096
Code
12197
(expect_error(case_when(c(TRUE, FALSE) ~ 1, c(FALSE, TRUE, FALSE) ~ 2, c(FALSE,
12298
TRUE, FALSE, NA) ~ 3)))
@@ -125,11 +101,11 @@
125101
Error in `case_when()`:
126102
! Can't recycle `..1 (left)` (size 2) to match `..2 (left)` (size 3).
127103
Code
128-
(expect_error(case_when(50 ~ 1:3)))
104+
(expect_error(case_when(51:53 ~ 1:3)))
129105
Output
130106
<error/rlang_error>
131107
Error in `case_when()`:
132-
! `..1 (left)` must be a logical vector, not the number 50.
108+
! `..1 (left)` must be a logical vector, not an integer vector.
133109
Code
134110
(expect_error(case_when(paste(50))))
135111
Output
@@ -161,3 +137,61 @@
161137
Error in `case_when()`:
162138
! Case 1 (`~1:2`) must be a two-sided formula.
163139

140+
# Using scalar LHS with vector RHS is deprecated (#7082)
141+
142+
Code
143+
x <- 1:5
144+
y <- 6:10
145+
code <- 1L
146+
sex <- "M"
147+
expect_identical(case_when(code == 1L && sex == "M" ~ x, code == 1L && sex ==
148+
"F" ~ y, code == 1L && sex == "M" ~ x + 1L, .default = 0L), x)
149+
Condition
150+
Warning:
151+
Calling `case_when()` with size 1 LHS inputs and size >1 RHS inputs was deprecated in dplyr 1.2.0.
152+
i This `case_when()` statement can result in subtle silent bugs and is very inefficient.
153+
154+
Please use a series of if statements instead:
155+
156+
```
157+
# Previously
158+
case_when(scalar_lhs1 ~ rhs1, scalar_lhs2 ~ rhs2, .default = default)
159+
160+
# Now
161+
if (scalar_lhs1) {
162+
rhs1
163+
} else if (scalar_lhs2) {
164+
rhs2
165+
} else {
166+
default
167+
}
168+
```
169+
170+
---
171+
172+
Code
173+
x <- 1
174+
case_when(x == 1 ~ "a", x == 2 ~ character(), .default = "other")
175+
Condition
176+
Warning:
177+
Calling `case_when()` with size 1 LHS inputs and size >1 RHS inputs was deprecated in dplyr 1.2.0.
178+
i This `case_when()` statement can result in subtle silent bugs and is very inefficient.
179+
180+
Please use a series of if statements instead:
181+
182+
```
183+
# Previously
184+
case_when(scalar_lhs1 ~ rhs1, scalar_lhs2 ~ rhs2, .default = default)
185+
186+
# Now
187+
if (scalar_lhs1) {
188+
rhs1
189+
} else if (scalar_lhs2) {
190+
rhs2
191+
} else {
192+
default
193+
}
194+
```
195+
Output
196+
character(0)
197+

0 commit comments

Comments
 (0)