dplyr 1.1.1
-
- Mutating joins now warn about multiple matches much less often. At a high
  level, a warning was previously being thrown when a one-to-many or
  many-to-many relationship was detected between the keys of `x` and `y`, but is
  now only thrown for a many-to-many relationship, which is much rarer and much
  more dangerous than one-to-many because it can result in a Cartesian explosion
  in the number of rows returned from the join (#6731, #6717).

  We've accomplished this in two steps:

  - `multiple` now defaults to `"all"`, and the options of `"error"` and
    `"warning"` are now deprecated in favor of using `relationship` (see below).
    We are using an accelerated deprecation process for these two options
    because they've only been available for a few weeks, and `relationship` is
    a clearly superior alternative.

  - The mutating joins gain a new `relationship` argument, allowing you to
    optionally enforce one of the following relationship constraints between the
    keys of `x` and `y`: `"one-to-one"`, `"one-to-many"`, `"many-to-one"`, or
    `"many-to-many"`.

    For example, `"many-to-one"` enforces that each row in `x` can match at
    most 1 row in `y`. If a row in `x` matches >1 rows in `y`, an error is
    thrown. This option serves as the replacement for `multiple = "error"`.

    The default behavior of `relationship` doesn't assume that there is any
    relationship between `x` and `y`. However, for equality joins it will check
    for the presence of a many-to-many relationship, and will warn if it detects
    one.

  This change unfortunately does mean that if you have set `multiple = "all"` to
  avoid a warning and you happened to be doing a many-to-many style join, then
  you will need to replace `multiple = "all"` with
  `relationship = "many-to-many"` to silence the new warning, but we believe
  this should be rare since many-to-many relationships are fairly uncommon.

- Fixed a major performance regression in `case_when()`. It is still a little
  slower than in dplyr 1.0.10, but we plan to improve this further in the future
  (#6674).

- Fixed a performance regression related to `nth()`, `first()`, and `last()`
  (#6682).

- Fixed an issue where expressions involving infix operators had an abnormally
  large amount of overhead (#6681).

- `group_data()` on ungrouped data frames is faster (#6736).

- `n()` is a little faster when there are many groups (#6727).

- `pick()` now returns a 1-row, 0-column tibble when `...` evaluates to an
  empty selection. This makes it more compatible with tidyverse recycling
  rules in some edge cases (#6685).

- `if_else()` and `case_when()` again accept logical conditions that have
  attributes (#6678).

- `arrange()` can once again sort the `numeric_version` type from base R
  (#6680).

- `slice_sample()` now works when the input has a column named `replace`.
  `slice_min()` and `slice_max()` now work when the input has columns named
  `na_rm` or `with_ties` (#6725).

- `nth()` now errors informatively if `n` is `NA` (#6682).

- Joins now throw a more informative error when `y` doesn't have the same
  source as `x` (#6798).

- All major dplyr verbs now throw an informative error message if the input
  data frame contains a column named `NA` or `""` (#6758).

- Deprecation warnings thrown by `filter()` now mention the correct package
  where the problem originated (#6679).

- Fixed an issue where using `<-` within a grouped `mutate()` or `summarise()`
  could cross-contaminate other groups (#6666).

- The compatibility vignette has been replaced with a more general vignette on
  using dplyr in packages, `vignette("in-packages")` (#6702).

- The developer documentation in `?dplyr_extending` has been refreshed and
  brought up to date with all changes made in 1.1.0 (#6695).

- `rename_with()` now includes an example of using `paste0(recycle0 = TRUE)` to
  correctly handle empty selections (#6688).

- R >= 3.5.0 is now explicitly required. This is in line with the tidyverse
  policy of supporting the 5 most recent versions of R.
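
The join changes described in the first entry above can be sketched with a few toy tables. This is a minimal illustration, assuming dplyr >= 1.1.1; the tables `x`, `y`, `x2`, `y2` and their columns are hypothetical. It shows that a one-to-many join no longer warns and keeps all matches, that `relationship = "many-to-one"` turns duplicate keys in `y` into an error, and that a many-to-many equality join warns by default unless `relationship = "many-to-many"` is declared.

```r
library(dplyr)

# Hypothetical toy tables: keys are unique in `x`, but key 1 repeats in `y`
x <- tibble(key = c(1, 2), x_val = c("a", "b"))
y <- tibble(key = c(1, 1, 2), y_val = c("p", "q", "r"))

# One-to-many: `multiple` now defaults to "all" and no warning is thrown,
# so both matches for key 1 are kept (3 rows in total)
one_to_many <- left_join(x, y, by = "key")

# "many-to-one" requires each row of `x` to match at most 1 row of `y`;
# key 1 matches 2 rows, so the join errors
strict <- tryCatch(
  left_join(x, y, by = "key", relationship = "many-to-one"),
  error = function(e) "error"
)

# Hypothetical tables where key 1 repeats on both sides (many-to-many)
x2 <- tibble(key = c(1, 1))
y2 <- tibble(key = c(1, 1))

# With the default `relationship`, an equality join warns about many-to-many
warned <- tryCatch(
  inner_join(x2, y2, by = "key"),
  warning = function(w) "warning"
)

# Declaring the relationship silences the warning: 2 x 2 = 4 rows
many_to_many <- inner_join(x2, y2, by = "key", relationship = "many-to-many")
```

The `tryCatch()` wrappers are only there to capture the condition for illustration; in ordinary code the error or warning would simply surface to the user.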
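
The `rename_with()` entry above concerns empty selections. The sketch below assumes dplyr >= 1.1.1 and R >= 4.0.1 (the base R release where `paste0()` gained `recycle0`); the tibble `df` and the `starts_with("zzz")` selection, which matches no columns, are hypothetical. Without `recycle0 = TRUE`, `paste0()` recycles the zero-length input so `.fn` returns one name for zero selected columns and `rename_with()` errors; with it, `paste0()` returns `character(0)` and nothing is renamed.

```r
library(dplyr)

# Hypothetical data; no column starts with "zzz", so the selection is empty
df <- tibble(a = 1, b = 2)

# Base R: a zero-length argument is recycled unless `recycle0 = TRUE`
length(paste0("new_", character()))                   # 1 (the string "new_")
length(paste0("new_", character(), recycle0 = TRUE))  # 0

# Without `recycle0`, `.fn` returns 1 name for 0 selected columns, so it errors
without <- tryCatch(
  rename_with(df, ~ paste0("new_", .x), starts_with("zzz")),
  error = function(e) "error"
)

# With `recycle0 = TRUE`, `.fn` returns character(0) and the data is unchanged
renamed <- rename_with(df, ~ paste0("new_", .x, recycle0 = TRUE), starts_with("zzz"))
```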