Releases: tidyverse/dbplyr
dbplyr 2.5.1
-
Improved translations:
- SQL Server, Redshift, Snowflake, and Postgres now correctly translate
difftime()(the sign is now correct) (@edward-burn, #1532). - SQL server, Redshift, Snowflake, Postgres, and Spark now
translatedate_count_between()(@edward-burn, #1495). - SQL Server now supports
pmin()andpmax()(@edward-burn, #1602). - SQL server uses
COUNT_BIGinstead ofCOUNTsotally()and
count()work regardless of data size (@edward-burn, #1498). - Spark correctly translates
clock::add_years()(@ablack3, #1510). - Teradata now supports
as.double()andas.character()(@rplsmn, #1545).
- SQL Server, Redshift, Snowflake, and Postgres now correctly translate
-
Tightened argument checks for SQL translations. These changes should
result in more informative errors in cases where code already failed, possibly
silently; if you see errors with code that used to run correctly, please report
them to the package authors (@simonpcouch, #1554, #1555). -
across(everything())doesn't select grouping columns created via.byin
summarise()(@mgirlich, #1493). -
Spark SQL backend now supports persisting tables with
compute(x, name = I("x.y.z"), temporary = FALSE)(@zacdav-db, #1502).
dbplyr 2.5.0
Improved tools for qualified table names
-
Specification of table names with schema/catalogs has been overhauled to
make it simpler. This includes the following features and fixes:-
The simplest way to refer to a qualified table is now to wrap it in
I(), e.g.I("schema_name.table_name"). -
Use of
sql()andident_q()insidein_catalog()andin_schema()
is once again supported (#1388). -
It's ok to use
ident_q()once again (#1413) and you should no longer
see unsuppressable warnings about usingin_schema()(#1408). -
The names of the arguments to
Id()no longer matter, only their order
(#1416). Additionally, thanks to changes to the DBI package, you no
longer need to name each argument. -
If you accidentally pass a named vector to any of the database identifer
functions, those names will be automatically stripped (#1404). -
tbl_sql(check_from)is now deprecated.
-
-
dbplyr now exports some tools to work with the internal
table_pathclass
which is useful for certain backends that need to work with this
data structure (#1300).
Improved SQL
-
New translations for clock functions
add_years(),add_days(),
date_build(),get_year(),get_month(),get_day(),
andbase::difftime()on SQL server, Redshift, Snowflake, and Postgres. -
select()will keep computed columns used toarrange()subqueries that are
eliminated by a subsequent select (@ejneer, #1437). -
semi_join()will no longer inline away an aggregate filter (i.e.HAVING
clause) that was followed by aselect()(@ejneer, #1474) -
Improved function translations:
-
Functions qualified with the base namespace are now also translated, e.g.
base::paste0(x, "_1")is now translated (@mgirlich, #1022). -
-1 + xnow generates a translation instead erroring (#1420). -
x$namenever attempts to evaluatename(#1368). -
You can once again use
NULLon the LHS of an infix operator in order
to generate SQL with unusual syntax (#1345). -
Namespaced calls now error if the function doesn't exist, or a translation
is not available (#1426). -
lead()translation coercesnto an integer.
-
-
Databricks: now supports creating non-temporary tables too (#1418).
-
Oracle:
-
db_explain()now works (@thomashulst, #1353). -
as.Date()works when applied to a string (#1389). -
head()is once again translated toFETCH FIRST. This does require Oracle
12c or newer, but it actually works, compared to the approach using
ROWNUMfrom #1292 (#1436). -
Added support for
str_replace()andstr_replace_all()via
REGEXP_REPLACE()(@thomashulst, #1402).
-
-
Snowflake (@nathanhaigh, #1406)
-
Added support for
str_starts()andstr_ends()viaREGEXP_INSTR() -
Refactored
str_detect()to useREGEXP_INSTR()so now supports
regular expressions. -
Refactored
grepl()to useREGEXP_INSTR()so now supports
case-insensitive matching throughgrepl(..., ignore.case = TRUE)
-
-
SQL server:
-
MySQL:
as.integer()gets correct translation (@krlmlr, #1375).
Minor improvements and bug fixes
-
Deprecation status of functions deprecated in previous versions (at least
2 years old) have been advanced. In particular,src_sql()is now defunct,
as is the use ofpartial_eval()with characterdata. -
Database errors now show the generated SQL, which hopefully will make it
faster to track down problems (#1401). -
When dbplyr creates an index on a table in a schema (e.g.
schema.table),
it now only includes the table name in the index name, not the schema name. -
The class of remote sources now includes all S4 class names, not just
the first (#918). -
compute()passes additional arguments all the way to
sql_query_save()-methods (@rsund). -
db_sql_render()correctly passes on...when re-calling with
sql_optionsset (#1394). -
reframe()now gives an informative error that it isn't supported (#1148). -
rows_patch(in_place = FALSE)now works when more than one column should be
patched (@gorcha, #1443). -
sql_translator()now checks for duplicated definitions (@krlmlr, #1374).
dbplyr 2.4.0
Breaking changes
-
Using
compute(temporary = FALSE)without providing a name is now
deprecated (@mgirlich, #1154). -
ntile()'s first argument has been renamed fromorder_bytoxto
match the interface ofdplyr::ntile()(@mgirlich, #1242). -
simulate_vars()andsimulate_vars_is_typed()were removed as they weren't
used and tidyselect now offerstidyselect_data_proxy()and
tidyselect_data_has_predicates()(@mgirllich, #1199). -
sql_not_supported()now expects a function name without parentheses. -
sql_query_append(),sql_query_insert(),sql_query_update(),
sql_query_upsert(), andsql_query_delete()changed their arguments to
make them more consistent to the othersql_query_*()functions:x_namewas renamed totable.ywas renamed tofromand must now be a table identifier or SQL instead
of a lazy table.sql_query_append()andsql_query_insert()have gained the argumentcols.
-
remote_name()now returns a string with the name of the table. To get the
qualified identifier use the newly addedremote_table()(@mgirlich, #1280). -
tbl_lazy()losessrcargument after it has been deprecated for years
(@mgirlich, #1208). -
translate_sql()now requires theconargument (@mgirlich, #1311).
Thevarsargument has been removed after it threw an error for the last 7
years (@mgirlich).
Improved SQL
-
Preliminary databricks Spark SQL backend (#1377).
-
Joins
-
*_join()now allows specifying the relationship argument. It must be
NULLor"many-to-many"(@bairdj, #1305). -
Queries now qualify
*with the table alias for better compatibility
(@mgirlich, #1003). -
full_join()can now handle column names that only differ in case
(@ejneer, #1255). -
The
na_matchesargument ofsemi_join()andanti_join()works again
(@mgirlich, #1211). -
A
semi/anti_join()on fitleredyis inlined when possible (@mgirlich, #884). -
Joins now work again for Pool and Oracle connections (@mgirlich, #1177, #1181).
-
A sequence of
union()resp.union_all()now produces a flat query
instead of subqueries (@mgirlich, #1269). -
Added translations for:
-
if_any()andif_all()translations are now wrapped in parentheses.
This makes sure it can be combined via&with other conditions
(@mgirlich, #1153). -
nth(),first(), andlast()now support thena_rmargument
(@mgirlich, #1193).
Minor improvements and bug fixes
-
across()now supports namespaced functions, e.g.
across(x, dplyr::dense_rank)(@mgirlich, #1231). -
db_copy_to(overwrite = TRUE)now actually works. -
db_copy_to()'s...are now passed todb_write_table()(@mgirlich, #1237). -
Added
db_supports_table_alias_with_as()to customise whether a backend
supports specifying a table alias withASor not (@mgirlich). -
db_write_table()anddb_save_query()gain theoverwriteargument. -
dbplyr_pivot_wider_spec()is now exported. Unlikepivot_wider()this can
be lazy. Note that this will be removed soon afterpivot_wider_spec()
becomes a generic (@mgirlich). -
filter()ing with window functions now generates columns calledcol01
rather thanq01(@mgirlich, #1258). -
pivot_wider()now matches tidyrNAcolumn handling (@ejneer #1238). -
select()can once again be used afterarrange(desc(x))(@ejneer, #1240). -
show_query()andremote_query()gain the argumentsql_optionsthat allows
to control how the SQL is generated. It can be created viasql_options()
which has the following arguments:cte: use common table expressions?use_star: useSELECT *or explicitly select every column?qualify_all_columns: qualify all columns in a join or only the ambiguous ones?
(@mgirlich, #1146).
Consequently the
cteargument ofshow_query()andremote_query()has
been deprecated (@mgirlich, #1146). -
slice_min/max()can now order by multiple variables like dplyr, e.g. use
slice_min(lf, tibble(x, y))(@mgirlich, #1167). -
slice_*()now supports the data masking pronouns.envand.data(@mgirlich, #1294). -
sql_join_suffix()gains the argumentsuffixso that methods can check
whether the suffix is valid for the backend (@mgirlich). -
sql_random()is now deprecated. It was used to powerslice_sample()which
is now done via the translation forrunif()(@mgirlich, #1200). -
tbl()now informs when the user probably forgot to wrap the table identifier
within_schema()orsql()(@mgirlich, #1287).
Backend specific improvements
-
Access
-
DuckDB
- now supports the
returningargument ofrows_*().
- now supports the
-
MySQL/MariaDB:
rows_update()androws_patch()now give an informative error when the
unsupportedreturningargument is used (@mgirlich, #1279).rows_upsert()now gives an informative error that it isn't supported
(@mgirlich, #1279).rows_*()use the column types ofxwhen auto copyingy(@mgirlich, #1327).copy_inline()now works (@mgirlich, #1188).- Fix translation of
as.numeric(),as.POSIXct(),as_datetime(), and
as.integer64()(@avsdev-cw, #1189).
-
MS SQL:
row_number()now works when no order is specified (@ejneer, @fh-mthomson, #1332)
-
Oracle:
- Fix translation of
rows_upsert()(@mgirlich, @TBlackmore, #1286) head(n)is now translated toWHERE ROWNUM <= nto also support old
versions <= 11.2 (@JeremyPasco, #1292).
- Fix translation of
-
Postgres
-
SQLite
- Subqueries now also get an alias. This makes it consistent with other
backends and simplifies the implementation.
- Subqueries now also get an alias. This makes it consistent with other
-
SQL Server
-
Snowflake:
na.rm = TRUEis now respected inpmin()andpmax()instead of being silently ignored (@fh-mthomson, #1329)row_number()now works when no order is specified (@fh-mthomson, #1332)
-
Teradata
distinct()+head()now work (@mgirlich, #685).as.Date(x)is now translate toCAST(x AS DATE)again unlessxis a
string (@mgirlich, #1285).row_number()no longer defaults to partitioning by groups (now aligned with other databases when no order is specified:ROW_NUMBER()defaults toORDER BY (SELECT NULL)) (@fh-mthomson, #1331)
dbplyr 2.3.4
- Hot patch release to resolve R CMD check failures.
dbplyr 2.3.3
- Hot patch release to resolve R CMD check failures.
dbplyr 2.3.2
- Hot patch release to resolve R CMD check failures.
dbplyr 2.3.1
Breaking changes
window_order()now only accepts bare symbols or symbols wrapped indesc().
This breaking change is necessary to allowselect()to drop and rename
variables used inwindow_order()(@mgirlich, #1103).
Improved error messages
-
quantile()andmedian()now error for SQL Server when used insummarise()
and for PostgreSQL when used inmutate()as they can't be properly
translated (@mgirlich, #1110). -
Added an informative error for unsupported join arguments
unmatchedand
multiple(@mgirlich). -
Using predicates, e.g.
where(is.integer), inacross()now produces an
error as they never worked anyway (@mgirlich, #1169). -
Catch unsupported argument
pivot_wider(id_expand = TRUE)and
pivot_longer(cols_vary)(@mgirlich, #1109).
Bug fixes in SQL generation
-
Fixed an issue when using a window function after a
summarise()and
select()(@mgirlich, #1104). -
Fixed an issue when there where at least 3 joins and renamed variables
(@mgirlich, #1101). -
mutate()andselect()afterdistinct()now again produce a subquery to
generate the correct translation (@mgirlich, #1119, #1141). -
Fixed an issue when using
filter()on a summarised variable (@mgirlich, #1128). -
mutate()+filter()now again produces a new query if themutate()
uses a window function or SQL (@mgirlich, #1135). -
across()andpick()can be used (again) indistinct()(@mgirlich, #1125). -
The
rows_*()function work again for tables in a schema in PostgreSQL
(@mgirlich, #1133).
Minor improvements and bug fixes
-
sql()now evaluates its arguments locally also when used inacross()(@mgirlich, #1039). -
The rank functions (
row_number(),min_rank(),rank(),dense_rank(),
percent_rank(), andcume_dist()) now support multiple variables by
wrapping them intibble(), e.g.rank(tibble(x, y))(@mgirlich, #1118). -
Added support for
join_by()added in dplyr 1.1.0 (@mgirlich, #1074). -
Using
by = character()to perform a cross join is now soft-deprecated in
favor ofcross_join(). -
full_join()andright_join()are now translated directly toFULL JOIN
andRIGHT JOINfor SQLite as native support was finally added (@mgirlich, #1150). -
case_match()now works with strings on the left hand side (@mgirlich, #1143). -
The rank functions (
row_number(),min_rank(),rank(),dense_rank(),
percent_rank(), andcume_dist()) now work again for variables wrapped in
desc(), e.g.row_number(desc(x))(@mgirlich, #1118). -
Moved argument
auto_indexafter...in*_join()(@mgirlich, #1115). -
across()now uses the original value when a column is overriden to match
the behaviour of dplyr. For examplemutate(df, across(c(x, y), ~ .x / x))
now producesSELECT `x` / `x` AS `x`, `y` / `x` AS `y` FROM `df`instead of
SELECT `x`, `y` / `x` AS `y` FROM ( SELECT `x` / `x` AS `x`, `y` FROM `df` ) -
Restricted length of table aliases to avoid truncation on certain backends (e.g., Postgres) (@fh-mthomson, #1096)
dbplyr 2.3.0
New features
-
stringr::str_like()(new in 1.5.0) is translated to the closestLIKE
equivalent (@rjpat, #509) -
In preparation for dplyr 1.1.0:
- The
.byargument is supported (@mgirlich, #1051). - Passing
...toacross()is deprecated because the evaluation timing
of...is ambiguous. Now instead of (e.g.)
across(a:b, mean, na.rm = TRUE)use pick()is translated (@mgirlich, #1044).case_match()is translated (@mgirlich, #1020).case_when()now supports the.defaultargument (@mgirlich, #1017).
- The
-
Variables that aren't found in either the data or in the environment now
produce an error (@mgirlich, #907).
SQL optimisation
-
dbplyr now produces fewer subqueries resulting in shorter, more readable, and,
in some cases, faster SQL. The following combination of verbs now avoids a
subquery if possible: -
dbplyr now uses
SELECT *after a join instead of explicitly selecting every
column, where possible (@mgirlich, #898). -
Joins only use the table aliases ("LHS" and "RHS") if necessary (@mgirlich).
-
When using common table expressions, the results of joins and set operations
are now reused (@mgirlich, #978).
Improved error messages
-
Many errors have been improved and now show the function where the error
happened instead of a helper function (@mgirlich, #907). -
Errors produced by the database, e.g. in
collect()orrows_*(), now show
the verb where the error happened (@mgirlich). -
window_order()now produces a better error message when applied to a data
frame (@mgirlich, #947). -
Using a named
across()now gives a clear error message (@mgirlich, #761).
Minor improvements and bug fixes
-
Keyword highlighting can now be customised via the option
dbplyr_highlight.
Turn it off viaoptions(dbplyr_highlight = FALSE)or pass a custom ansi
style, e.g.options(dbplyr_highlight = cli::combine_ansi_styles("bold", "cyan"))
(@mgirlich, #974). -
The rank functions (
row_number(),min_rank(),rank(),dense_rank(),
percent_rank(), andcume_dist()) now give missing values the rank NA to
match the behaviour of dplyr (@mgirlich, #991). -
NAs inblob()s are correctly translated toNULL(#983). -
copy_inline()gains atypesargument to specify the SQL column types
(@mgirlich, #963). -
distinct()returns columns ordered the way you request, not the same
as the input data (@mgirlich). -
fill()can now fill "downup" and "updown" (@mgirlich, #1057), and
now order by non-numeric columns also in the up direction (@mgirlich, #1057). -
filter()now works when using a window function and an external vector
(#1048). -
group_by()+ renamed columns works once again (@mgirlich, #928). -
last()is correctly translated when no window frame is specified
(@mgirlich, #1063). -
setOldClass()uses a namespace, fixing an installation issue (@mgirlich, #927). -
sql()is now translated differently. The...are now evaluated locally
instead of being translated withtranslate_sql()(@mgirlich, #952).
Backend specific improvements
-
HANA:
-
MySQL:
str_flatten()usescollapse = ""by default (@fh-afrachioni, #993)
-
Oracle:
-
PostgreSQL:
- Generates correct literals for Dates (#727).
str_flatten()usescollapse = ""by default (@fh-afrachioni, #993)rows_*()use the column types ofxwhen auto copying (@mgirlich, #909).
-
Redshift:
round()now respects thedigitsargument (@owenjonesuob, #1033).- No longer tries to use named windows anymore (@owenjonesuob, #1035).
copy_inline()now works for Redshift (#949, thanks to @ejneer for an
initial implementation).str_flatten()usescollapse = ""by default (@fh-afrachioni, #993)
-
Snowflake:
-
numeric functions:
all(),any(),log10(),round(),cor(),cov()
andsd(). -
date functions:
day(),mday(),wday(),yday(),week(),
isoweek(),month(),quarter(),isoyear(),seconds(),minutes(),
hours(),days(),weeks(),months(),years()andfloor_date(). -
string functions:
grepl(),paste(),paste0(),str_c(),str_locate(),
str_detect(),str_replace(),str_replace_all(),str_remove(),
str_remove_all(),str_trim(),str_squish()andstr_flatten()
(@fh-afrachioni, #860). -
str_flatten()usescollapse = ""by default (@fh-afrachioni, #993) -
SQLite:
-
SQL server:
-
Teradata:
dbplyr 2.2.1
-
Querying Oracle databases works again. Unfortunately, the fix requires every
column to be explicitly selected again (@mgirlich, #908). -
semi_join()andanti_join()work again for Spark (@mgirlich, #915). -
str_c()is now translated to||in Oracle (@mgirlich, #921). -
sd(),var(),cor()andcov()now give clear error messages on
databases that don't support them. -
any()andall()gain default translations for all backends.
dbplyr 2.2.0
dbplyr 2.2.0
New features
-
SQL formatting has been considerably improved with new wrapping and indenting.
show_query()creates more readable queries by printing the keywords in blue
(@mgirlich, #644). When possible dbplyr now usesSELECT *instead of
explicitly selecting every column (@mgirlich). -
Added support for
rows_insert(),rows_append(),rows_update(),
rows_patch(),rows_upsert(), androws_delete()(@mgirlich, #736). -
Added
copy_inline()as acopy_to()equivalent that does not need write
access (@mgirlich, #628). -
remote_query(),show_query(),compute()andcollect()have an
experimentalcteargument. IfTRUEthe SQL query will use common table
expressions instead of nested queries (@mgirlich, #638). -
New
in_catalog(), which works likein_schema(), but allows creation of
table identifiers consisting of three components: catalog, schema, name
(#806, @krlmlr).
Improvements to SQL generation
-
When possible, dbplyr now uses
SELECT *instead of explicitly selecting
every column (@mgirlich). -
Improved translations for specific backends:
as.Date()for Oracle (@mgirlich, #661).case_when()with a final clause of the formTRUE ~ ...usesELSE ...
for SQLite (@mgirlich, #754).day(),week(),isoweek(), andisoyear()for Postgres (@mgirlich, #675).explain()for ROracle (@mgirlich).fill()for SQL Server (#651, @mgirlich) and RPostgreSQL (@mgirlich).quantile()for SQL Server (@mgirlich, #620).str_flatten()for Redshift (@hdplsa, #804)slice_sample()for MySQL/MariaDB and SQL Server (@mgirlich, #617).union()for Hive (@mgirlich, #663).
-
The backend function
dbplyr_fill0()(used for databases that lack
IGNORE NULLSsupport) now respects database specific translations
(@rsund, #753). -
Calls of the form
stringr::foo()orlubridate::foo()are now evaluated in
the database, rather than locally (#197). -
Unary plus (e.g.
db %>% filter(x == +1)) now works (@mgirlich, #674). -
is.na(),ifelse(),if_else(),case_when(), andif()
generate slightly more compact SQL (@mgirlich, #738). -
if_else()now supports themissingargument (@mgirlich, #641). -
quantile()no longer errors when using thena.rmargument (@mgirlich, #600). -
remote_name()now returns a name in more cases where it makes sense
(@mgirlich, #850). -
The partial evaluation code is now more aligned with
dtplyr. This makes it
easier to transfer bug fixes and new features from one package to the other.
In this process the second argument ofpartial_eval()was changed to a lazy
frame instead of a character vector of variables (@mgirlich, #766).
Partially evaluated expressions with infix operations are now correctly
translated. For exampletranslate_sql(!!expr(2 - 1) * x)now works
(@mgirlich, #634).
Minor improvements and bug fixes
-
New
pillar::tbl_format_header()method for lazy tables: Printing a lazy
table where all rows are displayed also shows the exact number of rows in the
header. The threshold is controlled bygetOption("pillar.print_min"),
with a default of 10 (#796, @krlmlr). -
The 1st edition extension mechanism is formally deprecated (#507).
-
across(),if_any()andif_all()now defaults to.cols = everything()
(@mgirlich, #760). If.fnsis not providedif_any()andif_all()work
like a parallel version ofany()/any()(@mgirlich, #734). -
across(),if_any(), andif_all()can now translate evaluated lists
and functions (@mgirlich, #796), and accept the name of a list of functions
(@mgirlich, #817). -
Multiple
across()calls inmutate()andtransmute()can now access
freshly created variables (@mgirlich, #802). -
add_count()now doesn't change the groups of the input (@mgirlich, #614). -
compute()can now handle whennameis named by unnaming it first
(@mgirlich, #623), and now works whentemporary = TRUEfor Oracle
(@mgirlich, #621). -
explain()passes...to methods (@mgirlich, #783), and
works for Redshift (@mgirlich, #740). -
filter()throws an error if you supply a named argument (@mgirlich, #764). -
Joins disambiguates columns that only differ in case (@mgirlich, #702).
New argumentsx_asandy_asallow you to control the table alias
used in SQL query (@mgirlich, #637). Joins withna_matches = "na"now work
for DuckDB (@mgirlich, #704). -
mutate()andtransmute()use named windows if a window definition is
used at least twice and the backend supports named windows (@mgirlich, #624). -
mutate()now supports the arguments.keep,.before, and.after
(@mgirlich, #802). -
na.rm = FALSEonly warns once every 8 hours across all functions (#899). -
nesting()now supports the.name_repairargument (@mgirlich, #654). -
pivot_longer()can now pivot a column namedname(@mgirlich, #692),
can repair names (@mgirlich, #694), and can work with multiplenames_from
columns (@mgirlich, #693). -
pivot_wider(values_fn = )andpivot_longer(values_transform = )
can now be formulas (@mgirlich, #745). -
pivot_wider()now supports the argumentsnames_vary,names_expand, and
unused_fn(@mgirlich, #774). -
remote_name()now returns a name in more cases where it makes sense
(@mgirlich, #850). -
sql_random()is now exported. -
ungroup()removes variables in...from grouping (@mgirlich, #689).