Releases: tidyverse/dbplyr
dbplyr 2.1.1
-
New support for Snowflake (@edgararuiz)
-
compute(),sql_table_index(), andsql_query_wrap()now work with
schemas (@mgirlich, #595). -
if_any()andif_all()are now translated. -
group_by()now ungroups when the dots argument is empty and.addisFALSE
(@mgirlich, #615). -
sql_escape_date()andsql_escape_datetimegain methods for MS Access
(@erikvona, #608).
dbplyr 2.1.0
New features
-
Thanks to @mgirlich, dbplyr gains support for key verbs from tidyr:
pivot_longer()(#532),pivot_wider()(#543),expand()(#538),
complete()(#538),replace_na()(#538),fill()(#566). -
@mgirlich is now a dbplyr author in recognition of his significant and
sustained contributions. -
across()implementation has been rewritten to support more inputs:
it now translates formulas (#525), works with SQL functions that don't have
R translations (#534), and work withNULL(#554) -
summarise()now supports argument.groups(@mgirlich, #584).
SQL translation
-
All backends:
str_sub(),substr()andsubstring()get better
translations (#577). Most importantly, the results of using negative
locations should match the underlying R implementations more closely. -
MS SQL:
-
as.integer()andas.integer64()translations cast first toNUMERIC
to avoid CASTing weirdness (@DavidPatShuiFong, #496). -
Assumes a boolean context inside of
[(#546) -
str_sub()withend = -1now works (#577).
-
-
Redshift:
lag()andlead()lose thedefaultparameter since it's
not supported (@hdplsa, #548). -
SQLite: custom translation of
full_join()andright_join()
(@mgirlich, #536).
Minor improvements and bug fixes
-
RPostgreSQL backend warns if
temporary = TRUEsince temporary tables are
not supported byRPostgres::dbWriteTable()(#574). -
count()method provides closer match to dplyr semantics (#347). -
db_connection_describe()no longer uses partial matching (@mgirlich, #564). -
pull()no longerselect()s the result when there's already only
one variable (#562). -
select()no longer relocates grouping variables to the front
(@mgirlich, #568). and informs when adding missing grouping variables
(@mgirlich, #559). -
tbl.src_dbi(...)now passed on totbl_sql()(#530).
dbplyr 2.0.0
(To become dbplyr 2.0.0)
dplyr 1.0.0 compatibility
-
across()is now translated into individual SQL statements (#480). -
rename()andselect()support dplyr 1.0.0 tidyselect syntax (apart from
predicate functions which can't easily work on computed queries) (#502). -
relocate()makes it easy to move columns (#494) andrename_with()makes
it easy to rename columns programmatically (#502). -
slice_min(),slice_max(), andslice_order()are now supported.
slice_head()andslice_tail()throw clear error messages (#394)
SQL generation
-
Documentation has been radically improved with new topics for each major
verb and each backend giving more details about the SQL translation. -
intersect(),union()andsetdiff()gain anallargument to add the
ALLargument (#414). -
Join functions gains a
na_matchesargument that allows you to control
whether or notNA(NULL) values match otherNAvalues. The default is
"never", which is the usual behaviour in databases. You can set
na_matches = "na"to match R's usual join behaviour (#180). Additional
arguments error (instead of being silently swallowed) (#382). -
Joins now only use aliases where needed to disambiguate columns; this should
make generated queries more readable. -
Subqueries no longer include an
ORDER BYclause. This is not part of the
SQL spec, and has very limited support across databases. Now such queries
generate a warning suggesting that you move yourarrange()call later in
the pipeline (#276). (There's one exception:ORDER BYis still generated
ifLIMITis present; this tends to affect the returns rows but not
necessarily their order). -
Subquery names are now scoped within the query. This makes query text
deterministic which helps some query optimisers/cachers (#336). -
sql_optimise()now can partially optimise a pipeline; due to an unfortunate
bug it previously gave up too easily. -
in_schema()quotes each input individually (#287) (usesql()to opt out
of quoting, if needed). AndDBI::Id()should work anywhere that
in_schema()does.
SQL translation
-
Experimental new SAP HANA backend (#233). Requires the latest version of odbc.
-
All backends:
-
You can now use
::in translations, so that (e.g.)dbplyr::n()is
translated tocount(*)(#207). -
[[can now also translate numeric indices (#520). -
%/%now generates a clear error message; previously it was translated to
/which is not correct (#108). -
n()is translated tocount(*)instead ofcount()(#343). -
sub_str()translation is more consistent in edge cases (@ianmcook). -
All
median()(@lorenzwalthert, #483),pmin(),pmax()(#479),sd()
andvar()functions have anna.rmargument that warns once when not
TRUE. This makes them consistent withmean()andsum(). -
substring()is now translated the same way assubstr()(#378).
-
-
blob vectors can now be used with
!!and
!!!operators, for example infilter()(@okhoma, #433) -
MySQL uses standard SQL for index creation.
-
MS SQL translation does better a distinguishing between bit and boolean
(#377, #318).ifandifelseonce again generateIIF, creating
simpler expressions.as.*()function usesTRY_CAST()instead
ofCAST()for version 11+ (2012+) (@DavidPatShuiFong, #380). -
odbc no longer translates
count(); this was an accidental inclusion. -
Oracle translation now depends on Oracle 12c, and uses a "row-limiting"
clause forhead(). It gains translations fortoday()andnow(), and
improvedas.Date()translation (@rlh1994, #267). -
PostgreSQL: new translations for lubridate period functions
years(),
months(),days(), andfloor_date()(@bkkkk, #333) and stringr functions
str_squish(),str_remove(), andstr_remove_all()(@shosaco). -
New RedShift translations when used with
RPostgres::Redshift(). -
SQLite gains translations for lubridate functions
today(),now(),
year(),month(),day(),hour(),minute(),second(),yday()
(#262), and correct translation formedian()(#357).
Extensibility
If you are the author of a dbplyr backend, please see vignette("backend-2") for details.
-
New
dbplyr_edition()generic allows you to opt-in to the 2nd edition of
the dbplyr API. -
db_write_table()now callsDBI::dbWriteTable()instead of nine generics
that formerly each did a small part:db_create_indexes(),db_begin(),
db_rollback(),db_commit(),db_list_tables(),drop_drop_table(),
db_has_table(),db_create_table(), anddb_data_types(). You can
now delete the methods for these generics.db_query_rows()is no longer used; it appears that it hasn't been used
for some time, so if you have a method, you can delete it. -
DBI::dbQuoteIdentifier()is now used instead ofsql_escape_ident()and
DBI::dbQuoteString()instead ofsql_escape_string(). -
A number of
db_*generics have been replaced with new SQL generation
generics:dplyr::db_analyze()->dbplyr::sql_table_analyze()dplyr::db_create_index()->dbplyr::sql_table_index()dplyr::db_explain()->dbplyr::sql_queriy_explain()dplyr::db_query_fields()->dbplyr::sql_query_fields()dplyr::db_save_query()->dbplyr::sql_query_save()
This makes them easier to test and is an important part of the process of
moving all database generics in dbplyr (#284). -
A number of other generics have been renamed to facilitate the move from
dplyr to dbplyr:dplyr::sql_select()->dbplyr::sql_query_select()dplyr::sql_join()->dbplyr::sql_query_join()dplyr::sql_semi_join()->dbplyr::sql_query_semi_join()dplyr::sql_set_op()->dbplyr::sql_query_set_op()dplyr::sql_subquery()->dbplyr::sql_query_wrap()dplyr::db_desc()->dbplyr::db_connection_describe()
-
New
db_temporary_table()generic makes it easier to work with databases
that require temporary tables to be specially named. -
New
sql_expr_matches()generic allows databases to use more efficient
alternatives when determine if two values "match" (i.e. like equality but
a pair ofNULLs will also match). For more details, see
https://modern-sql.com/feature/is-distinct-from -
New
sql_join_suffix()allows backends to control the default suffixes
used (#254).
Minor improvements and bug fixes
-
All old lazy eval shims have been removed. These have been deprecated for
some time. -
Date-time escaping methods for Athena and Presto have moved to the packages
where they belong. -
Attempting to embed a Shiny reactive in a query now gives a helpful error
(#439). -
copy_lahman()andcopy_nycflights13()(and hencenycflights13_sqlite())
and friends now return DBI connections rather than the now deprecated
src_dbi()(#440). -
copy_to()can nowoverwritewhen table is specified with schema (#489),
and gains anin_transactionargument used to optionally suppress the
transaction wrapper (#368). -
distinct()no longer duplicates column if grouped (#354). -
transmute()now correctly tracks variables it needs when creating
subqueries (#313). -
mutate()grouping variables no longer generates a downstream error (#396) -
mutate()correctly generates subqueries when you re-use the same variable
three or more times (#412). -
window_order()overrides ordering, rather than appending to it.
dbplyr 1.4.4
-
Internally
DBI::dbExecute()now usesimmediate = TRUE; this improves
support for session-scoped temporary tables in MS SQL (@krlmlr, #438). -
Subqueries with
ORDER BYuseTOP 9223372036854775807instead of
TOP 100 PERCENTon SQL Server for compatibility with Azure Data Warehouse
(#337, @alexkyllo). -
escape()now supportsblobvectors using newsql_escape_raw()
generic. It enables using blob variables in
dplyr verbs, for example to filter nvarchar values by UTF-16 blobs
(see r-dbi/DBI#215 (comment)).
(@okhoma, #433) -
Added
setOldClass()calls for"ident"and"ident_q"classes for
compatibility with dplyr 1.0.0 (#448, @krlmlr). -
Postgres
str_detect()translation uses same argument names as stringr,
and gains anegateargument (#444). -
semi_join()andanti_join()now correctly support thesql_onargument
(#443, @krlmlr).
dbplyr 1.4.3
-
dbplyr now uses RPostgres (instead of RPostgreSQL) and RMariaDB (instead of
RMySQL) for its internal tests and data functions (#427). -
The Date and POSIXt methods for
escape()now use exported
sql_escape_date()andsql_escape_datetime()generics to allow backend
specific formatting of date and datetime literals. These are used to
provide methods for Athena and Presto backends (@OssiLehtinen, #384, #391). -
first(),last(),nth(),lead()andlag()now respect the
window_frame()(@krlmlr, #366). -
SQL server: new translations for
str_flatten()(@PauloJhonny, #405). -
SQL server: temporary datasets are now session-local, not global (#401).
-
Postgres: correct
str_detect(),str_replace()andstr_replace_all()
translation (@shosaco, #362).
dbplyr 1.4.2
-
Fix bug when partially evaluating unquoting quosure containing a single
symbol (#317) -
Fixes for rlang and dpylr compatibility.
dbplyr 1.4.1
Minor improvements to SQL generation
dbplyr 1.4.0
Breaking changes
-
Error: `con` must not be NULL: If you see this error, it probably means
that you have probably forgotten to passcondown to a dbplyr function.
Previously, dbplyr defaulted to usingsimulate_dbi()which introduced
subtle escaping bugs. (It's also possible I have forgotten to pass it
somewhere that the dbplyr tests don't pick up, so if you can't figure it
out, please let me know). -
Subsetting (
[[,$, and[) functions are no longer evaluated locally.
This makes the translation more consistent and enables useful new idioms
for modern databases (#200).
New features
-
MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/)
and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain
support for window functions, available in Maria DB 10.2, MySQL 8.0, and
SQLite 3.25 (#191). -
Overall, dplyr generates many fewer subqueries:
-
Joins and semi-joins no longer add an unneeded subquery (#236). This is
faciliated by the newbare_identifier_okargument tosql_render();
the previous argument was calledrootand confused me. -
Many sequences of
select(),rename(),mutate(), andtransmute()can
be collapsed into a single query, instead of always generating a subquery
(#213).
-
-
New
vignette("sql")describes some advantages of dbplyr over SQL (#205) and
gives some advice about writing how to write literal SQL inside of dplyr,
when you you need to (#196). -
New
vignette("reprex")gives some hints on creating reprexes that work
anywhere (#117). This is supposrted by a newtbl_memdb()that
matches the existingtbl_lazy(). -
All
..._join()functions gain ansql_onargument that allows specifying
arbitrary join predicates in SQL code (#146, @krlmlr).
SQL translations
-
New translations for some lubridate functions:
today(),now(),
year(),month(),day(),hour(),minute(),
second(),quarter(), ``yday()(@colearendt, @derekmorr). Also added new translation foras.POSIXct()`. -
New translations for stringr functions:
str_c(),str_sub(),
str_length(),str_to_upper(),str_to_lower(), andstr_to_title()
(@colearendt). Non-translated stringr functions throw a clear error. -
New translations for bitwise operations:
bitwNot(),bitwAnd(),bitwOr(),
bitwXor(),bitwShiftL(), andbitwShiftR(). Unlike the base R functions,
the translations do not coerce arguments to integers (@davidchall, #235). -
New translation for
x[y]toCASE WHEN y THEN x END. This enables
sum(a[b == 0])to work as you expect from R (#202).yneeds to be
a logical expression; if not you will likely get a type error from your
database. -
New translations for
x$yandx[["y"]]tox.y, enabling you to index
into nested fields in databases that provide them (#158). -
The
.dataand.envpronouns of tidy evaluation are correctly translated
(#132). -
New translation for
median()andquantile(). Works for all ANSI compliant
databases (SQL Server, Postgres, MariaDB, Teradata) and has custom
translations for Hive. Thanks to @edavidaja for researching the SQL variants!
(#169) -
na_if()is correct translated toNULLIF()(rather thanNULL_IF) (#211). -
n_distinct()translation throws an error when given more than one argument.
(#101, #133). -
New default translations for
paste(),paste0(), and the hyperbolic
functions (these previously were only available for ODBC databases). -
Corrected translations of
pmin()andpmax()toLEAST()andGREATEST()
for ANSI compliant databases (#118), toMIN()andMAX()for SQLite, and
to an error for SQL server. -
New translation for
switch()to the simple form ofCASE WHEN(#192).
SQL simulation
SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.
-
SQL simulation has been overhauled. It now works reliably, is better
documented, and always uses ANSI escaping (i.e.`for field
names and'for strings). -
tbl_lazy()now actually puts adbplyr::srcin the$srcfield. This
shouldn't affect any downstream code unless you were previously working
around this weird difference betweentbl_lazyandtbl_sqlclasses.
It also includes thesrcclass in its class, and when printed,
shows the generated SQL (#111).
Database specific improvements
-
MySQL/MariaDB
-
Translations also applied to connections via the odbc package
(@colearendt, #238) -
Basic support for regular expressions via
str_detect()and
str_replace_all()(@colearendt, #168). -
Improved translation for
as.logical(x)toIF(x, TRUE, FALSE).
-
-
Oracle
-
Postgres
- Basic support for regular expressions via
str_detect()and
str_replace_all()(@colearendt, #168).
- Basic support for regular expressions via
-
SQLite
explain()translation now generatesEXPLAIN QUERY PLANwhich
generates a higher-level, more human friendly explanation.
-
SQL server
-
Improved translation for
as.logical(x)toCAST(x as BIT)(#250). -
Translates
paste(),paste0(), andstr_c()to+. -
copy_to()method applies temporary table name transformation
earlier so that you can now overwrite temporary tables (#258). -
db_write_table()method uses correct argument name for
passing along field types (#251).
-
Minor improvements and bug fixes
-
Aggregation functions only warn once per session about the use of
na.rm = TRUE(#216). -
table names generated by
random_table_name()have the prefix
"dbplyr_", which makes it easier to find them programmatically
(@mattle24, #111) -
Functions that are only available in a windowed (
mutate()) query now
throw an error when called in a aggregate (summarise()) query (#129) -
arrange()understands the.by_groupargument, making it possible
sort by groups if desired. The default isFALSE(#115) -
distinct()now handles computed variables likedistinct(df, y = x + y)
(#154). -
escape(),sql_expr()andbuild_sql()no longer acceptcon = NULLas
a shortcut forcon = simulate_dbi(). This made it too easy to forget to
passconalong, introducing extremely subtle escaping bugs.win_over()
gains aconargument for the same reason. -
New
escape_ansi()always uses ANSI SQL 92 standard escaping (for use
in examples and documentation). -
mutate(df, x = NULL)dropsxfrom the output, just like when working with
local data frames (#194). -
partial_eval()processes inlined functions (including rlang lambda
functions). This makes dbplyr work with more forms of scoped verbs like
df %>% summarise_all(~ mean(.)),df %>% summarise_all(list(mean))(#134). -
sql_aggregate()now takes an optional argumentf_rfor passing to
check_na_rm(). This allows the warning to show the R function name rather
than the SQL function name (@sverchkov, #153). -
sql_infix()gains apadargument for the rare operator that doesn't
need to be surrounded by spaces. -
sql_prefix()no longer turns SQL functions into uppercase, allowing for
correct translation of case-sensitive SQL functions (#181, @mtoto). -
summarise()gives a clear error message if you refer to a variable
created in that samesummarise()(#114). -
New
sql_call2()which is torlang::call2()assql_expr()is to
rlang::expr(). -
show_query()andexplain()usecat()rather than message. -
union(),union_all(),setdiff()andintersect()do a better job
of matching columns across backends (#183).
dbplyr 1.3.0
- Now support for dplyr 0.8.0 (#190) and R 3.1.0
API changes
-
Calls of the form
dplyr::foo()are now evaluated in the database,
rather than locally (#197). -
varsargument totbl_sql()has been formally deprecated; it hasn't
actually done anything for a while (#3254). -
srcandtblobjects now include a class generated from the class of
the underlying connection object. This makes it possible for dplyr backends
to implement different behaviour at the dplyr level, when needed. (#2293)
SQL translation
-
x %in% yis now translated toFALSEifyis empty (@mgirlich, #160). -
New
as.integer64(x)translation toCAST(x AS BIGINT)(#3305) -
case_whennow translates with a ELSE clause if a formula of the form
TRUE~<RHS>is provided . (@cderv, #112) -
cummean()now generatesAVG()notMEAN()(#157) -
str_detect()now uses correct parameter order (#3397) -
MS SQL
-
Oracle
- Custom
db_drop_table()now only drops tables if they exist (#3306) - Custom
setdiff()translation (#3493) - Custom
db_explain()translation (#3471)
- Custom
-
SQLite
- Correct translation for
as.numeric()/as.double()(@chris-park, #171).
- Correct translation for
-
Redshift
substr()translation improved (#3339)
Minor improvements and bug fixes
-
copy_to()will only remove existing table whenoverwrite = TRUEand the
table already exists, eliminating a confusing "NOTICE" from PostgreSQL
(#3197). -
partial_eval()handles unevaluated formulas (#184). -
pull.tbl_sql()now extracts correctly from grouped tables (#3562). -
sql_render.op()now correctly forwards theconargument (@kevinykuo, #73).
dbplyr 1.2.1
- Forward compatibility fixes for rlang 0.2.0