Skip to content

dbplyr 2.0.0

Choose a tag to compare

@hadley hadley released this 03 Nov 11:52

(To become dbplyr 2.0.0)

dplyr 1.0.0 compatibility

  • across() is now translated into individual SQL statements (#480).

  • rename() and select() support dplyr 1.0.0 tidyselect syntax (apart from
    predicate functions which can't easily work on computed queries) (#502).

  • relocate() makes it easy to move columns (#494) and rename_with() makes
    it easy to rename columns programmatically (#502).

  • slice_min(), slice_max(), and slice_order() are now supported.
    slice_head() and slice_tail() throw clear error messages (#394)

SQL generation

  • Documentation has been radically improved with new topics for each major
    verb and each backend giving more details about the SQL translation.

  • intersect(), union() and setdiff() gain an all argument to add the
    ALL argument (#414).

  • Join functions gains a na_matches argument that allows you to control
    whether or not NA (NULL) values match other NA values. The default is
    "never", which is the usual behaviour in databases. You can set
    na_matches = "na" to match R's usual join behaviour (#180). Additional
    arguments error (instead of being silently swallowed) (#382).

  • Joins now only use aliases where needed to disambiguate columns; this should
    make generated queries more readable.

  • Subqueries no longer include an ORDER BY clause. This is not part of the
    SQL spec, and has very limited support across databases. Now such queries
    generate a warning suggesting that you move your arrange() call later in
    the pipeline (#276). (There's one exception: ORDER BY is still generated
    if LIMIT is present; this tends to affect the returns rows but not
    necessarily their order).

  • Subquery names are now scoped within the query. This makes query text
    deterministic which helps some query optimisers/cachers (#336).

  • sql_optimise() now can partially optimise a pipeline; due to an unfortunate
    bug it previously gave up too easily.

  • in_schema() quotes each input individually (#287) (use sql() to opt out
    of quoting, if needed). And DBI::Id() should work anywhere that
    in_schema() does.

SQL translation

  • Experimental new SAP HANA backend (#233). Requires the latest version of odbc.

  • All backends:

    • You can now use :: in translations, so that (e.g.) dbplyr::n() is
      translated to count(*) (#207).

    • [[ can now also translate numeric indices (#520).

    • %/% now generates a clear error message; previously it was translated to
      / which is not correct (#108).

    • n() is translated to count(*) instead of count() (#343).

    • sub_str() translation is more consistent in edge cases (@ianmcook).

    • All median() (@lorenzwalthert, #483), pmin(), pmax() (#479), sd()
      and var() functions have an na.rm argument that warns once when not
      TRUE. This makes them consistent with mean() and sum().

    • substring() is now translated the same way as substr() (#378).

  • blob vectors can now be used with !! and
    !!! operators, for example in filter() (@okhoma, #433)

  • MySQL uses standard SQL for index creation.

  • MS SQL translation does better a distinguishing between bit and boolean
    (#377, #318). if and ifelse once again generate IIF, creating
    simpler expressions. as.*() function uses TRY_CAST() instead
    of CAST() for version 11+ (2012+) (@DavidPatShuiFong, #380).

  • odbc no longer translates count(); this was an accidental inclusion.

  • Oracle translation now depends on Oracle 12c, and uses a "row-limiting"
    clause for head(). It gains translations for today() and now(), and
    improved as.Date() translation (@rlh1994, #267).

  • PostgreSQL: new translations for lubridate period functions years(),
    months(), days(), and floor_date() (@bkkkk, #333) and stringr functions
    str_squish(), str_remove(), and str_remove_all() (@shosaco).

  • New RedShift translations when used with RPostgres::Redshift().

    • str_replace() errors since there's no Redshift translation,
      and str_replace_all() uses REGEXP_REPLACE() (#446).

    • paste() and paste0() use || (#458).

    • as.numeric() and as.double() cast to FLOAT (#408).

    • substr() and str_sub() use SUBSTRING() (#327).

  • SQLite gains translations for lubridate functions today(), now(),
    year(), month(), day(), hour(), minute(), second(),yday()
    (#262), and correct translation for median() (#357).

Extensibility

If you are the author of a dbplyr backend, please see vignette("backend-2") for details.

  • New dbplyr_edition() generic allows you to opt-in to the 2nd edition of
    the dbplyr API.

  • db_write_table() now calls DBI::dbWriteTable() instead of nine generics
    that formerly each did a small part: db_create_indexes(), db_begin(),
    db_rollback(), db_commit(), db_list_tables(), drop_drop_table(),
    db_has_table(), db_create_table(), and db_data_types(). You can
    now delete the methods for these generics.

    db_query_rows() is no longer used; it appears that it hasn't been used
    for some time, so if you have a method, you can delete it.

  • DBI::dbQuoteIdentifier() is now used instead of sql_escape_ident() and
    DBI::dbQuoteString() instead of sql_escape_string().

  • A number of db_* generics have been replaced with new SQL generation
    generics:

    • dplyr::db_analyze() -> dbplyr::sql_table_analyze()
    • dplyr::db_create_index() -> dbplyr::sql_table_index()
    • dplyr::db_explain() -> dbplyr::sql_queriy_explain()
    • dplyr::db_query_fields() -> dbplyr::sql_query_fields()
    • dplyr::db_save_query() -> dbplyr::sql_query_save()

    This makes them easier to test and is an important part of the process of
    moving all database generics in dbplyr (#284).

  • A number of other generics have been renamed to facilitate the move from
    dplyr to dbplyr:

    • dplyr::sql_select() -> dbplyr::sql_query_select()
    • dplyr::sql_join() -> dbplyr::sql_query_join()
    • dplyr::sql_semi_join() -> dbplyr::sql_query_semi_join()
    • dplyr::sql_set_op() -> dbplyr::sql_query_set_op()
    • dplyr::sql_subquery() -> dbplyr::sql_query_wrap()
    • dplyr::db_desc() -> dbplyr::db_connection_describe()
  • New db_temporary_table() generic makes it easier to work with databases
    that require temporary tables to be specially named.

  • New sql_expr_matches() generic allows databases to use more efficient
    alternatives when determine if two values "match" (i.e. like equality but
    a pair of NULLs will also match). For more details, see
    https://modern-sql.com/feature/is-distinct-from

  • New sql_join_suffix() allows backends to control the default suffixes
    used (#254).

Minor improvements and bug fixes

  • All old lazy eval shims have been removed. These have been deprecated for
    some time.

  • Date-time escaping methods for Athena and Presto have moved to the packages
    where they belong.

  • Attempting to embed a Shiny reactive in a query now gives a helpful error
    (#439).

  • copy_lahman() and copy_nycflights13() (and hence nycflights13_sqlite())
    and friends now return DBI connections rather than the now deprecated
    src_dbi() (#440).

  • copy_to() can now overwrite when table is specified with schema (#489),
    and gains an in_transaction argument used to optionally suppress the
    transaction wrapper (#368).

  • distinct() no longer duplicates column if grouped (#354).

  • transmute() now correctly tracks variables it needs when creating
    subqueries (#313).

  • mutate() grouping variables no longer generates a downstream error (#396)

  • mutate() correctly generates subqueries when you re-use the same variable
    three or more times (#412).

  • window_order() overrides ordering, rather than appending to it.