Skip to content

dbplyr 1.4.0

Choose a tag to compare

@hadley hadley released this 29 Apr 19:54

Breaking changes

  • Error: `con` must not be NULL: If you see this error, it probably means
    that you have probably forgotten to pass con down to a dbplyr function.
    Previously, dbplyr defaulted to using simulate_dbi() which introduced
    subtle escaping bugs. (It's also possible I have forgotten to pass it
    somewhere that the dbplyr tests don't pick up, so if you can't figure it
    out, please let me know).

  • Subsetting ([[, $, and [) functions are no longer evaluated locally.
    This makes the translation more consistent and enables useful new idioms
    for modern databases (#200).

New features

  • MySQL/MariaDB (https://mariadb.com/kb/en/library/window-functions/)
    and SQLite (https://www.sqlite.org/windowfunctions.html) translations gain
    support for window functions, available in Maria DB 10.2, MySQL 8.0, and
    SQLite 3.25 (#191).

  • Overall, dplyr generates many fewer subqueries:

    • Joins and semi-joins no longer add an unneeded subquery (#236). This is
      faciliated by the new bare_identifier_ok argument to sql_render();
      the previous argument was called root and confused me.

    • Many sequences of select(), rename(), mutate(), and transmute() can
      be collapsed into a single query, instead of always generating a subquery
      (#213).

  • New vignette("sql") describes some advantages of dbplyr over SQL (#205) and
    gives some advice about writing how to write literal SQL inside of dplyr,
    when you you need to (#196).

  • New vignette("reprex") gives some hints on creating reprexes that work
    anywhere (#117). This is supposrted by a new tbl_memdb() that
    matches the existing tbl_lazy().

  • All ..._join() functions gain an sql_on argument that allows specifying
    arbitrary join predicates in SQL code (#146, @krlmlr).

SQL translations

  • New translations for some lubridate functions: today(), now(),
    year(), month(), day(), hour(), minute(),
    second(), quarter(), ``yday()(@colearendt, @derekmorr). Also added new translation foras.POSIXct()`.

  • New translations for stringr functions: str_c(), str_sub(),
    str_length(), str_to_upper(), str_to_lower(), and str_to_title()
    (@colearendt). Non-translated stringr functions throw a clear error.

  • New translations for bitwise operations: bitwNot(), bitwAnd(), bitwOr(),
    bitwXor(), bitwShiftL(), and bitwShiftR(). Unlike the base R functions,
    the translations do not coerce arguments to integers (@davidchall, #235).

  • New translation for x[y] to CASE WHEN y THEN x END. This enables
    sum(a[b == 0]) to work as you expect from R (#202). y needs to be
    a logical expression; if not you will likely get a type error from your
    database.

  • New translations for x$y and x[["y"]] to x.y, enabling you to index
    into nested fields in databases that provide them (#158).

  • The .data and .env pronouns of tidy evaluation are correctly translated
    (#132).

  • New translation for median() and quantile(). Works for all ANSI compliant
    databases (SQL Server, Postgres, MariaDB, Teradata) and has custom
    translations for Hive. Thanks to @edavidaja for researching the SQL variants!
    (#169)

  • na_if() is correct translated to NULLIF() (rather than NULL_IF) (#211).

  • n_distinct() translation throws an error when given more than one argument.
    (#101, #133).

  • New default translations for paste(), paste0(), and the hyperbolic
    functions (these previously were only available for ODBC databases).

  • Corrected translations of pmin() and pmax() to LEAST() and GREATEST()
    for ANSI compliant databases (#118), to MIN() and MAX() for SQLite, and
    to an error for SQL server.

  • New translation for switch() to the simple form of CASE WHEN (#192).

SQL simulation

SQL simulation makes it possible to see what dbplyr will translate SQL to, without having an active database connection, and is used for testing and generating reprexes.

  • SQL simulation has been overhauled. It now works reliably, is better
    documented, and always uses ANSI escaping (i.e. ` for field
    names and ' for strings).

  • tbl_lazy() now actually puts a dbplyr::src in the $src field. This
    shouldn't affect any downstream code unless you were previously working
    around this weird difference between tbl_lazy and tbl_sql classes.
    It also includes the src class in its class, and when printed,
    shows the generated SQL (#111).

Database specific improvements

  • MySQL/MariaDB

    • Translations also applied to connections via the odbc package
      (@colearendt, #238)

    • Basic support for regular expressions via str_detect() and
      str_replace_all() (@colearendt, #168).

    • Improved translation for as.logical(x) to IF(x, TRUE, FALSE).

  • Oracle

    • New custom translation for paste() and paste0() (@cderv, #221)
  • Postgres

    • Basic support for regular expressions via str_detect() and
      str_replace_all() (@colearendt, #168).
  • SQLite

    • explain() translation now generates EXPLAIN QUERY PLAN which
      generates a higher-level, more human friendly explanation.
  • SQL server

    • Improved translation for as.logical(x) to CAST(x as BIT) (#250).

    • Translates paste(), paste0(), and str_c() to +.

    • copy_to() method applies temporary table name transformation
      earlier so that you can now overwrite temporary tables (#258).

    • db_write_table() method uses correct argument name for
      passing along field types (#251).

Minor improvements and bug fixes

  • Aggregation functions only warn once per session about the use of
    na.rm = TRUE (#216).

  • table names generated by random_table_name() have the prefix
    "dbplyr_", which makes it easier to find them programmatically
    (@mattle24, #111)

  • Functions that are only available in a windowed (mutate()) query now
    throw an error when called in a aggregate (summarise()) query (#129)

  • arrange() understands the .by_group argument, making it possible
    sort by groups if desired. The default is FALSE (#115)

  • distinct() now handles computed variables like distinct(df, y = x + y)
    (#154).

  • escape(), sql_expr() and build_sql() no longer accept con = NULL as
    a shortcut for con = simulate_dbi(). This made it too easy to forget to
    pass con along, introducing extremely subtle escaping bugs. win_over()
    gains a con argument for the same reason.

  • New escape_ansi() always uses ANSI SQL 92 standard escaping (for use
    in examples and documentation).

  • mutate(df, x = NULL) drops x from the output, just like when working with
    local data frames (#194).

  • partial_eval() processes inlined functions (including rlang lambda
    functions). This makes dbplyr work with more forms of scoped verbs like
    df %>% summarise_all(~ mean(.)), df %>% summarise_all(list(mean)) (#134).

  • sql_aggregate() now takes an optional argument f_r for passing to
    check_na_rm(). This allows the warning to show the R function name rather
    than the SQL function name (@sverchkov, #153).

  • sql_infix() gains a pad argument for the rare operator that doesn't
    need to be surrounded by spaces.

  • sql_prefix() no longer turns SQL functions into uppercase, allowing for
    correct translation of case-sensitive SQL functions (#181, @mtoto).

  • summarise() gives a clear error message if you refer to a variable
    created in that same summarise() (#114).

  • New sql_call2() which is to rlang::call2() as sql_expr() is to
    rlang::expr().

  • show_query() and explain() use cat() rather than message.

  • union(), union_all(), setdiff() and intersect() do a better job
    of matching columns across backends (#183).