Skip to content

pivot_wider.tbl_lazy() translates wrongly with values_fill #1611

@willshen99

Description

@willshen99

Issue summary

When using pivot_wider() on a remote table (tbl_lazy) that contains negative values, setting values_fill = 0 unexpectedly replaces all negative values with 0, instead of preserving them. We observe such behavior with both RPostgres and RPresto backends (only two backends we tested).

Reprex

# Writes a df to database
df <- data.frame(x=c(1, 2, 3), y=c(1, 2, -10))
copy_to(conn, df, 'pivot_test_1', overwrite = T)
# pivot_wider on tbl_lazy
tbl(conn, 'pivot_test_1') %>% pivot_wider(names_from=x, values_from=y, values_fill=0)

# result
# Source:   SQL [?? x 3]
# Database: PrestoConnection
#     `2`   `1`   `3`
#   <dbl> <dbl> <dbl>
#1     2     1     0

The SQL it translates into is

SELECT
  MAX(IF("x" = 2.0, "y", 0.0)) AS "2",
  MAX(IF("x" = 1.0, "y", 0.0)) AS "1",
  MAX(IF("x" = 3.0, "y", 0.0)) AS "3"
FROM "pivot_test_1"
LIMIT 21

Expected Behavior

Negative values should be preserved (because they are not NULL). In this example, the value in column 3 should be -10, not 0.

Note

  • This does not occur with in-memory data frames — only with remote tables.
  • In general, using pivot_wider() with values_fill set and values_fn as default, any actual value that's smaller than values_fill will be replaced.
  • The issue appears to stem from how MAX(IF(...)) is used in the SQL translation. MAX() filters out the actual negative value when comparing it to the values_fill.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions