-
Notifications
You must be signed in to change notification settings - Fork 185
Open
Description
Issue summary
When using pivot_wider() on a remote table (tbl_lazy) that contains negative values, setting values_fill = 0 unexpectedly replaces all negative values with 0, instead of preserving them. We observe such behavior with both RPostgres and RPresto backends (only two backends we tested).
Reprex
# Writes a df to database
df <- data.frame(x=c(1, 2, 3), y=c(1, 2, -10))
copy_to(conn, df, 'pivot_test_1', overwrite = T)
# pivot_wider on tbl_lazy
tbl(conn, 'pivot_test_1') %>% pivot_wider(names_from=x, values_from=y, values_fill=0)
# result
# Source: SQL [?? x 3]
# Database: PrestoConnection
# `2` `1` `3`
# <dbl> <dbl> <dbl>
#1 2 1 0The SQL it translates into is
SELECT
MAX(IF("x" = 2.0, "y", 0.0)) AS "2",
MAX(IF("x" = 1.0, "y", 0.0)) AS "1",
MAX(IF("x" = 3.0, "y", 0.0)) AS "3"
FROM "pivot_test_1"
LIMIT 21
Expected Behavior
Negative values should be preserved (because they are not NULL). In this example, the value in column 3 should be -10, not 0.
Note
- This does not occur with in-memory data frames — only with remote tables.
- In general, using
pivot_wider()withvalues_fillset andvalues_fnas default, any actual value that's smaller than values_fill will be replaced. - The issue appears to stem from how MAX(IF(...)) is used in the SQL translation. MAX() filters out the actual negative value when comparing it to the values_fill.
Metadata
Metadata
Assignees
Labels
No labels