`dbplyr_uncount()` on Redshift tables fails with an opaque error #1601
I should add, for transparency, that I'm actually using an AWS Redshift database. From some further research, I think Postgres and Redshift differ slightly when it comes to lowercase conversion.

In other words, Redshift coerces names in results to lowercase even if they are quoted in the query, unless a certain database configuration value has been changed from the default, in which case it behaves identically to Postgres (as far as I can tell). I'm getting a bit lost in the weeds trying to figure out whether and where names get quoted during {dbplyr}'s query-building process. Assuming they are quoted, though, that might narrow this down to a Redshift-only issue rather than one that also affects Postgres.
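To make the difference described above concrete, here is a small simulation in Python. It is not dbplyr or database code. The two functions are hypothetical models of how each database names result columns, based only on the behaviour described in this comment: Postgres folds unquoted identifiers to lowercase and keeps quoted ones as written, while Redshift (default configuration) lowercases result names even when they are quoted. The column `w` is a placeholder.

```python
# Hypothetical model of result-column naming, based only on the behaviour
# described above; this is a simulation, not real database code.


def postgres_result_name(name: str, quoted: bool) -> str:
    """Postgres: unquoted identifiers fold to lowercase; quoted ones keep their case."""
    return name if quoted else name.lower()


def redshift_result_name(name: str, quoted: bool) -> str:
    """Redshift (default configuration): result names are lowercased even when quoted."""
    return name.lower()


# Placeholder for the auto-generated column name; `w` is a hypothetical column.
requested = "max(w, na.rm = TRUE)"

for label, fold in [("postgres", postgres_result_name), ("redshift", redshift_result_name)]:
    returned = fold(requested, quoted=True)
    print(f"{label}: {returned!r} matches request: {returned == requested}")
```

If the name is quoted, only the Redshift model returns a name that differs from the one requested. That is why the problem may be specific to Redshift.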
Title changed: `dbplyr_uncount()` on Postgres/Redshift tables fails with an opaque error → `dbplyr_uncount()` on Redshift tables fails with an opaque error
Description
When using `dbplyr_uncount()` on a table within ~~a Postgres or~~ an AWS Redshift database, we always fail with an error that mentions the `seq2()` function. It's not immediately obvious why this happens.

Minimal example
The first few lines will need to be adjusted to create a valid connection/table.
Created on 2025-04-01 with reprex v2.1.1
Session info
This example uses a constant `weights` value, but we see the same error if we pass a column name for `weights`.

Cause
~~Postgres converts all (unquoted) column names to lowercase.~~ Redshift always converts all column names in results to lowercase. Within the query that's executed to calculate `n_max`, this conversion leads to a name mismatch, resulting in an unexpected `NULL` result.

Details
Early on in `dbplyr::dbplyr_uncount()`, a query is constructed to calculate the maximum number of repetitions. It is then executed implicitly via `dplyr::pull()`:

dbplyr/R/verb-uncount.R (line 50 in 5726930)
Note that the argument provided to the `summarise()` query isn't named. It is therefore given a name derived from the expression itself (I think via `set_names()`?), which looks something like `max(..., na.rm = TRUE)`.

The call to `pull()` is dispatched to `dbplyr::pull.tbl_sql()`, which:

1. Derives the name of the variable to extract:
   dbplyr/R/verb-pull.R (line 20 in 5726930)
   As we just saw, that variable is called something like `max(..., na.rm = TRUE)`.
2. Collects query results from the database:
   dbplyr/R/verb-pull.R (line 29 in 5726930)
   However, ~~Postgres converts all (unquoted) column names to lowercase~~ Redshift always converts all column names in results to lowercase. This means that the collected results contain a column called something like `max(..., na.rm = true)`.
3. Uses the name to extract the appropriate column from the collected results:
   dbplyr/R/verb-pull.R (line 30 in 5726930)
   Because the names don't match, we end up with `out <- NULL`.
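The name mismatch in the steps above can be reproduced without a database. The following Python sketch is a hypothetical stand-in for the final lookup in `pull.tbl_sql()`, not dbplyr's actual code. A dict plays the role of the collected results. `dict.get()` returns `None` for a missing key, much as R returns `NULL` when a list is indexed with `[[` by a name it doesn't contain. The column name `w` and the value `3` are placeholders.

```python
from typing import Any, Dict, List, Optional


def pull_column(collected: Dict[str, List[Any]], var: str) -> Optional[List[Any]]:
    """Look up a column by the name derived *before* the query ran.

    A missing name yields None, much as indexing an R list by a missing name
    with `[[` yields NULL.
    """
    return collected.get(var)


# Name derived from the unnamed summarise() expression (placeholder column `w`).
var = "max(w, na.rm = TRUE)"

# What comes back from Redshift: the same name, lowercased.
collected = {"max(w, na.rm = true)": [3]}

out = pull_column(collected, var)
print(out)  # None: the lowercased key never matches `var`
```

The lookup itself is correct. It fails only because the name was computed before the database had a chance to rewrite it.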
So we end up with `n_max <- NULL`, hence the error in the `seq2()` call a little later:

dbplyr/R/verb-uncount.R (line 63 in 5726930)
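The last link in the chain is that a `NULL` upper bound reaches `seq2()`. Below is a rough Python analogue, an assumption rather than rlang's implementation. `seq2()` builds an inclusive integer sequence that is empty when `from_ > to`, and it rejects a missing bound. The starting value `1` is illustrative only.

The hypothetical `checked_pull()` helper shows how checking for the name before extracting it would surface the real cause (a missing column) instead of a downstream `seq2()` error. It is an illustration only, not a proposed patch.

```python
from typing import Dict, List, Optional


def seq2(from_: int, to: Optional[int]) -> List[int]:
    """Rough analogue of rlang::seq2(): integers from `from_` to `to`, inclusive.

    Returns an empty list when from_ > to and fails when `to` is missing (None).
    """
    if not isinstance(to, int):
        raise TypeError(f"`to` must be a single integer, not {type(to).__name__}")
    return list(range(from_, to + 1))


def checked_pull(collected: Dict[str, int], var: str) -> int:
    """Hypothetical lookup that fails loudly and lists the names it did find."""
    if var not in collected:
        raise KeyError(
            f"column {var!r} not found in query results {sorted(collected)}; "
            "the database may have changed the case of the column name"
        )
    return collected[var]


requested = "max(w, na.rm = TRUE)"       # placeholder auto-generated name
collected = {"max(w, na.rm = true)": 3}  # lowercased by the database
n_max = collected.get(requested)         # None, mirroring `n_max <- NULL`

try:
    seq2(1, n_max)  # opaque: the error is about seq2's argument, not the name
except TypeError as err:
    print("opaque:", err)

try:
    checked_pull(collected, requested)
except KeyError as err:
    print("clearer:", err)
```

The first error only says that `to` is not an integer. The second names both the column that was requested and the column that actually came back, which points straight at the case mismatch.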
Output from `debug()`