Description
I'm having a bear of a time debugging an issue that arises when updating from 2.2.1 to 2.3.3. It's genuinely hard to disentangle where the issue is coming from, since there are so many layers where things may have gone wrong.
The query in the failing test is pretty simple: an inner_join() on two 3-row input tables:
personal_data <- tibble(
  name = c("Alice", "Nick", "Bob"),
  age = c(32, 25, 44)
)
employee_data <- tibble(
  name = c("Alice", "Bob", "Michael"),
  employee_id = c(1, 2, 3)
)
# Copy these tables to DB backend
tables <- list(personal_data, employee_data)
table_names <- paste0("r_tmp_", seq_along(tables))
db_conn <- withr::local_db_connection(MakeTestDBIConnection())
db_names <- lapply(
  table_names,
  \(table_name) DBI::Id(namespace = "datascape", table_name = table_name)
)
db_tables <- mapply(dplyr::copy_to,
  df = tables, name = db_names,
  MoreArgs = list(dest = db_conn, temporary = FALSE),
  SIMPLIFY = FALSE
)
dplyr::inner_join(db_tables[[1]], db_tables[[2]], by = "name")
This test (which compares this join's output to the local dplyr equivalent) works as expected on 2.2.1 but breaks on 2.3.3:
Error in `dplyr::collect(x)`: Failed to collect lazy table.
Caused by error in `doTryCatch()`:
! INVALID_ARGUMENT: SQL_ANALYSIS_ERROR: Syntax error: Expected end of input but got "." [at 2:46]
FROM `datascape`.`r_tmp_1` AS `\`datascape\``.`\`r_tmp_1\``
The issue is that the already-escaped name `datascape`.`r_tmp_1` gets escaped a second time when it is used as the table alias, which produces invalid SQL.
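For illustration, here is a minimal base-R sketch of how quoting an already-quoted name could produce exactly the alias in the error above. `quote_once()` is a hypothetical helper, not the backend's real escaping code; the assumption is only that the backend wraps names in backticks and backslash-escapes any backticks already present.

```r
# Hypothetical quoting helper (an assumption, not the backend's implementation):
# wrap a name in backticks, backslash-escaping backticks it already contains.
quote_once <- function(x) {
  paste0("`", gsub("`", "\\`", x, fixed = TRUE), "`")
}

parts <- c("datascape", "r_tmp_1")

# Quoting each component once gives the expected qualified name.
once <- paste(quote_once(parts), collapse = ".")

# Quoting the already-quoted components again escapes the inner backticks,
# which mirrors the alias shown in the error message.
twice <- paste(quote_once(quote_once(parts)), collapse = ".")

cat(once, twice, sep = "\n")
```

If that assumption holds, the second line printed should match the text after `AS` in the failing query.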
Poking around in the debugger, I can't tell what went wrong. It's possible our own connection methods are doing something unexpected, for example.
Just one observation:
Here, IIUC, we should respect the pre-escaped nature of the input when constructing by$x_as:
Lines 171 to 178 in 5fa4410
op$joins$by <- purrr::map2(
  op$joins$by, seq_along(op$joins$by),
  function(by, i) {
    by$x_as <- table_names_out[op$joins$by_x_table_id[[i]]]
    by$y_as <- table_names_out[i + 1L]
    by
  }
)
Debugging, I see this around that step:
dput(op$joins$by)
# list(list(
# x = structure("name", class = c("ident", "character")),
# y = structure("name", class = c("ident", "character")),
# condition = "==",
# on = structure(character(0), class = c("sql", "character")),
# na_matches = "never"
# ))
dput(table_names_out)
# c("`datascape`.`r_tmp_1`", "`datascape`.`r_tmp_2`")
dput(op$joins$by_x_table_id)
# list(1L)
# but also
dput(op$x$x)
# structure("`datascape`.`r_tmp_1`", class = c("ident_q", "ident", "character"))
dput(op$joins$table)
# list(structure("`datascape`.`r_tmp_2`", class = c("ident_q", "ident", "character")))
Perhaps table_names_out should be ident_q at this step, but even if so, it would have the same result:
identical(
  ident(table_names_out[1L]),
  ident(ident_q(table_names_out[1L]))
)
# [1] TRUE
Should ident() have an escape for "ident" input? And then we should make sure table_names_out reflects the same ident_q class as the input x$x and joins$table?
Maybe I'm barking up the wrong tree.
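To make the question above concrete, here is a rough base-R sketch of the behaviour I have in mind. `new_ident()`, `new_ident_q()`, and `as_ident_keep_quoted()` are hypothetical stand-ins that only copy the class vectors from the dput() output above. They are not the package's functions, and whether the real fix belongs in ident() or elsewhere is still an open question.

```r
# Hypothetical stand-ins mirroring the class vectors seen in the dput() output.
# These are NOT the package's constructors.
new_ident <- function(x) {
  structure(x, class = c("ident", "character"))
}
new_ident_q <- function(x) {
  structure(x, class = c("ident_q", "ident", "character"))
}

# Hypothetical helper: leave input alone if it is already marked as quoted,
# otherwise wrap it as an ordinary (still to be escaped) identifier.
as_ident_keep_quoted <- function(x) {
  if (inherits(x, "ident_q")) {
    return(x)
  }
  new_ident(unclass(x))
}

raw <- "`datascape`.`r_tmp_1`"
quoted <- new_ident_q(raw)

kept <- as_ident_keep_quoted(quoted)   # the ident_q marker survives
wrapped <- as_ident_keep_quoted(raw)   # a plain string becomes a regular ident

class(kept)
class(wrapped)
```

With something like this, a pre-escaped name would keep its ident_q class through the wrapping step instead of collapsing to a plain ident and being escaped again.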