I am trying to write a data.frame with a factor column to a ducklake, but I am getting an error. Perhaps the ENUM type is not (yet?) supported?
When I write a data.frame to a regular duckdb database, the factor column is stored as an ENUM type.
library(duckdb)
con <- dbConnect(duckdb())
# regular duckdb table
df <- data.frame(
  letters = factor(letters[1:4])
)
dbWriteTable(con, "test", df, overwrite = TRUE)
dbGetQuery(con, "DESCRIBE test;") # letters field is type ENUM
dbDisconnect(con)
But when I try to write the same data.frame to a ducklake, the operation fails:
# ducklake
con <- dbConnect(duckdb())
dbExecute(
  con,
  "INSTALL ducklake;
  ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'data_files');
  USE my_ducklake;"
)
dbWriteTable(con, "demo", df)
and I get the following error:
Error in `dbSendQuery()`:
! rapi_prepare: Failed to prepare query CREATE TABLE demo AS SELECT #1 FROM _duckdb_write_view_psccaemnxs
Error: Invalid Input Error: Failed to convert DuckDB type to DuckLake - unsupported type ENUM('a', 'b', 'c', 'd')
Run `rlang::last_trace()` to see where the error occurred.
I suspect that ducklake might simply not support the ENUM type yet, as per its documentation. But I think parquet files offer dictionary encoding and, if I remember correctly, the arrow R package coerces R factors into dictionaries. Is there a way to do that in the context of a ducklake? Or maybe that's on the roadmap?
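In the meantime, one possible workaround is to convert the factor columns to character before writing, so that they should be sent as VARCHAR rather than ENUM. This is only a sketch: `factors_to_character()` is a hypothetical helper of my own, not part of duckdb or ducklake, and it assumes that losing the factor levels is acceptable. I have not confirmed that this is the recommended approach.

```r
# Hypothetical helper (not part of duckdb or ducklake): convert every factor
# column of a data.frame to character, leaving all other columns untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

df <- data.frame(
  letters = factor(letters[1:4])
)

df_chr <- factors_to_character(df)
str(df_chr) # letters is now a character column

# With the ducklake connection from the example above (assumption: a plain
# character column is accepted where the ENUM column was rejected):
# dbWriteTable(con, "demo", df_chr)
```

The original `df` is left unchanged, so the factor levels can still be reapplied with `factor()` after reading the table back.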
Many thanks for developing these awesome tools for the R community!