Can R factors be written into a ducklake? #1249

@tomsing1

I am trying to write a data.frame with a factor column to a ducklake, but I am getting an error - perhaps the ENUM type is not (yet?) supported?

When I write a data.frame to a regular duckdb database, the factor column is stored as an ENUM type:

library(duckdb)
con <- dbConnect(duckdb())

# regular duckdb table
df <- data.frame(
  letters = factor(letters[1:4])
)
dbWriteTable(con, "test", df, overwrite = TRUE)
dbGetQuery(con, "DESCRIBE test;")  # letters field is type ENUM
dbDisconnect(con)

But when I try to write the same data.frame to a ducklake, the operation fails:

# ducklake
con <- dbConnect(duckdb())

dbExecute(
  con,
  "INSTALL ducklake;
  ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 'data_files');
  USE my_ducklake;")
dbWriteTable(con, "demo", df)

and I get the following error:

Error in `dbSendQuery()`:
 ! rapi_prepare: Failed to prepare query CREATE  TABLE demo AS SELECT #1 FROM _duckdb_write_view_psccaemnxs
Error: Invalid Input Error: Failed to convert DuckDB type to DuckLake - unsupported type ENUM('a', 'b', 'c', 'd')
Run `rlang::last_trace()` to see where the error occurred.

I suspect that ducklake might simply not support the ENUM type yet, judging by its documentation. But I think Parquet files offer dictionary encoding, and - if I remember correctly - the arrow R package coerces R factors into dictionaries. Is there a way to do that in the context of a ducklake? Or maybe that's on the roadmap?
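
In the meantime, one possible workaround is to convert factor columns to character before writing. The column should then reach DuckDB as VARCHAR rather than ENUM. This is only a minimal sketch: `factors_to_character()` is a hypothetical helper written for this example, not a function from duckdb or ducklake, and I have not confirmed that this is the recommended approach. It also drops the set of levels and their order, so these would have to be restored by hand after reading the table back.

```r
# Hypothetical helper (not part of duckdb or ducklake): convert every factor
# column of a data.frame to character and leave all other columns untouched.
factors_to_character <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.factor(col)) as.character(col) else col
  })
  df
}

df <- data.frame(
  letters = factor(letters[1:4])
)
df_chr <- factors_to_character(df)
str(df_chr)  # the letters column is now a character vector

# Assumption: with the ducklake connection from the example above, the write
# would then use the converted data.frame:
# dbWriteTable(con, "demo", df_chr)
```

The conversion uses `df[] <- lapply(...)` so that the result stays a data.frame and the column names are preserved.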

Many thanks for developing these awesome tools for the R community!
