Description
Hello!
I'm seeing errors when reading a delta table that contains nested timestamp fields, with duckdb 1.2.2. This does not happen in 1.2.1.
D SELECT * FROM delta_scan('/tmp/ts_out');
Not implemented Error:
Unsupported type cast detected in Delta table 'STRUCT(inner_ts TIMESTAMP)' -> 'STRUCT(inner_ts TIMESTAMP WITH TIME ZONE)'. DuckDB currently does not support column mapping for nested types.
The delta table has this schema:
{
"type": "struct",
"fields": [
{
"name": "outer_ts",
"type": "timestamp",
"nullable": false,
"metadata": {}
},
{
"name": "entity",
"type": {
"type": "struct",
"fields": [
{
"name": "inner_ts",
"type": "timestamp",
"nullable": true,
"metadata": {}
}
]
},
"nullable": false,
"metadata": {}
}
]
}
Specifically, I have a struct with a timestamp in it.
The delta timestamp
type here is analogous to the TIMESTAMP WITH TIME ZONE
type in duckdb. And if I describe the table that's what I (correctly) see:
D create view ts as select * from delta_scan('/tmp/ts_out');
D describe ts;
┌─────────────┬───────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────┼───────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ outer_ts │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ entity │ STRUCT(inner_ts TIMESTAMP WITH TIME ZONE) │ YES │ NULL │ NULL │ NULL │
└─────────────┴───────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┘
However, during the scan of the delta's data, it seems to be treating the raw data as a duckdb TIMESTAMP
, which is analogous to a delta timestamp_ntz
. It then tries to cast the TIMESTAMP to TIMESTAMP WITH TIME ZONE, which fails because it's a nested field.
If I describe/read the parquet files directly, I get a duckdb timestamp type:
D create view ts_p as select * from read_parquet('/tmp/ts_out/*parquet');
D describe ts_p;
┌─────────────┬────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├─────────────┼────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ outer_ts │ TIMESTAMP │ YES │ NULL │ NULL │ NULL │
│ entity │ STRUCT(inner_ts TIMESTAMP) │ YES │ NULL │ NULL │ NULL │
└─────────────┴────────────────────────────┴─────────┴─────────┴─────────┴─────────┘
On the surface, it seems like there may be mix up with the timestamp type between reading delta table schema vs reading the parquet files?
Thanks for looking!