Description
Describe the issue:
The way timedelta values (a.k.a. durations, intervals...) are stored in parquet does not follow the file format specification. According to the parquet specification, the logical type Interval should be stored as:
INTERVAL is used for an interval of time. It must annotate a fixed_len_byte_array of length 12. This array stores three little-endian unsigned integers that represent durations at different granularities of time. The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds. This representation is independent of any particular timezone or date.
(...)
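To make the quoted layout concrete, here is a minimal sketch of the 12-byte encoding using only the standard library. The function names are hypothetical and are not part of fastparquet or any other library. The mapping from a timedelta (months always 0, sub-millisecond precision dropped, negative values rejected) is my own assumption for illustration, since the specification only defines the byte layout.

```python
import struct
from datetime import timedelta


def encode_parquet_interval(months: int, days: int, millis: int) -> bytes:
    """Pack three little-endian unsigned 32-bit integers into 12 bytes."""
    return struct.pack("<III", months, days, millis)


def decode_parquet_interval(raw: bytes) -> tuple:
    """Unpack a 12-byte INTERVAL value into (months, days, millis)."""
    if len(raw) != 12:
        raise ValueError("INTERVAL must be exactly 12 bytes")
    return struct.unpack("<III", raw)


def timedelta_to_interval(td: timedelta) -> bytes:
    """Hypothetical mapping: months = 0, whole days, remainder in milliseconds."""
    if td < timedelta(0):
        # The fields are unsigned, so negative durations cannot be represented.
        raise ValueError("negative durations are not representable")
    # Assumption: microseconds below one millisecond are truncated.
    millis = td.seconds * 1000 + td.microseconds // 1000
    return encode_parquet_interval(0, td.days, millis)


raw = timedelta_to_interval(timedelta(seconds=30))
print(raw.hex())                     # 000000000000000030750000
print(decode_parquet_interval(raw))  # (0, 0, 30000)
```

Note that this layout cannot carry microsecond precision or negative values, so a pandas timedelta64 column would not round-trip through it losslessly.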
Currently, fastparquet does not follow the format specification for this type. If any field has this type, this affects interoperability in both directions: reading parquet files written by other tools with fastparquet, and reading parquet files written by fastparquet with other tools.
I guess it might be a known issue rather than a bug, but I couldn't find info about it.
Minimal Complete Verifiable Example:
import pandas as pd
from fastparquet import write
df = pd.DataFrame([{'seconds': 30, 'duration': pd.to_timedelta(30, unit='seconds')}])
write('/test/test.parquet', df)
Then use either hangxie/parquet-tools, ktrueda/parquet-tools or any similar tool to inspect the schema to find that it looks like:
{"Tag":"name=Schema",
"Fields":[
{"Tag":"name=Seconds, type=INT64, repetitiontype=OPTIONAL"},
{"Tag":"name=Duration, type=INT64, convertedtype=TIME_MICROS, repetitiontype=OPTIONAL"}
]}
instead of something along the lines of
{"Tag":"name=Duckdb_schema",
"Fields":[
{"Tag":"name=Seconds, type=INT32, convertedtype=INT_32, repetitiontype=OPTIONAL"},
{"Tag":"name=Duration, type=FIXED_LEN_BYTE_ARRAY, convertedtype=INTERVAL, length=12, repetitiontype=OPTIONAL"}
]}
Anything else we need to know?:
There's a bit more context on this StackOverflow question
Environment:
- Pandas version: 2.2.2
- Python version: 2024.5.0
- Operating System: macOS 14.6.1
- Install method (conda, pip, source): pip