
Error reading from Iceberg tables with very large timestamps in Parquet files #25837


Open
vaultah opened this issue May 20, 2025 · 1 comment
Labels: iceberg (Iceberg connector)


vaultah commented May 20, 2025

I have an Iceberg table with very large timestamps stored in a timestamptz column called ts.

Physically the values are stored in a Parquet file in an INT64 column with microsecond precision:

...
############ Column(ts) ############
name: ts
path: ts
max_definition_level: 0
max_repetition_level: 0
physical_type: INT64
logical_type: Timestamp(isAdjustedToUTC=true, timeUnit=microseconds, is_from_converted_type=false, force_set_converted_type=false)
converted_type (legacy): TIMESTAMP_MICROS
compression: ZSTD (space_saved: -29%)
...

When I try to run Trino queries that end up using the file, they fail with errors like

io.trino.spi.TrinoException: Failed to read Parquet file: <my parquet file>.parquet
    ...
Caused by: java.lang.IllegalArgumentException: Millis overflow: 6322461782400000
    at io.trino.spi.type.DateTimeEncoding.pack(DateTimeEncoding.java:30)
    at io.trino.spi.type.DateTimeEncoding.packDateTimeWithZone(DateTimeEncoding.java:52)
    ...

This is reproducible in Trino 454 and Trino 475, at least. I'm able to read from and write to this table from other query engines, like Spark.

The reason seems to be that Trino's DateTimeEncoding.pack packs the number of milliseconds since the epoch into a Java long together with time zone information. Because the lower 12 bits are reserved for the time zone key, only 64 - 12 = 52 bits are left for the actual value. Thus it looks like the de facto supported range of TIMESTAMP WITH TIME ZONE in Trino is from -2^51 + 1 to 2^51 - 1 milliseconds, or equivalently from -69387-04-22 03:45:14.753Z to +73326-09-11 20:14:45.247Z.
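The arithmetic above can be sketched in a few lines of Java. This is a hypothetical reconstruction based on the stack trace and the bit-layout described here, not Trino's actual source; the class and constant names are assumptions.

```java
import java.time.Instant;

// Hypothetical reconstruction of the packed TIMESTAMP WITH TIME ZONE layout:
// the upper 52 bits hold milliseconds since the epoch, the lower 12 bits a
// time zone key. Names and bounds are assumptions, not Trino's real code.
public class PackSketch {
    static final int TIME_ZONE_BITS = 12;
    static final long MAX_MILLIS = (1L << 51) - 1;   // largest signed 52-bit value
    static final long MIN_MILLIS = -(1L << 51) + 1;  // symmetric lower bound (assumed)

    static long pack(long millisUtc, int timeZoneKey) {
        if (millisUtc > MAX_MILLIS || millisUtc < MIN_MILLIS) {
            throw new IllegalArgumentException("Millis overflow: " + millisUtc);
        }
        // Shift the millis into the upper 52 bits, keep the zone key in the lower 12.
        return (millisUtc << TIME_ZONE_BITS) | (timeZoneKey & 0xFFF);
    }

    public static void main(String[] args) {
        // The value from the error message exceeds the 52-bit range:
        long failing = 6_322_461_782_400_000L;
        System.out.println(failing > MAX_MILLIS); // prints "true"
        // Upper boundary expressed as an instant:
        System.out.println(Instant.ofEpochMilli(MAX_MILLIS));
    }
}
```

Calling pack with the value from the error message would throw the same "Millis overflow" IllegalArgumentException seen in the stack trace.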

Is this correct? Can anything be done to improve handling of large timestamps in Iceberg tables, or at least the error message? Does Trino have a global limit on TIMESTAMP WITH TIME ZONE, and if so, can it be documented?

ebyhr (Member) commented May 20, 2025

You are correct, and I agree with documenting the range; I filed #10339 about this a few years ago.

kumiDa added the iceberg (Iceberg connector) label on May 27, 2025