-
Notifications
You must be signed in to change notification settings - Fork 200
Description
Background
We are using the built-in support of Trino to access existing Hive views. Under the hood, Trino uses Coral to perform this conversion. Syntactically, most queries work out perfectly fine. But now we encountered a case, where the behaviour of the same SQL expression differs between Trino and Hive/Impala/Spark.
Problem
Hive/Impala support a wider range of timestamp formats in a CAST(STRING AS TIMESTAMP) expression. Specifically, the following two CAST operations work in Hive/Impala/Spark, but not in Trino:
- SELECT CAST('2020-08-27T13:58:02.483Z' AS TIMESTAMP);
- SELECT CAST('2020-08-27T13:58:02.483' AS TIMESTAMP);
The problem is the "T" in between the date and the time. The following two queries work as expected:
- SELECT CAST('2020-08-27 13:58:02.483Z' AS TIMESTAMP);
- SELECT CAST('2020-08-27 13:58:02.483' AS TIMESTAMP);
Unfortunately, we have both datetime formats in our data and therefore cannot simply pick one of them.
Moreover, a failing conversion results in a query failure in Trino, while Hive/Impala/Spark will simply produce NULL values (quite debatable which behaviour is preferred in the context of a Hive to Trino conversion).
Suggested Solution
I was experimenting with different expressions in Trino for implementing a conversion, which supports both value types, and came up with the following expression:
- SELECT COALESCE(TRY(CAST('2020-08-27T13:58:02.483Z' AS TIMESTAMP)), CAST(TRY(from_iso8601_timestamp('2020-08-27T13:58:02.483Z')) AS TIMESTAMP WITHOUT TIME ZONE));
Of course, the real SQL wouldn't use fixed strings, but a column from the source table instead. This expression provides a compatible behaviour in the sense that:
- It supports two different time formats (with and without a "T")
- It doesn't fail the query on failed conversions
Suggested Implementation
I know that this is a very special corner case, and I was wondering where this expression replacement should be implemented. I would not suggest to implement it in the Hive to Coral translation, and I also would not suggest to implement it in the Coral to Trino translation. Instead, I propose to implement that very special rule in the wrapper Hive to Trino translation class by implementing an additional translation pass.
If you agree on this approach, I am optimistic that I can provide a corresponding pull-request. But I first want to discuss the issue with you to avoid going into a wrong direction.