Skip to content

Improve CAST(STRING AS TIMESTAMP) in Hive to Trino conversion. #543

@kupferk

Description

@kupferk

Background

We are using the built-in support of Trino to access existing Hive views. Under the hood, Trino uses Coral to perform this conversion. Syntactically, most queries work out perfectly fine. But now we encountered a case, where the behaviour of the same SQL expression differs between Trino and Hive/Impala/Spark.

Problem

Hive/Impala support a wider range of timestamp formats in a CAST(STRING AS TIMESTAMP) expression. Specifically, the following two CAST operations work in Hive/Impala/Spark, but not in Trino:

  • SELECT CAST('2020-08-27T13:58:02.483Z' AS TIMESTAMP);
  • SELECT CAST('2020-08-27T13:58:02.483' AS TIMESTAMP);

The problem is the "T" in between the date and the time. The following two queries work as expected:

  • SELECT CAST('2020-08-27 13:58:02.483Z' AS TIMESTAMP);
  • SELECT CAST('2020-08-27 13:58:02.483' AS TIMESTAMP);

Unfortunately, we have both datetime formats in our data and therefore cannot simply pick one of them.

Moreover, a failing conversion results in a query failure in Trino, while Hive/Impala/Spark will simply produce NULL values (quite debatable which behaviour is preferred in the context of a Hive to Trino conversion).

Suggested Solution

I was experimenting with different expressions in Trino for implementing a conversion, which supports both value types, and came up with the following expression:

  • SELECT COALESCE(TRY(CAST('2020-08-27T13:58:02.483Z' AS TIMESTAMP)), CAST(TRY(from_iso8601_timestamp('2020-08-27T13:58:02.483Z')) AS TIMESTAMP WITHOUT TIME ZONE));

Of course, the real SQL wouldn't use fixed strings, but a column from the source table instead. This expression provides a compatible behaviour in the sense that:

  • It supports two different time formats (with and without a "T")
  • It doesn't fail the query on failed conversions

Suggested Implementation

I know that this is a very special corner case, and I was wondering where this expression replacement should be implemented. I would not suggest to implement it in the Hive to Coral translation, and I also would not suggest to implement it in the Coral to Trino translation. Instead, I propose to implement that very special rule in the wrapper Hive to Trino translation class by implementing an additional translation pass.

If you agree on this approach, I am optimistic that I can provide a corresponding pull-request. But I first want to discuss the issue with you to avoid going into a wrong direction.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions