
feat(snowflake)!: Transpile BQ's TIMESTAMP() function #4309

Merged
1 commit merged into main on Oct 30, 2024

Conversation

VaggelisD
Collaborator

BigQuery supports the TIMESTAMP(<expr>[, zone]) function for TIMESTAMP construction. Today, this is parsed in BQ and transpiled to other dialects (DuckDB, Presto/Trino) through the no_timestamp_sql helper, which generates either a cast (1-arg version) or an AT TIME ZONE expression if a zone is present (2-arg version).
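
For illustration, the 1-arg version is transpiled to a plain cast (a rough sketch with DuckDB as the target; exact output may vary by version):

bigquery> SELECT TIMESTAMP('2008-12-25 15:30:00')
duckdb> SELECT CAST('2008-12-25 15:30:00' AS TIMESTAMP)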

However, the 2-arg transpilation does not seem correct: BigQuery's TIMESTAMPs are always stored in UTC, which is not the case for other dialects. On the main branch, these are the transpilations and their results today:

bigquery> SELECT TIMESTAMP('2008-12-25 15:30:00', 'America/Los_Angeles')
2008-12-25 23:30:00 UTC

presto> SELECT AT_TIMEZONE(CAST('2008-12-25 15:30:00' AS TIMESTAMP), 'America/Los_Angeles');
2008-12-25 07:30:00.000 America/Los_Angeles

duckdb> SELECT CAST('2008-12-25 15:30:00' AS TIMESTAMP) AT TIME ZONE 'America/Los_Angeles';
2008-12-26 01:30:00+02

This PR:

  • Fixes this by changing exp.AtTimeZone to exp.ConvertTimezone, which first evaluates the timestamp at the user-supplied zone and then converts it to UTC
  • Adds no_timestamp_sql to Snowflake to enable the transpilation path

The new transpilations and their results on this branch are the following:

duckdb> SELECT CAST('2008-12-25 15:30:00' AS TIMESTAMP) AT TIME ZONE 'America/Los_Angeles' AT TIME ZONE 'UTC';
2008-12-25 23:30:00

snowflake> SELECT CONVERT_TIMEZONE('America/Los_Angeles', 'UTC', CAST('2008-12-25 15:30:00' AS TIMESTAMP));
2008-12-25 23:30:00.000

presto> SELECT AT_TIMEZONE(AT_TIMEZONE(CAST('2008-12-25 15:30:00' AS TIMESTAMP), 'America/Los_Angeles'), 'UTC');
2008-12-25 15:30:00.000 UTC

Note that although DuckDB & Snowflake now match BQ's results, Presto/Trino is still off. The reason is that the first AT_TIMEZONE performs a <session zone> -> <zone> conversion instead of pushing the zone into the timestamp, which would allow the UTC conversion to work (see the sketch below).
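
As a hedged sketch, a UTC-matching Presto/Trino output could look like the following, assuming the target supports Trino's with_timezone(timestamp, zone) function, which attaches a zone to a timestamp without shifting its wall-clock value:

presto> SELECT AT_TIMEZONE(with_timezone(CAST('2008-12-25 15:30:00' AS TIMESTAMP), 'America/Los_Angeles'), 'UTC');

This should interpret 15:30:00 as Los Angeles wall time and render it as 23:30:00 UTC, matching BigQuery.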

Docs

BigQuery TIMESTAMP | Snowflake CONVERT_TIMEZONE

@georgesittas
Collaborator

georgesittas commented Oct 29, 2024

Regarding the first examples, I think Presto may be the only problematic one:

presto> SELECT AT_TIMEZONE(CAST('2008-12-25 15:30:00' AS TIMESTAMP), 'America/Los_Angeles');
2008-12-25 07:30:00.000 America/Los_Angeles

duckdb> SELECT CAST('2008-12-25 15:30:00' AS TIMESTAMP) AT TIME ZONE 'America/Los_Angeles';
2008-12-26 01:30:00+02

Presto seems to give us back a timestamp equivalent to 2008-12-25 15:30:00 UTC, which doesn't match the original BigQuery semantics. DuckDB, on the other hand, gives back the correct timestamp: 2008-12-25 23:30:00 UTC.

So, given that the Presto results are still off after this PR due to the session zone conversion you mentioned, I'm curious what the intent behind this fix is. Perhaps the existing logic is fine for other targets.

What about the Snowflake SQL transpiled on main today, is it correct? Or, if not, would it be fixed if we simply used no_timestamp_sql for that dialect?

@VaggelisD
Collaborator Author

So this PR is really split into 2 parts:

  1. Wiring no_timestamp_sql to Snowflake, as the transpilation is not supported at all today; without any further changes, this would be the result right away (exp.AtTimeZone is converted to the 2-arg CONVERT_TIMEZONE(...)):
SELECT CONVERT_TIMEZONE('America/Los_Angeles', CAST('2008-12-25 15:30:00' AS TIMESTAMP));
2008-12-25 15:30:00.000 -0800
  2. Changing exp.AtTimeZone to exp.ConvertTimezone in order to bring both the DuckDB & Snowflake results to UTC-based time, i.e. 23:30:00. The solution on main today gives a TIMESTAMP that is shifted to the engine's session zone (e.g. +2 for DuckDB and -8 for Snowflake). However, my reservation had to do with the fact that in BigQuery, timestamps technically don't have a timezone, so if the zone part is stripped from the DuckDB/Snowflake results (e.g. cast to TIMESTAMPNTZ) we'd end up with a non-UTC time (01:30:00 for DuckDB and 15:30:00 for Snowflake); see the sketch below.
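
A hedged illustration of that last point, assuming a DuckDB session zone of +2 as in the examples above (the zone is stripped by casting back to a naive TIMESTAMP):

duckdb> SELECT CAST((CAST('2008-12-25 15:30:00' AS TIMESTAMP) AT TIME ZONE 'America/Los_Angeles') AS TIMESTAMP);
2008-12-26 01:30:00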

What do you think?

@georgesittas
Collaborator

It seems like (1) should suffice? 2008-12-25 15:30:00.000 -0800 is equivalent to 2008-12-25 23:30:00 UTC, same for the DuckDB one with the +2 offset.

However, my reservation had to do with the fact that in BigQuery, timestamps technically don't have a timezone, so if the zone part is stripped from the DuckDB/Snowflake results (e.g. cast to TIMESTAMPNTZ) we'd end up with a non-UTC time (01:30:00 for DuckDB and 15:30:00 for Snowflake).

Why would the zone parts be stripped from the DuckDB/Snowflake results, though? What's an example where this can happen at transpilation time, without further modifications of the result from the user?

@VaggelisD
Collaborator Author

Why would the zone parts be stripped from the DuckDB/Snowflake results, though? What's an example where this can happen at transpilation time, without further modifications of the result from the user?

I don't have a concrete example at hand; I'm mostly going off the hunch that this is a slightly better representation for BQ's TIMESTAMP. Since it represents a single point in time, aligning with that exact value requires no further justification.

Will revert these changes for the time being to unblock the rest of the PR.

@VaggelisD force-pushed the vaggelisd/sf_timestamp branch from 25f7df8 to e5e5137 on October 30, 2024 at 10:02
@VaggelisD merged commit 2af4936 into main on Oct 30, 2024
6 checks passed
@VaggelisD deleted the vaggelisd/sf_timestamp branch on October 30, 2024 at 10:18