Expose execution time zone to UDFs #16573
Previously the field was optional. However, when set to a None value, it would trigger invalid behavior, with `timestamp with time zone` type being parsed as if it was just `timestamp` type.
The execution time zone serves as the session zone from the perspective of SQL semantics. In particular, it influences how a CAST from `timestamp` to `timestamp with time zone` is performed. This is, however, applied at the query parser stage. The execution/session time zone should also be available to functions in both the invoke and simplify APIs.
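To make the CAST semantics above concrete, here is a minimal sketch of why a function needs the session zone. `CastArgs` and `cast_to_timestamp_with_time_zone` are hypothetical names, not DataFusion APIs, and the session zone is assumed to be already resolved to a fixed offset in seconds (the PR itself passes a zone string).

```rust
/// Hypothetical, simplified stand-in for the arguments a function receives.
pub struct CastArgs {
    /// Wall-clock seconds since 1970-01-01 00:00:00, with no zone attached.
    pub naive_timestamp_secs: i64,
    /// Session zone as a fixed offset in seconds east of UTC (an assumption
    /// of this sketch; named zones with DST rules are not modelled).
    pub session_offset_secs: i32,
}

/// Interpret the naive timestamp as local time in the session zone and
/// return the corresponding instant as seconds since the Unix epoch in UTC.
pub fn cast_to_timestamp_with_time_zone(args: &CastArgs) -> i64 {
    // Local time = UTC + offset, therefore UTC = local time - offset.
    args.naive_timestamp_secs - i64::from(args.session_offset_secs)
}
```

With a session offset of +02:00, a naive `12:00:00` on the epoch day maps to `10:00:00` UTC. The same naive value yields a different instant under a different session zone, which is why the zone has to reach the function.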
@@ -133,7 +133,7 @@ fn test_evaluate_with_start_time(
     date_time: &DateTime<Utc>,
 ) {
     let execution_props =
-        ExecutionProps::new().with_query_execution_start_time(*date_time);
+        ExecutionProps::new().with_query_execution_start_time(*date_time, "UTC".into());
This uses "UTC", but the default time zone in the session config is "+0:00". Is this discrepancy intended? UTC is used in other tests as well.
I thought that "+0:00" and "UTC" are the same thing.
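The question above can be illustrated with a small sketch. `offset_seconds` is a hypothetical helper, not the parser DataFusion or arrow actually uses, and it only handles "UTC", "Z", and signed `H:MM` / `HH:MM` offsets. It shows that the two spellings can denote the same offset while still being different strings, so any code that compares zone strings directly would treat them as distinct.

```rust
/// Parse a fixed-offset zone string into seconds east of UTC.
/// Returns None for anything this sketch does not understand,
/// including named regions such as "Europe/Warsaw".
pub fn offset_seconds(zone: &str) -> Option<i32> {
    if zone.eq_ignore_ascii_case("UTC") || zone == "Z" {
        return Some(0);
    }
    // The first byte must be an explicit sign.
    let (sign, rest) = match zone.as_bytes().first()? {
        b'+' => (1, &zone[1..]),
        b'-' => (-1, &zone[1..]),
        _ => return None,
    };
    let (hours, minutes) = rest.split_once(':')?;
    // Reject empty parts and anything that is not a plain digit.
    let all_digits = |s: &str| !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit());
    if !all_digits(hours) || !all_digits(minutes) {
        return None;
    }
    let hours: i32 = hours.parse().ok()?;
    let minutes: i32 = minutes.parse().ok()?;
    if hours > 23 || minutes > 59 {
        return None;
    }
    Some(sign * (hours * 3600 + minutes * 60))
}
```

Under these assumptions both `offset_seconds("UTC")` and `offset_seconds("+0:00")` resolve to a zero offset, whereas `"UTC" == "+0:00"` is false. Whether the discrepancy matters therefore depends on whether downstream code compares offsets or strings.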
@@ -61,15 +65,18 @@ impl ExecutionProps {
     pub fn with_query_execution_start_time(
I think it is strange to have a function named `with_query_execution_start_time` that sets both the start time and the time zone. Would it be possible to add a second, separate function for setting the time zone?
     ) -> Self {
         self.query_execution_start_time = query_execution_start_time;
         self.query_execution_time_zone = query_execution_time_zone;
         self
     }

     /// Marks the execution of query started timestamp.
     /// This also instantiates a new alias generator.
I think the comment could also use some updating
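One way to read the suggestion to split the setter is sketched below. This is a simplified, hypothetical stand-in for `ExecutionProps`, not the PR's code: it uses `std::time::SystemTime` instead of `DateTime<Utc>`, the method name `with_query_execution_time_zone` is invented for illustration, and the "UTC" default is an arbitrary placeholder.

```rust
use std::time::SystemTime;

/// Hypothetical, reduced version of `ExecutionProps` with two fields only.
#[derive(Debug, Clone)]
pub struct ExecutionProps {
    pub query_execution_start_time: SystemTime,
    pub query_execution_time_zone: String,
}

impl ExecutionProps {
    /// Placeholder defaults; the real defaults are not taken from the PR.
    pub fn new() -> Self {
        Self {
            query_execution_start_time: SystemTime::UNIX_EPOCH,
            query_execution_time_zone: String::from("UTC"),
        }
    }

    /// Sets only the start time, as the method name promises.
    pub fn with_query_execution_start_time(mut self, start: SystemTime) -> Self {
        self.query_execution_start_time = start;
        self
    }

    /// Sets only the time zone (hypothetical method name).
    pub fn with_query_execution_time_zone(mut self, zone: impl Into<String>) -> Self {
        self.query_execution_time_zone = zone.into();
        self
    }
}
```

With this shape, existing callers of `with_query_execution_start_time` keep compiling unchanged, and tests that care about the zone chain the second call explicitly.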
@@ -311,6 +311,8 @@ pub struct ScalarFunctionArgs {
     /// or `return_field_from_args`) when creating the physical expression
     /// from the logical expression
     pub return_field: FieldRef,
+    /// The configured execution time zone (a.k.a. session time zone)
+    pub execution_time_zone: String,
If we add a `String` here, it will require another allocation and copy on each function call for each batch. Would it be possible to make this `Arc<str>` (more similar to a Java string, which is refcounted and so less expensive to pass around)?
I thought that zones are short strings and that short strings are cheap to copy, while an `Arc` clone involves an atomic CAS (which I believe is unlike Java).
I don't have numbers to base this on, and may be totally wrong, but I read the title of the following page: https://blog.031410.xyz/blog/arc_str_vs_string_is_it_really_faster
If we care about perf here, then even cheaper would be to pass `&str` here. That requires adding a lifetime to `ScalarFunctionArgs` (a breaking change), and that's likely the only downside.
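The three options discussed in this thread can be put side by side in a short sketch. The struct and function names below are hypothetical and reduced to the one field under discussion; none of them is the real `ScalarFunctionArgs`. The sketch shows what each choice does when the arguments are built, not which one is faster.

```rust
use std::sync::Arc;

/// Option 1: owned `String`. Building the args copies the bytes into a
/// fresh heap allocation.
pub struct ArgsOwned {
    pub execution_time_zone: String,
}

/// Option 2: `Arc<str>`. Building the args increments an atomic reference
/// count and shares the existing allocation.
pub struct ArgsShared {
    pub execution_time_zone: Arc<str>,
}

/// Option 3: borrowed `&str`. No allocation and no reference count, at the
/// cost of a lifetime parameter on the struct.
pub struct ArgsBorrowed<'a> {
    pub execution_time_zone: &'a str,
}

pub fn build_owned(zone: &str) -> ArgsOwned {
    ArgsOwned { execution_time_zone: zone.to_owned() }
}

pub fn build_shared(zone: &Arc<str>) -> ArgsShared {
    ArgsShared { execution_time_zone: Arc::clone(zone) }
}

pub fn build_borrowed(zone: &str) -> ArgsBorrowed<'_> {
    ArgsBorrowed { execution_time_zone: zone }
}
```

The trade-off is visible in the signatures: only the borrowed variant changes the type's shape for every implementor, which is the breaking change mentioned above. Which variant is cheaper per batch would need to be measured rather than assumed.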
I'll be honest. I think it's too limited. For example, you still wouldn't have access to any custom config, var providers, etc.
I definitely don't need that, maybe because I don't know what good var providers can do for me. Do we need to know all of the desirable items from the start, or can we just anticipate that we will evolve the set of what's accessible as the need arises?
wdyt?
It looks like we didn't reach a conclusion here yet.
So it is still my opinion that copying fields from
For that reason, I still think figuring out some way to pass
Also as @Omega359 using ConfigOptions aligns us with the async UDFs as well.
I was wondering if we can change
We could use
Let me see if I can whip up a demo
That was definitely my intention: expose only those fields or state attributes that are relevant. Keep the API surface only as large as necessary. If that's controversial or problematic, then sure, we can expose all the config options like in #16661.
relates to
- `SessionConfig` reference to `ScalarFunctionArgs` #13519
- `datafusion.execution.time_zone` is not used for basic time zone inference #13212