-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression using Flink 1.20 relating to timestamp precision #143
Comments
Thanks for reporting the issue! Today I'll try to focus on this. Is there an easy workaround for that? What if you cast EDIT: I made my own example for better understanding. CREATE TABLE Orders (
user_id INTEGER,
order_time TIMESTAMP(6),
proc_time AS proctime()
) WITH (
'connector' = 'datagen',
'number-of-rows' = '5',
'rows-per-second' = '1'
);
CREATE TABLE `user_data` (
`user_id` INTEGER,
`date_from` TIMESTAMP(6), -- or TIMESTAMP(3)
`first_name` STRING,
`last_name` STRING
)
WITH (
'connector' = 'rest-lookup',
'url' = 'http://http-service:8000/dummy',
'format' = 'json',
'asyncPolling' = 'false',
'lookup-method' = 'GET'
);
SELECT *
FROM Orders o
LEFT JOIN user_data
FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.user_id AND o.order_time = u.date_from
; Then I run explain twice: (1) when
Important diff (
|
@davidradl I am afraid that none of the options mentioned is feasible. At least this is how I see it.
How can we extract information about other join condition? Unfortunately, public LookupRuntimeProvider getLookupRuntimeProvider(LookupContext lookupContext)
Could you please elaborate how we can influence query planner on how to decide which condition is a lookup condition?
You are right. It seems that |
@grzegorz8 yes I think the only real option would be implement the scan interface. At the moment we have a circumvention that we know what the users filters are from the request mapping in the config - this will be in the code I am currently working on for #99. |
Adding scan capability requires a lot of effort. It would be very nice feature, but it does not solve the issue. The planner can still decide to use lookup instead of scan, unless you specify join type via a hint. What works for me is to explicitly cast timestamp on the left side to the timestamp with precision defined in lookup table. In my example, the query should look like: SELECT *
FROM Orders o
LEFT JOIN user_data
FOR SYSTEM_TIME AS OF o.proc_time AS u
ON o.user_id = u.user_id AND CAST(o.order_time AS TIMESTAMP(3)) = u.date_from
; |
@grzegorz8 yes thanks for looking at this - we are also looking at aligning the precisions so this would come through as a lookup key. |
I have a test case
with the file contents:
I then create a view
and a lookup table
and so a lookup
At 1.19.1, the query that goes up has 2 query params
customerId
andparam_string_date_time
At 1.20, the query that goes up has 1 query param
customerId
This is caused by a flink PR which introduces logic to do compares at the maximum precision. The implication of this is that the table planner ends up seeing TIMESTAMP(6) and TIMESTAMP(9) as the 2 types in commonPhysicalLookupJoin analyzeLookupKeys. So it decides that this is not going to be a lookup key instead it treats it as a join condition.
Unfortunately the http connector currently is not looking for join conditions, so we lose this.
For JDBC connectors doing a look up, they support join conditions so the lookup join succeeds.
some thoughts / observations:
@grzegorz8 @kzarzycki WDYT?
The text was updated successfully, but these errors were encountered: