You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I took the original query: SELECT CASE WHEN airtime BETWEEN 0 AND 500 THEN floor( (airtime - cast(0 as float)) / cast(0.8333333333333334 as float) ) ELSE -1 END AS keyActive, count(*) AS cnt FROM flights WHERE (floor(cast(arrtime as float) / 100) + mod(arrtime, 100) / 60) BETWEEN 10.4 AND 14.16 GROUP BY keyActive, which on the test server like our internal one runs in about 170ms.
To avoid diving into the where clause (which I believe could be simplified to a simple between clause with constants before query generation), I just took this part of the query: SELECT CASE WHEN airtime BETWEEN 0 AND 500 THEN floor( (airtime - cast(0 as float)) / cast(0.8333333333333334 as float) ) ELSE -1 END AS keyActive, count(*) AS cnt FROM flights GROUP by keyActive, which still runs in about 170 ms warm. (Note that we do cache query patterns with hosted literals, and not the actual queries, so once you precompute an index on a dimension with a type, computing the index on another dimension with the same type should be “warm” for the system).
Floor seems to be a costly operation, so if you change the query to this, where we just use integer cast:
Select case when airtime between 0 and 500 then cast(airtime - 0.0 / .833333 as int) else -1 end as keyactive, count(*) as cnt from flights group by keyactive;
The warm runtime is suddenly 35-40 ms.
I’m not sure if you need the count outside the 0 to 500 range, but we can do smart things if we can push the filter down, which basically allows us to use a more compact perfect hash and also potentially shared memory on the GPU.
So with this query:
Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 group by keyactive;
We are suddenly at 15ms, or about 9% of the original.
If I add that original filter on airtime back in, like so:
Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 and (floor(cast(arrtime as float) / 100) + mod(arrtime, 100) / 60) BETWEEN 10.4 AND 14.16 group by keyactive;
The query time is now at 25ms.
But if I simplify that filter on arrtime to a between operation on constants (which could be isomorphic with the more complex filter you have), like this:
Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 and arrtime between 1040 and 1416 group by keyactive;
Then I am now at 14-15ms.
The text was updated successfully, but these errors were encountered:
By rewriting the filters, we could speed up queries significantly.
From @tmostak:
I took the original query: SELECT CASE WHEN airtime BETWEEN 0 AND 500 THEN floor( (airtime - cast(0 as float)) / cast(0.8333333333333334 as float) ) ELSE -1 END AS keyActive, count(*) AS cnt FROM flights WHERE (floor(cast(arrtime as float) / 100) + mod(arrtime, 100) / 60) BETWEEN 10.4 AND 14.16 GROUP BY keyActive, which on the test server like our internal one runs in about 170ms.
To avoid diving into the where clause (which I believe could be simplified to a simple between clause with constants before query generation), I just took this part of the query: SELECT CASE WHEN airtime BETWEEN 0 AND 500 THEN floor( (airtime - cast(0 as float)) / cast(0.8333333333333334 as float) ) ELSE -1 END AS keyActive, count(*) AS cnt FROM flights GROUP by keyActive, which still runs in about 170 ms warm. (Note that we do cache query patterns with hosted literals, and not the actual queries, so once you precompute an index on a dimension with a type, computing the index on another dimension with the same type should be “warm” for the system).
Floor seems to be a costly operation, so if you change the query to this, where we just use integer cast:
Select case when airtime between 0 and 500 then cast(airtime - 0.0 / .833333 as int) else -1 end as keyactive, count(*) as cnt from flights group by keyactive;
The warm runtime is suddenly 35-40 ms.
I’m not sure if you need the count outside the 0 to 500 range, but we can do smart things if we can push the filter down, which basically allows us to use a more compact perfect hash and also potentially shared memory on the GPU.
So with this query:
Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 group by keyactive;
We are suddenly at 15ms, or about 9% of the original.
If I add that original filter on airtime back in, like so:
Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 and (floor(cast(arrtime as float) / 100) + mod(arrtime, 100) / 60) BETWEEN 10.4 AND 14.16 group by keyactive;
The query time is now at 25ms.
But if I simplify that filter on arrtime to a between operation on constants (which could be isomorphic with the more complex filter you have), like this:
Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 and arrtime between 1040 and 1416 group by keyactive;
Then I am now at 14-15ms.
The text was updated successfully, but these errors were encountered: