Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize queries for Core #111

Open
domoritz opened this issue Jan 1, 2019 · 0 comments
Open

Optimize queries for Core #111

domoritz opened this issue Jan 1, 2019 · 0 comments

Comments

@domoritz
Copy link
Member

domoritz commented Jan 1, 2019

By rewriting the filters, we could speed up queries significantly.

From @tmostak:

I took the original query: SELECT CASE WHEN airtime BETWEEN 0 AND 500 THEN floor( (airtime - cast(0 as float)) / cast(0.8333333333333334 as float) ) ELSE -1 END AS keyActive, count(*) AS cnt FROM flights WHERE (floor(cast(arrtime as float) / 100) + mod(arrtime, 100) / 60) BETWEEN 10.4 AND 14.16 GROUP BY keyActive, which on the test server like our internal one runs in about 170ms.

To avoid diving into the where clause (which I believe could be simplified to a simple between clause with constants before query generation), I just took this part of the query: SELECT CASE WHEN airtime BETWEEN 0 AND 500 THEN floor( (airtime - cast(0 as float)) / cast(0.8333333333333334 as float) ) ELSE -1 END AS keyActive, count(*) AS cnt FROM flights GROUP by keyActive, which still runs in about 170 ms warm. (Note that we do cache query patterns with hosted literals, and not the actual queries, so once you precompute an index on a dimension with a type, computing the index on another dimension with the same type should be “warm” for the system).

Floor seems to be a costly operation, so if you change the query to this, where we just use integer cast:

Select case when airtime between 0 and 500 then cast(airtime - 0.0 / .833333 as int) else -1 end as keyactive, count(*) as cnt from flights group by keyactive;

The warm runtime is suddenly 35-40 ms.

I’m not sure if you need the count outside the 0 to 500 range, but we can do smart things if we can push the filter down, which basically allows us to use a more compact perfect hash and also potentially shared memory on the GPU.

So with this query:

Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 group by keyactive;

We are suddenly at 15ms, or about 9% of the original.

If I add that original filter on airtime back in, like so:

Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 and (floor(cast(arrtime as float) / 100) + mod(arrtime, 100) / 60) BETWEEN 10.4 AND 14.16 group by keyactive;

The query time is now at 25ms.

But if I simplify that filter on arrtime to a between operation on constants (which could be isomorphic with the more complex filter you have), like this:

Select cast(airtime - 0.0 / .833333 as int) as keyactive, count(*) as cnt from flights where airtime between 0 and 500 and arrtime between 1040 and 1416 group by keyactive;

Then I am now at 14-15ms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant