-
Notifications
You must be signed in to change notification settings - Fork 349
Performance optimization case course:TPCH Q22
select
cntrycode,
count(*) as numcust,
sum(c_acctbal) as totacctbal
from (
select
substr(c_phone,1,2) as cntrycode,
c_acctbal
from
customer
where
substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
and c_acctbal > (
select
avg(c_acctbal)
from
customer
where
c_acctbal > 0.00
and substr(c_phone,1,2) in ('11', '14', '15', '19', '20', '21', '23')
)
and not exists (
select
*
from
orders
where
o_custkey = c_custkey
)
) custsale
group by
cntrycode
order by
cntrycode;
The main query’s outer layer is a regular grouping and aggregation operation, and its inner layer contains two conditional subqueries.
The two subqueries, where the second one is preceded by not exists, correspond to the customer record sets that meet their respective condition. The final result is the difference set of the two sets.
There is no special requirement for data tables, store them in order by primary key.
We can directly use orders.ctx and customer.ctx from Q3. Copy them to the main directory of this query.
Calculation code:
A | |
---|---|
1 | =now() |
2 | =["11","14","15","19","20","21","23"] |
3 | =file("customer.ctx").open().cursor@m(C_CUSTKEY,C_PHONE,C_ACCTBAL;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch() |
4 | =A3.avg(C_ACCTBAL) |
5 | =A3.select@m(C_ACCTBAL>A4).derive@o().keys@im(C_CUSTKEY) |
6 | =file("orders.ctx").open().cursor@m(O_CUSTKEY;O_CUSTKEY:A5) |
7 | =A6.run(O_CUSTKEY.C_CUSTKEY=null).skip() |
8 | =A5.select@m(C_CUSTKEY) |
9 | =A8.groups(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) |
10 | =interval@ms(A1,now()) |
A5 is equivalent to selecting the records meeting condition specified in the first subquery. A6 associates with A5 and selects the records for the second subquery. A7 sets C_CUSTKEY values of these records selected in A6 as null, which means deleting them from A5 and is equivalent to achieving the not exists condition in the second subquery. A8 selects records satisfying the two conditional subqueries.
Test result:
Test items | Execution time (seconds) |
---|---|
General method | 6 |
For this query, we need to use the dimension table primary key sequence-numberization method mentioned in previous articles. Copy orders_5.ctx and customer_5.ctx in Q3 to the main directory of this query.
Calculation code:
A | B | |
---|---|---|
1 | =now() | |
2 | =["11","14","15","19","20","21","23"] | |
3 | =file("customer_5.ctx").open() | =A3.cursor@m().skip().(null) |
4 | =A3.cursor@m(C_CUSTKEY,C_ACCTBAL,C_PHONE;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch().@m(B3(C_CUSTKEY)=~) | |
5 | =B3.avg(C_ACCTBAL) | |
6 | =B3.@m(if(C_ACCTBAL>A5,~,null)) | |
7 | =file("orders_5.ctx").open().cursor@m(O_CUSTKEY;A6(O_CUSTKEY)).run(A6(O_CUSTKEY)=null).skip() | |
8 | =A6.select@m(~).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) | |
9 | =interval@ms(A1,now()) |
B3 and A6 are aligned sequence, and their values are null or records of customer table.
Test result:
Test items | Execution time (seconds) |
---|---|
General method | 6 |
Data conversion | 2 |
Calculation code:
A | B | |
---|---|---|
1 | =now() | |
2 | =["11","14","15","19","20","21","23"] | |
3 | =file("customer_5.ctx").open() | =A3.cursor@m().skip().(null) |
4 | =A3.cursor@mv(C_CUSTKEY,C_ACCTBAL,C_PHONE;C_ACCTBAL>0.0 && A2.contain(left(C_PHONE,2))).fetch().@m(B3(C_CUSTKEY)=~) | |
5 | =B3.avg(C_ACCTBAL) | |
6 | =B3.@m(if(C_ACCTBAL>A5,~,null)) | |
7 | =file("orders_5.ctx").open().cursor@mv(O_CUSTKEY;A6(O_CUSTKEY)).run(A6(O_CUSTKEY)=null).skip() | |
8 | =A6.select(~).groups@m(left(C_PHONE,2):cntrycode;count(1):numcust, sum(C_ACCTBAL):totacctbal) | |
9 | =interval@ms(A1,now()) |
Test result:
Test items | Execution time (seconds) |
---|---|
General method | 6 |
Data conversion | 2 |
Column-wise computing | 1 |
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code