From SQL to SPL: Complement a certain average value to ensure that the total sum remains unchanged
In a SQL Server database, each row of the invoice table holds one amount for a project, and each project in the project table has multiple accounts. The two tables are associated through ProjectID.
#Invoices
InvoiceID | Amount | ProjectID |
---|---|---|
1 | 100.0000 | 1 |
2 | 100.0000 | 2 |
3 | 100.0000 | 3 |
4 | 100.0000 | 4 |
#Projects
ID | ProjectID | AccountCode |
---|---|---|
1 | 1 | 12345 |
2 | 2 | 12345 |
3 | 2 | 7890 |
4 | 3 | 800 |
5 | 3 | 234 |
6 | 3 | 987 |
7 | 4 | 800 |
8 | 4 | 234 |
9 | 4 | 987 |
10 | 4 | 2579 |
Now we need to join the two tables and add a SplitAmount field that divides each invoice's amount roughly evenly among the N accounts of its project. N-1 accounts receive 1/N of the amount rounded to 2 decimal places; for example, when 100 is divided into 3 parts, that is 33.33. The remaining account (the first account of each project in the result below) takes the rest, so that the total stays unchanged: 100 - 33.33 * 2 = 33.34.
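The splitting rule can be checked with a small Python sketch. The function name `split_amount` is hypothetical, and `ROUND_HALF_UP` is an assumption chosen because, for positive amounts, it rounds ties the same way SQL Server's `ROUND` does. As in the expected result, the first share absorbs the rounding difference.

```python
from decimal import Decimal, ROUND_HALF_UP

def split_amount(amount: Decimal, n: int) -> list[Decimal]:
    """Hypothetical helper: split amount into n shares that add up to amount exactly."""
    # Rounded average share, e.g. 100 / 3 -> 33.33 (rounding mode is an assumption)
    share = (amount / n).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # The first share takes what the other n-1 shares leave over,
    # e.g. 100 - 33.33 * 2 = 33.34
    first = amount - share * (n - 1)
    return [first] + [share] * (n - 1)

print(split_amount(Decimal("100"), 3))
```

Computing the last share as a difference, instead of rounding it independently, is what guarantees the shares always add back up to the original amount.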
InvoiceID | Amount | ProjectID | AccountCode | SplitAmount |
---|---|---|---|---|
1 | 100.0 | 1 | 12345 | 100.0 |
2 | 100.0 | 2 | 12345 | 50.0 |
2 | 100.0 | 2 | 7890 | 50.0 |
3 | 100.0 | 3 | 800 | 33.34 |
3 | 100.0 | 3 | 234 | 33.33 |
3 | 100.0 | 3 | 987 | 33.33 |
4 | 100.0 | 4 | 800 | 25.0 |
4 | 100.0 | 4 | 234 | 25.0 |
4 | 100.0 | 4 | 987 | 25.0 |
4 | 100.0 | 4 | 2579 | 25.0 |
SQL:
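```sql
-- Outer query: add the rounding difference to the row with rn = 1 in each project
select *,
       SplitAmount
       + case when rn = 1
              then i.Amount - sum(i.SplitAmount) over (partition by i.ProjectID)
              else 0
         end as AdjustedSplitAmount
from (
    -- Inner query: join, compute the rounded average share and a per-project sequence number
    select
        I.*, P.AccountCode,
        round(I.Amount / count(I.InvoiceID) over (partition by P.ProjectID), 2) as SplitAmount,
        -- ordered by P.ID so that the first account in table order absorbs the difference,
        -- as in the expected result (ordering by AccountCode would pick account 234 in project 3)
        row_number() over (partition by P.ProjectID order by P.ID) as rn
    from
        #Invoices I inner join #Projects P on I.ProjectID = P.ProjectID
) i
```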
In SQL, grouping must be followed immediately by aggregation; the grouped subsets cannot be kept so that the SplitAmount field could be added to each subset directly according to the rule. The calculation has to be done indirectly with a nested subquery and window functions, and the sequence numbers also have to be generated separately with a window function, so the overall code is cumbersome.
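The following Python sketch mimics, on flat joined rows, what the window-function query has to do: one pass for the per-project count and row number, another for the per-project sum of rounded shares, and a final pass that adjusts the row with rn = 1. It is an illustration under assumptions, not a model of the SQL engine: the sample data are copied from the tables above, the function name `window_style_split` is hypothetical, each project is assumed to have one invoice (as in the sample), and rows are numbered in P.ID order so that the result matches the expected table.

```python
from collections import Counter, defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Sample data copied from the tables above.
INVOICES = [(1, Decimal("100"), 1), (2, Decimal("100"), 2),
            (3, Decimal("100"), 3), (4, Decimal("100"), 4)]       # (InvoiceID, Amount, ProjectID)
PROJECTS = [(1, 1, 12345), (2, 2, 12345), (3, 2, 7890), (4, 3, 800), (5, 3, 234),
            (6, 3, 987), (7, 4, 800), (8, 4, 234), (9, 4, 987), (10, 4, 2579)]  # (ID, ProjectID, AccountCode)

def window_style_split():
    # Inner query: a flat inner join in P.ID order (assumes one invoice per project).
    amount_of = {pid: (inv_id, amount) for inv_id, amount, pid in INVOICES}
    joined = [(*amount_of[pid], pid, code) for _, pid, code in PROJECTS if pid in amount_of]
    # count(...) over (partition by ProjectID): a separate pass over all rows
    cnt = Counter(pid for _, _, pid, _ in joined)
    # row_number() over (partition by ProjectID order by P.ID): a running counter per project
    seen = Counter()
    stage1 = []
    for inv_id, amount, pid, code in joined:
        seen[pid] += 1
        share = (amount / cnt[pid]).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        stage1.append((inv_id, amount, pid, code, share, seen[pid]))
    # Outer query: sum(SplitAmount) over (partition by ProjectID) needs yet another pass
    total = defaultdict(Decimal)
    for _, _, pid, _, share, _ in stage1:
        total[pid] += share
    # Only the row with rn = 1 absorbs the rounding difference
    return [(inv_id, amount, pid, code, share + (amount - total[pid] if rn == 1 else 0))
            for inv_id, amount, pid, code, share, rn in stage1]
```

Each window function becomes its own pass over the whole flat row set, and the per-project values have to be carried along on every row, which is the indirection the paragraph above describes.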
SPL can keep the grouped subsets, which makes the code more natural: https://try.esproc.com/splx?2VD
A | |
---|---|
1 | $select I.InvoiceID InvoiceID, I.Amount Amount, I.ProjectID ProjectID, P.AccountCode AccountCode from Invoices.txt I , Projects.txt P where I.ProjectID = P.ProjectID order by P.ID |
2 | =A1.group(ProjectID) |
3 | =A2.(cnt=~.count(),avg=round(Amount/cnt,2), ~.derive(avg+if(#==1, Amount-avg*cnt): SplitAmount)) |
4 | =A3.conj() |
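A1: Load the data with a simple join, ordered by P.ID.
A2: Group by ProjectID without aggregating.
A3: Process each group and add the SplitAmount field directly according to the rule: every record gets the rounded average avg, and the first record also gets the difference Amount - avg*cnt. # is the record's natural sequence number within the group, so no extra calculation is needed.
A4: Concatenate the groups into one table.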
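For comparison, the same grouped-subset idea can be sketched in Python, following steps A1 to A4. This is a hedged analogue, not SPL itself: `group_style_split` is a hypothetical name, the sample data are copied from the tables above, and `itertools.groupby` only merges adjacent rows, so the sketch relies on each project's accounts being contiguous in P.ID order (true for the sample data). `enumerate(..., start=1)` stands in for SPL's `#`.

```python
from decimal import Decimal, ROUND_HALF_UP
from itertools import groupby

# Sample data copied from the tables above.
INVOICES = [(1, Decimal("100"), 1), (2, Decimal("100"), 2),
            (3, Decimal("100"), 3), (4, Decimal("100"), 4)]       # (InvoiceID, Amount, ProjectID)
PROJECTS = [(1, 1, 12345), (2, 2, 12345), (3, 2, 7890), (4, 3, 800), (5, 3, 234),
            (6, 3, 987), (7, 4, 800), (8, 4, 234), (9, 4, 987), (10, 4, 2579)]  # (ID, ProjectID, AccountCode)

def group_style_split():
    # A1: simple join, rows in P.ID order (assumes one invoice per project, as in the sample)
    amount_of = {pid: (inv_id, amount) for inv_id, amount, pid in INVOICES}
    joined = [(*amount_of[pid], pid, code) for _, pid, code in PROJECTS if pid in amount_of]
    result = []
    # A2: group by ProjectID and keep each subset (groupby merges adjacent rows only)
    for _, rows in groupby(joined, key=lambda r: r[2]):
        rows = list(rows)
        cnt = len(rows)
        amount = rows[0][1]
        avg = (amount / cnt).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        # A3: derive SplitAmount inside the group; i plays the role of SPL's #
        derived = [r + (avg + (amount - avg * cnt if i == 1 else 0),)
                   for i, r in enumerate(rows, start=1)]
        # A4: concatenate the groups
        result.extend(derived)
    return result
```

Here the count, the rounded average and the sequence number are all computed locally on each kept subset, so no extra pass over the whole row set is needed.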