From SQL to SPL: Complement a certain average value to ensure that the total sum remains unchanged

esProcSPL edited this page May 14, 2025 · 1 revision

An invoice table in a SQL Server database stores one amount for each project. Each project in the project table has one or more accounts, and the two tables are associated through ProjectID.

Invoices

InvoiceID Amount ProjectID
1 100.0000 1
2 100.0000 2
3 100.0000 3
4 100.0000 4

Projects

ID ProjectID AccountCode
1 1 12345
2 2 12345
3 2 7890
4 3 800
5 3 234
6 3 987
7 4 800
8 4 234
9 4 987
10 4 2579

Now we need to join the two tables and add a SplitAmount field that divides each invoice amount roughly evenly among the project's accounts. With N accounts, N-1 of them receive Amount/N rounded to 2 decimal places, and the remaining account receives whatever is left, so that the total stays unchanged. For example, when 100 is divided into 3 parts, two accounts receive 33.33 and the third receives 100 - 33.33 * 2 = 33.34. In the expected result below, the first account of each project takes the remainder.
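The rounding rule can be checked in isolation. The following is a minimal Python sketch, not part of the original SQL or SPL solutions; the function name `split_amount` is hypothetical, and it assumes that the first part absorbs the remainder, as in the expected result.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_amount(amount, n):
    """Split amount into n parts with 2 decimals whose sum equals amount.

    Assumption: the first part absorbs the rounding remainder.
    """
    amount = Decimal(str(amount))
    # Share given to the N-1 "ordinary" parts: amount / n rounded to 2 decimals.
    share = (amount / n).quantize(CENT, rounding=ROUND_HALF_UP)
    # The remaining part complements the others so the total is unchanged.
    first = amount - share * (n - 1)
    return [first] + [share] * (n - 1)
```

Calling `split_amount(100, 3)` yields 33.34, 33.33 and 33.33, which add up to 100 again. `Decimal` is used instead of `float` so that the check on the total is exact.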

InvoiceID Amount ProjectID AccountCode SplitAmount
1 100.0 1 12345 100.0
2 100.0 2 12345 50.0
2 100.0 2 7890 50.0
3 100.0 3 800 33.34
3 100.0 3 234 33.33
3 100.0 3 987 33.33
4 100.0 4 800 25.0
4 100.0 4 234 25.0
4 100.0 4 987 25.0
4 100.0 4 2579 25.0

SQL:

select *,
       SplitAmount 
       + case when rn = 1 
              then i.Amount - sum  (i.SplitAmount) 
                              over (partition by i.ProjectID)
              else 0
              end  as AdjustedSplitAmount
from(
  select 
      I.*, P.AccountCode,
      round(I.Amount / count(I.InvoiceID) over (partition by P.ProjectID), 2) as SplitAmount,
      row_number() over (partition by P.ProjectID order by P.ID) as rn
  from 
      #Invoices I Inner Join #Projects P on I.ProjectID = P.ProjectID
) i

SQL must aggregate immediately after grouping; it cannot retain the grouped subsets and add the SplitAmount field to each subset directly according to the rules. The calculation has to be implemented indirectly with a nested subquery and window functions, and the sequence numbers also have to be generated separately with a window function. The overall code is cumbersome.

SPL can retain the grouped subsets, so the code is more natural: https://try.esproc.com/splx?2VD

A
1 $select I.InvoiceID InvoiceID, I.Amount Amount, I.ProjectID ProjectID, P.AccountCode AccountCode from Invoices.txt I , Projects.txt P where I.ProjectID = P.ProjectID order by P.ID
2 =A1.group(ProjectID)
3 =A2.(cnt=~.count(),avg=round(Amount/cnt,2), ~.derive(avg+if(#==1, Amount-avg*cnt): SplitAmount))
4 =A3.conj()

A1: Simple join, load data.

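A2: Group by ProjectID without aggregating; each group keeps its member records.

A3: Process each group and add the SplitAmount field directly according to the rules: cnt is the number of accounts in the group, avg is the rounded average, and the first record also receives the remainder Amount-avg*cnt. # is the natural sequence number within the group, so no extra calculation is needed to generate it.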

A4: Merge groups.
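For readers who do not know SPL, the four steps can be mirrored in plain Python. This is only an illustrative sketch under assumptions: the function name `split_invoices` and the in-memory tuples are hypothetical stand-ins for the two tables, and the code is not a translation produced by esProc.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# Sample data copied from the tables above:
# (InvoiceID, Amount, ProjectID) and (ID, ProjectID, AccountCode).
invoices = [(1, "100.0000", 1), (2, "100.0000", 2),
            (3, "100.0000", 3), (4, "100.0000", 4)]
projects = [(1, 1, 12345), (2, 2, 12345), (3, 2, 7890), (4, 3, 800),
            (5, 3, 234), (6, 3, 987), (7, 4, 800), (8, 4, 234),
            (9, 4, 987), (10, 4, 2579)]


def split_invoices(invoices, projects):
    # A1: inner join on ProjectID, ordered by Projects.ID.
    joined = [
        {"InvoiceID": inv_id, "Amount": Decimal(amount),
         "ProjectID": pid, "AccountCode": code}
        for _id, project_pid, code in sorted(projects)
        for inv_id, amount, pid in invoices
        if pid == project_pid
    ]
    # A2: group by ProjectID but keep every group's rows (no aggregation).
    groups = {}
    for row in joined:
        groups.setdefault(row["ProjectID"], []).append(row)
    # A3: add SplitAmount inside each group; the first row takes the remainder.
    for rows in groups.values():
        cnt = len(rows)
        avg = (rows[0]["Amount"] / cnt).quantize(CENT, rounding=ROUND_HALF_UP)
        for seq, row in enumerate(rows, start=1):
            remainder = row["Amount"] - avg * cnt if seq == 1 else 0
            row["SplitAmount"] = avg + remainder
    # A4: concatenate the groups back into one flat list.
    return [row for rows in groups.values() for row in rows]
```

The sketch relies on the source's statement that each project has one invoice amount, so the rounded average is taken from the first row of each group. Here `enumerate` plays the role that `#` plays in A3.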
