
From SQL to SPL: Conditional grouping

esProcSPL edited this page Jun 19, 2025 · 1 revision

A database table has several fields that can be used as grouping keys.

ID SPLIT CUST DATE AMOUNT
ID_1 SPLIT_YES A 2024-05-01 00:00:00 100
ID_1 SPLIT_NO A 2024-04-01 00:00:00 200
ID_1 SPLIT_YES B 2024-03-01 00:00:00 50
ID_2 SPLIT_YES A 2024-05-01 00:00:00 50
ID_2 SPLIT_NO A 2024-04-01 00:00:00 300
ID_2 SPLIT_NO B 2024-03-01 00:00:00 300
ID_3 SPLIT_YES B 2024-04-01 00:00:00 90
ID_3 SPLIT_NO B 2024-04-01 00:00:00 30
ID_3 SPLIT_NO A 2024-04-01 00:00:00 10
ID_3 SPLIT_NO A 2024-03-01 00:00:00 10

Now we need to perform conditional grouping. First, group by the first two fields, ID and SPLIT. If no DATE is duplicated within a group, keep the record with the latest date in that group and replace its AMOUNT with the sum of the AMOUNTs in the group. If a DATE is duplicated within a group, group that group's records again by CUST; for each resulting subgroup, keep the record with the latest date and replace its AMOUNT with the sum of the AMOUNTs in the subgroup. The expected result is:

ID SPLIT CUST DATE AMOUNT
ID_1 SPLIT_NO A 2024-04-01 00:00:00 200
ID_1 SPLIT_YES A 2024-05-01 00:00:00 150
ID_2 SPLIT_NO A 2024-04-01 00:00:00 600
ID_2 SPLIT_YES A 2024-05-01 00:00:00 50
ID_3 SPLIT_NO A 2024-04-01 00:00:00 20
ID_3 SPLIT_NO B 2024-04-01 00:00:00 30
ID_3 SPLIT_YES B 2024-04-01 00:00:00 90
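The branching condition can be checked by hand on two of the sample groups. The following is a minimal Python sketch, not part of the original solution; the helper name `has_duplicate_dates` is hypothetical, and the dates are written without the `00:00:00` time part for brevity.

```python
def has_duplicate_dates(dates):
    """Return True when at least one DATE value occurs more than once."""
    return len(set(dates)) != len(dates)

# DATE values of two (ID, SPLIT) groups taken from the sample table.
id1_split_yes = ["2024-05-01", "2024-03-01"]
id3_split_no = ["2024-04-01", "2024-04-01", "2024-03-01"]

# No duplicate: keep the latest row (CUST A) with AMOUNT = 100 + 50.
print(has_duplicate_dates(id1_split_yes))

# Duplicate: regroup by CUST, giving A = 10 + 10 and B = 30.
print(has_duplicate_dates(id3_split_no))
```

This matches the expected rows above: one row with AMOUNT 150 for ID_1/SPLIT_YES, and two rows (20 and 30) for ID_3/SPLIT_NO.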

SQL:

SELECT id, split, cust, date_column,
       CASE num_cust WHEN 1 THEN total_amount ELSE total_cust_amount END AS amount
FROM   (
  SELECT t.*,
         COUNT(DISTINCT CASE rnk WHEN 1 THEN cust END)
           OVER (PARTITION BY id, split) AS num_cust
  FROM   (
    SELECT t.*,
           DENSE_RANK() OVER (PARTITION BY id, split ORDER BY date_column DESC) AS rnk,
           SUM(amount) OVER (PARTITION BY id, split) AS total_amount,
           SUM(amount) OVER (PARTITION BY id, split, cust) AS total_cust_amount
    FROM   test_table_mm t
  ) t
  WHERE  rnk = 1
)

SQL cannot retain the subsets produced by grouping, so the task has to be implemented indirectly with multiple nested queries and several window functions, which makes the code complex. In SPL, grouping does not have to be followed immediately by aggregation: the subsets can be kept for further calculation, and they can themselves be grouped again:

https://try.esproc.com/splx?44C

A
1 $select * from test_table_mm.txt order by DATE desc
2 =A1.group(ID,SPLIT)
3 =A2.conj(if(~.icount(DATE)==~.count(), [~], ~.group(CUST)))
4 =A3.(~(1).run(AMOUNT=A3.~.sum(AMOUNT)))

A1: Load the data and sort it by date in descending order.

A2: Group by the first two fields, but do not aggregate.

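A3: Process each group: if no DATE is duplicated within the group (the distinct count of DATE equals the record count), return the group itself; otherwise, group it by CUST and return each subgroup. conj concatenates these results into a single sequence of groups.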

A4: Process each group from A3: take the first record of the group, which is the one with the latest date because A1 sorted the data by date in descending order, and replace its AMOUNT with the sum of the AMOUNTs of the group.
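For readers who do not know SPL, the four steps can be mirrored in plain Python. This is only an illustrative sketch of the same logic, not the author's implementation: the names `ROWS`, `group_by` and `conditional_group` are hypothetical, records are plain tuples, and dates are written as ISO strings without the time part (such strings sort chronologically).

```python
from collections import defaultdict

# Sample rows from the article: (ID, SPLIT, CUST, DATE, AMOUNT).
ROWS = [
    ("ID_1", "SPLIT_YES", "A", "2024-05-01", 100),
    ("ID_1", "SPLIT_NO", "A", "2024-04-01", 200),
    ("ID_1", "SPLIT_YES", "B", "2024-03-01", 50),
    ("ID_2", "SPLIT_YES", "A", "2024-05-01", 50),
    ("ID_2", "SPLIT_NO", "A", "2024-04-01", 300),
    ("ID_2", "SPLIT_NO", "B", "2024-03-01", 300),
    ("ID_3", "SPLIT_YES", "B", "2024-04-01", 90),
    ("ID_3", "SPLIT_NO", "B", "2024-04-01", 30),
    ("ID_3", "SPLIT_NO", "A", "2024-04-01", 10),
    ("ID_3", "SPLIT_NO", "A", "2024-03-01", 10),
]


def group_by(rows, key):
    """Split rows into subsets by key, keeping row order inside each subset."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return list(groups.values())


def conditional_group(rows):
    # A1: sort by DATE descending, so the first row of any subset is its latest.
    ordered = sorted(rows, key=lambda r: r[3], reverse=True)
    # A2: group by (ID, SPLIT) and keep the subsets instead of aggregating.
    outer = group_by(ordered, key=lambda r: (r[0], r[1]))
    # A3: keep a group whole if its dates are all distinct, else split it by CUST.
    subsets = []
    for group in outer:
        dates = [r[3] for r in group]
        if len(set(dates)) == len(dates):
            subsets.append(group)
        else:
            subsets.extend(group_by(group, key=lambda r: r[2]))
    # A4: take the first (latest) row of each subset and replace AMOUNT with the sum.
    result = [g[0][:4] + (sum(r[4] for r in g),) for g in subsets]
    # Sorting is only for a stable display order; it is not part of the SPL script.
    return sorted(result)


for row in conditional_group(ROWS):
    print(*row)
```

The sketch relies on the same idea as the SPL script: because the subsets are kept after grouping, the decision to regroup by CUST can be made per group before any aggregation happens.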

Question source: https://stackoverflow.com/questions/78048701/sql-oracle-conditional-group-by
