Skip to content

How to perform difference set operation on continuous intervals with esProc

esProcSPL edited this page Jun 27, 2025 · 1 revision

There are two tables in a certain database. The original inventory table data_add stores multiple batches of inventory for multiple items. Each batch of inventory has a starting number START_NUM and an ending number END_NUM, representing the range of an interval.

ID ITEM_ID START_NUM END_NUM
1 337 101 400
2 337 500 800
3 337 801 1200
4 337 1500 1600
5 337 15000 16000
6 337 20000 30000
7 444 20 30

The consumption table data_cons stores multiple batches of consumption for multiple items, with each batch of consumption being an interval.

ID ITEM_ID START_NUM END_NUM
1 337 240 300
2 337 301 400
3 337 850 1100
4 337 1500 1510
5 337 15000 16000

Now we need to calculate the current inventory of each item, which is the difference set between the multi segment range of the original inventory and the multi segment range of consumption, and represent the result as a multi segment range. The interval of the original inventory may be consumed into discontinuous multiple intervals, in which case multiple records need to be naturally generated, with each record corresponding to an interval. For example, the interval of the original inventory [500:1200] is consumed into two intervals [500:849] and [1101:1200].

ITEM_ID START_NUM END_NUM
337 101 239
337 500 849
337 1101 1200
337 1511 1600
337 20000 30000
444 20 30

SQL cannot represent sets with variables, making it inconvenient to perform operations between sets, and the code is very cumbersome. SPL can represent sets with variables, making it easy to express various set operations:

https://try.esproc.com/splx?3xM

 A B
1 $select * from data_add.txt $select * from data_cons.txt
2 =A1.group(ITEM_ID;~.conj(to(START_NUM,END_NUM)):a) =B1.group(ITEM_ID;~.conj(to(START_NUM,END_NUM)):b)
3 =A2.join(ITEM_ID,B2,b)
4 =A3.derive([a, b].merge@d().group@i(!=[-1]+1):diff)
5 =A4.news(diff; ITEM_ID, ~1:START_NUM, ~.m(-1):END_NUM)

A1-B1: Load data.

A2: Use the group function to group the original inventory by item, but do not aggregate. Convert each interval within the group into a small set of continuous sequence, and then union them into a large set. ~ represents the current group, the function 'to' can generate a continuous sequence based on the start and end numbers.

Picture1png B2: Apply the same processing to the consumption table.

Picture2png A3=A2.join(ITEM_ID,B2,b):Use the join function to perform left join by item ID.

Picture3png A4=A3.derive([a, b].merge@d()…..) Add a calculated column and first calculate the difference between the original inventory set and the consumption set of each item. The merge function merges ordered sets, @d represents calculating the difference set. Note that the sequence after the difference set is discontinuous, such as 849 and 1101.

Picture4png group@i(!=[-1]+1) Then perform conditional grouping to each difference set, grouping the consecutive sequence into the same group, such as 849 and 1101 being assigned to the second and third groups, respectively. The function group is used for grouping, by default is equivalence grouping, @i represents grouping by condition, and ~[-1] represents the previous member.

Picture5png A5=A4.news(diff; ITEM_ID, ~1:START_NUM, ~.m(-1):END_NUM) Generate a new record using each sequence in the diff field of each record in A4, where the start and end numbers of the new interval are taken from the beginning and end of each sequence. The function news can expand each member of a set into one record. ~.m(-1) represents the last member in ~, ~.m(1) represents the first member in ~, abbreviated as ~1.

Picture6png

Clone this wiki locally