There are three concepts related to Iceberg data files:
A: files defining rows to add
B: files defining rows to remove
C: files defining either rows to add or rows to remove (encompasses both A and B)
The term "delete file" unambiguously refers to B. "Data file", on the other hand, can refer to either A or C. When one sees a variable or field called data_files, it is hard to know which of the two is meant. For example, at:
    data_files,
    let data_files_iter = delete_files.iter().chain(data_files.iter());
I wonder if we can come up with distinct canonical terms for A, B, C. E.g.
A: "data file"
B: "delete file"
C: "content file"
Or alternatively:
A: "insert file"
B: "delete file"
C: "data file"
Or any other 3 distinct terms for A, B, C.
Unfortunately, the Iceberg spec suffers from the same ambiguity, so we cannot use it for inspiration.
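To make the first option concrete, here is a rough sketch of how three distinct terms could show up in the types. All names here (`DataFile`, `DeleteFile`, `ContentFile`, `content_files`, and the `path` field) are hypothetical and chosen only for illustration; they are not taken from the existing code or from the Iceberg spec.

```rust
// Hypothetical types for illustration only; not an existing API.

/// A: a file defining rows to add.
#[derive(Debug, Clone, PartialEq)]
struct DataFile {
    path: String,
}

/// B: a file defining rows to remove.
#[derive(Debug, Clone, PartialEq)]
struct DeleteFile {
    path: String,
}

/// C: a file defining either rows to add or rows to remove.
#[derive(Debug, Clone, PartialEq)]
enum ContentFile {
    Data(DataFile),
    Delete(DeleteFile),
}

/// Chains delete files and data files into one sequence (deletes first),
/// so the element type itself says that both kinds are present.
fn content_files(data_files: &[DataFile], delete_files: &[DeleteFile]) -> Vec<ContentFile> {
    delete_files
        .iter()
        .cloned()
        .map(ContentFile::Delete)
        .chain(data_files.iter().cloned().map(ContentFile::Data))
        .collect()
}
```

With something like this, a field named `data_files` could only mean A, and a mixed iterator like the one quoted above would be named and typed as content files (C) instead.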
gruuya