Describe the bug
Suppose a query includes a skipping-eligible predicate over LONG column c.
Then we expect the add.stats column to include min/max stats that can parse as LONG.
However, it is possible that a recent table replace operation changed the schema of c -- previously it was a STRING column, which is incompatible with c's new type. In that case, every file that had been in the table will be canceled by a remove (table replacement always truncates the original table), ensuring that no incompatible file actions survive log replay.
Unfortunately, the kernel currently attempts to parse the entire add.stats column before deduplicating (in order to avoid tracking pruned files), and is thus exposed to parsing failures for rows that contain canceled add actions (e.g. add.stats.minValues.c = 'A' cannot parse as LONG).
This issue has a second aspect: Data skipping doesn't track or exclude partition columns directly. So we attempt data skipping over c, with the same risk of parsing failures, even if it's now a partition column. Fixing the general problem would make this harmless, but it's probably worth specifically tracking and excluding partition columns from the data skipping machinery so we don't waste time trying to parse (usually non-existent) stats and evaluating (provably useless) data skipping expressions for partition columns.
NOTE: Ideally, this issue should not arise if column mapping is enabled, because the physical names of the new columns should differ from the originals even if their logical names still seem to match.
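To illustrate the failure mode in isolation, here is a minimal sketch (not kernel code; the stats schema and JSON literal are made up to mirror the example above) showing the arrow-json decoder rejecting a string where the expected stats schema says Int64:

```rust
use std::sync::Arc;

use arrow_json::ReaderBuilder;
use arrow_schema::{DataType, Field, Fields, Schema};

fn main() {
    // Stats schema derived from the *new* table schema, where `c` is LONG.
    let stats_schema = Arc::new(Schema::new(vec![Field::new(
        "minValues",
        DataType::Struct(Fields::from(vec![Field::new("c", DataType::Int64, true)])),
        true,
    )]));

    // Stats string written back when `c` was still a STRING column.
    let old_stats = br#"{"minValues":{"c":"A"}}"#;

    let mut decoder = ReaderBuilder::new(stats_schema).build_decoder().unwrap();

    // Feed the JSON and try to materialize a RecordBatch; the attempt fails
    // with a JsonError much like the one quoted below ("A" is not an Int64).
    let result = decoder.decode(old_stats).and_then(|_| decoder.flush());
    assert!(result.is_err());
    println!("{result:?}");
}
```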
To Reproduce
Invoke LogReplayScanner::process_scan_batch twice -- once with a batch containing an incompatible remove (to mark the file as "seen"), and again with a batch containing a matching incompatible add. It will fail with e.g.
Arrow(JsonError("whilst decoding field 'minValues': whilst decoding field 'c': failed to parse \"A\" as Int64"))
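For concreteness, the two batches could carry file actions shaped like the ones below (the path, size, and timestamps are invented; only the shape matters). The remove comes from the table replacement, and the add was written while c was still a STRING column, so its embedded stats cannot parse against the new LONG schema:

```rust
// Batch 1: remove produced by the table replacement; processing it first
// marks the file as "seen". (Illustrative values, not a real table.)
const REMOVE_BATCH: &str =
    r#"{"remove":{"path":"part-00000.parquet","deletionTimestamp":1700000000000,"dataChange":true}}"#;

// Batch 2: the matching add from before the replacement, whose stats were
// written when `c` was a STRING column.
const ADD_BATCH: &str = r#"{"add":{"path":"part-00000.parquet","partitionValues":{},"size":1024,"modificationTime":1699999999000,"dataChange":true,"stats":"{\"numRecords\":1,\"minValues\":{\"c\":\"A\"},\"maxValues\":{\"c\":\"A\"},\"nullCount\":{\"c\":0}}"}}"#;
```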
Expected behavior
The previously-seen remove should eliminate the add before it gets a chance to cause trouble.
Additional context
No response
NOTE: The Java kernel avoids this issue by deduplicating file actions before attempting to parse add.stats, and also because its JSON parser honors selection vectors and ignores unselected rows.
The Rust kernel's JSON parser also ignores null rows, but we don't currently (have a way to) update the null mask based on the deduplication the kernel performed. We'll need to figure out how to do that. Additionally, we would want to split the deduplication into "check" and "update" passes, so that we can do the following:
1. Sanitize the rows of a batch (eliminate non-file-action rows, eliminate previously seen files, etc.).
2. Parse stats and partition values of the surviving rows, and apply further pruning.
3. Update the "seen" set only for files that survived pruning.
That way, we get the best of both worlds: pruning minimizes the cardinality of the "seen" set, but the "seen" set can still protect pruning attempts from incompatible schema changes.
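A rough sketch of what that check/update split could look like (the names FileActionKey, SeenTracker, and process_batch are hypothetical, not the kernel's actual API; stats parsing and skipping are reduced to a stand-in predicate):

```rust
use std::collections::HashSet;

/// Hypothetical key for a file action: (path, optional deletion-vector id).
type FileActionKey = (String, Option<String>);

/// Tracks files already seen during log replay (illustrative only).
#[derive(Default)]
struct SeenTracker {
    seen: HashSet<FileActionKey>,
}

impl SeenTracker {
    /// "Check" pass: was this file already canceled or returned by a newer
    /// commit? Does NOT mutate the set.
    fn already_seen(&self, key: &FileActionKey) -> bool {
        self.seen.contains(key)
    }

    /// "Update" pass: remember only files that survived pruning, keeping the
    /// seen set small.
    fn mark_seen(&mut self, key: FileActionKey) {
        self.seen.insert(key);
    }
}

/// Placeholder for one add action's metadata (path key + raw stats JSON).
struct AddFile {
    key: FileActionKey,
    stats_json: Option<String>,
}

/// Per-batch flow: dedup first, parse stats only for survivors, then record
/// the survivors in the seen set.
fn process_batch(tracker: &mut SeenTracker, adds: Vec<AddFile>) -> Vec<AddFile> {
    let mut selected = Vec::new();
    for add in adds {
        // 1. Sanitize: skip rows canceled by a previously seen remove/add.
        if tracker.already_seen(&add.key) {
            continue;
        }
        // 2. Parse stats and prune. A canceled row never reaches this point,
        //    so an incompatible stats schema can no longer fail the scan.
        let keep = add.stats_json.as_deref().map_or(true, passes_data_skipping);
        if !keep {
            continue;
        }
        // 3. Update the seen set only for rows that survived pruning.
        tracker.mark_seen(add.key.clone());
        selected.push(add);
    }
    selected
}

/// Stand-in predicate; the real kernel would parse the stats and evaluate the
/// data skipping expression here.
fn passes_data_skipping(_stats: &str) -> bool {
    true
}
```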