-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Add support for concurrent write on Iceberg transformed column #24160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for concurrent write on Iceberg transformed column #24160
Conversation
0f00cdd to
ff872d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we intersect enforced with the unenforced predicate?
They are rather unrelated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that both carriers information about predicate.
The difference is that taken from @raunaqmorarka explanation:
enforced is the part of the predicate which is guaranteed to be satisfied by the connector, so the engine will not apply it on it's side.
unenforced is the part of the predicate which connector cannot guarantee even if it is able to use it to reduce output, so the engine will apply it on the connector output
@raunaqmorarka Correct me if I'm wrong here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{2:part:date=[ SortedRangeSet[type=date, ranges=1, {[2024-01-01]}] ]}
I'm checking io.trino.plugin.iceberg.TestIcebergLocalConcurrentWritesTest#testConcurrentUpdateWithOverlappingPartitionTransformation (BTW really cool that we have this battery of new concurrency tests)
It's unclear to me why the partition predicate is not an "enforced predicate" while debugging your code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding such predicates for transformed columns as enforced, would mean pushdowns of those values and connector would need to filter files and rows also during reading from such table as well. It's not supported right now and would require more work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are rather unrelated.
We already join those in IcebergSplitSource
trino/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergSplitSource.java
Line 259 in 61a79dd
| TupleDomain<IcebergColumnHandle> effectivePredicate = TupleDomain.intersect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are able to prune files in IcebergSplitSource based on unenforced filter, but not rows (hence "unenforced"). Filters on identity partitioned columns are enforced at the row level due to the data being partitioned by value and PartitionConstraintMatcher in IcebergSplitSource.
I think it's valid to intersect unenforced filter here because org.apache.iceberg.MergingSnapshotProducer is internally checking whether any files matching given filter have been added to the table.
8da17cc to
c9670a6
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are able to prune files in IcebergSplitSource based on unenforced filter, but not rows (hence "unenforced"). Filters on identity partitioned columns are enforced at the row level due to the data being partitioned by value and PartitionConstraintMatcher in IcebergSplitSource.
I think it's valid to intersect unenforced filter here because org.apache.iceberg.MergingSnapshotProducer is internally checking whether any files matching given filter have been added to the table.
c9670a6 to
7623816
Compare
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
...rino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWritesTest.java
Outdated
Show resolved
Hide resolved
7623816 to
5df5282
Compare
5df5282 to
bb147d6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
But I would like a second opinion on it
@alexjo2144 @findepi @losipiuk @hashhar @electrum are any of you able to weigh in here ?
...in/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWrites.java
Outdated
Show resolved
Hide resolved
...in/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergLocalConcurrentWrites.java
Outdated
Show resolved
Hide resolved
bb147d6 to
8daa87b
Compare
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java
Outdated
Show resolved
Hide resolved
8daa87b to
36c8476
Compare
| TupleDomain<IcebergColumnHandle> convertibleUnenforcedPredicate = table.getUnenforcedPredicate().filter((_, domain) -> isConvertableToIcebergExpression(domain)); | ||
| TupleDomain<IcebergColumnHandle> effectivePredicate = dataColumnPredicate.intersect(convertibleUnenforcedPredicate); | ||
| if (!effectivePredicate.isAll()) { | ||
| rowDelta.conflictDetectionFilter(toIcebergExpression(effectivePredicate)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty cool @pajaks 🎉
Outstanding test coverage.
|
@ebyhr @raunaqmorarka Can this be merged? |
Description
Add concurrent write to scenarios like:
Release notes