feat(plugin-delta): Upgrade io.delta.delta-kernel libraries to 3.3.2 #26814
base: master
Reviewer's Guide
Upgrades the presto-delta connector to use delta-kernel-api 3.3.2 and aligns file path handling with the newer kernel behavior by no longer wrapping file paths with URI.create().
Sequence diagram for updated file path handling in DeltaSplitManager.getNextBatch
sequenceDiagram
participant PrestoScheduler
participant DeltaSplitManager
participant DeltaKernelReader
participant AddFileStatus
participant HiveSplitCreator
PrestoScheduler->>DeltaSplitManager: getNextBatch(partitionHandle, tableHandle, splitSchedulingContext)
DeltaSplitManager->>DeltaKernelReader: listFilesForScan(tableHandle)
DeltaKernelReader-->>DeltaSplitManager: AddFileStatus rows
loop For each AddFileStatus
DeltaSplitManager->>AddFileStatus: getPath()
AddFileStatus-->>DeltaSplitManager: filePath
Note right of DeltaSplitManager: filePath is now used directly
DeltaSplitManager->>HiveSplitCreator: createSplit(filePath, startOffset, length)
HiveSplitCreator-->>DeltaSplitManager: ConnectorSplit
end
DeltaSplitManager-->>PrestoScheduler: ConnectorSplitBatch
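The difference between wrapping a path in URI.create() and using it directly can be illustrated with a small standalone sketch. It only exercises standard java.net.URI behavior; the class name, method names, and sample S3 paths are hypothetical, and how the connector actually consumed the resulting URI is not shown in this PR page.

```java
import java.net.URI;

public class PathHandlingSketch {
    // Old style (assumed): round-trip the path through java.net.URI.
    // getPath() returns the decoded path component only, so the scheme and
    // bucket are dropped and percent-escapes such as %20 are decoded.
    static String viaUri(String path) {
        return URI.create(path).getPath();
    }

    // New style: use the string returned by the kernel as-is.
    static String direct(String path) {
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical file path with a percent-encoded space in a partition directory.
        String encoded = "s3://bucket/table/part=a%20b/file.parquet";
        System.out.println("via URI: " + viaUri(encoded));
        System.out.println("direct:  " + direct(encoded));

        // An unencoded space is not a legal URI character, so URI.create rejects it.
        String raw = "s3://bucket/table/part=a b/file.parquet";
        try {
            viaUri(raw);
        } catch (IllegalArgumentException e) {
            System.out.println("URI.create rejected: " + raw);
        }
        System.out.println("direct:  " + direct(raw));
    }
}
```

Passing the string through unchanged avoids both the decoding step and the parse failure, which is consistent with the note in the diagram that filePath is now used directly.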
Sequence diagram for updated partition predicate evaluation file path handling
sequenceDiagram
participant Planner
participant DeltaExpressionUtils
participant InternalScanFileUtils
participant Row
participant AddFileStatus
Planner->>DeltaExpressionUtils: evaluatePartitionPredicate(predicate, partitionColumns, row)
loop For each partitionColumn
DeltaExpressionUtils->>InternalScanFileUtils: getPartitionValues(row)
InternalScanFileUtils-->>DeltaExpressionUtils: partitionValues
DeltaExpressionUtils->>partitionValues: get(columnName)
partitionValues-->>DeltaExpressionUtils: partitionValue
DeltaExpressionUtils->>InternalScanFileUtils: getAddFileStatus(row)
InternalScanFileUtils-->>DeltaExpressionUtils: AddFileStatus
DeltaExpressionUtils->>AddFileStatus: getPath()
AddFileStatus-->>DeltaExpressionUtils: filePath
Note right of DeltaExpressionUtils: filePath is used directly without URI.create
DeltaExpressionUtils->>DeltaExpressionUtils: getDomain(partitionColumn, partitionValue, typeManager, filePath)
end
DeltaExpressionUtils-->>Planner: boolean result
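The loop in the diagram above can be sketched roughly as follows. This is not the connector's code: the class, the range-based predicate, and the error message are hypothetical stand-ins for TupleDomain, Domain, and getDomain, and the sketch assumes the file path is only carried along as a plain string for diagnostics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionPredicateSketch {
    // Hypothetical stand-in for getDomain: parse one partition value and,
    // on failure, report the file path exactly as the kernel returned it.
    static long parsePartitionValue(String column, String value, String filePath) {
        try {
            return Long.parseLong(value);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(String.format(
                    "Cannot parse partition value '%s' for column '%s' in file '%s'",
                    value, column, filePath), e);
        }
    }

    // Returns true when every constrained column's partition value lies in its
    // inclusive [min, max] range. A missing value is treated as non-matching,
    // which is a simplification made for this sketch.
    static boolean evaluate(Map<String, long[]> ranges,
                            Map<String, String> partitionValues,
                            String filePath) {
        for (Map.Entry<String, long[]> entry : ranges.entrySet()) {
            String raw = partitionValues.get(entry.getKey());
            if (raw == null) {
                return false;
            }
            long value = parsePartitionValue(entry.getKey(), raw, filePath);
            if (value < entry.getValue()[0] || value > entry.getValue()[1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, long[]> ranges = new LinkedHashMap<>();
        ranges.put("year", new long[] {2020, 2024});

        Map<String, String> partitionValues = new LinkedHashMap<>();
        partitionValues.put("year", "2023");

        // The path is passed straight through, with no URI.create() wrapping.
        String filePath = "s3://bucket/table/year=2023/file.parquet";
        System.out.println(evaluate(ranges, partitionValues, filePath));
    }
}
```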
Class diagram for updated path handling in presto-delta
classDiagram
class DeltaSplitManager {
+CompletableFuture getNextBatch(ConnectorPartitionHandle partitionHandle, ConnectorTableHandle tableHandle, SplitSchedulingContext splitSchedulingContext)
}
class DeltaExpressionUtils {
-static boolean evaluatePartitionPredicate(ConnectorSession session, TupleDomain partitionPredicate, List partitionColumns, Object row, TypeManager typeManager)
-static Domain getDomain(DeltaColumnHandle partitionColumn, String partitionValue, TypeManager typeManager, String filePath)
}
class InternalScanFileUtils {
+Map getPartitionValues(Object row)
+AddFileStatus getAddFileStatus(Object row)
}
class AddFileStatus {
+String getPath()
+long getSize()
}
class ConnectorSplitBatch
class ConnectorPartitionHandle
class ConnectorTableHandle
class SplitSchedulingContext
class Domain
class DeltaColumnHandle {
+String getName()
}
class TypeManager
DeltaSplitManager --> AddFileStatus : uses
DeltaSplitManager --> ConnectorSplitBatch : returns
DeltaSplitManager --> ConnectorPartitionHandle : parameter
DeltaSplitManager --> ConnectorTableHandle : parameter
DeltaSplitManager --> SplitSchedulingContext : parameter
DeltaExpressionUtils --> InternalScanFileUtils : uses
DeltaExpressionUtils --> AddFileStatus : uses
DeltaExpressionUtils --> Domain : creates
DeltaExpressionUtils --> DeltaColumnHandle : uses
DeltaExpressionUtils --> TypeManager : uses
InternalScanFileUtils --> AddFileStatus : returns
b46d422 to 27f8e2f
## Description
Upgrade the delta-kernel-api and delta-kernel-defaults libraries to version 3.3.2.

## Motivation and Context
Upgrading these libraries lets the connector support future improvements such as deletion vectors, type widening, and the variant type.

## Impact
Library bug fixes relative to 3.2.0 and the ability to support new features. Version 3.2.1 fixed delta-io/delta#3291, so the earlier bugfix in prestodb#26397 has been undone because it is no longer needed.

## Test Plan
Unit tests already exist. Since this is only a library version upgrade, passing them is the target.

## Release Notes
```
== NO RELEASE NOTE ==
```
27f8e2f to 59a0449
agrawalreetika
left a comment
Thanks for the PR @mblanco-denodo
It would be good to merge this, since it reverts the changes made in https://github.com/prestodb/presto/pull/26814/changes, which had some issues with S3 FS related paths mentioned here
hantangwangd
left a comment
Thanks @mblanco-denodo