Skip to content

Conversation

@mblanco-denodo
Copy link
Contributor

Description

Upgrade delta-kernel-api and delta-kernel-defaults library to version 3.3.2

Motivation and Context

Upgrading those libraries will make us able to support future improvements in the connector, like support for deletion vectors, type widening, variant type... .

Impact

Library bug fixes from 3.2.0 and ability to support new features. In the 3.2.1 version, this bug was fixed:
delta-io/delta#3291
so this previous bugfix has been undone as it is not needed anymore: #26397

Test Plan

There already exist unit tests. Since this is only a library version upgrade, passing the unit tests should be our target.

Release Notes

== NO RELEASE NOTE ==

@mblanco-denodo mblanco-denodo requested a review from a team as a code owner December 16, 2025 18:49
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 16, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Upgrades the presto-delta connector to use delta-kernel-api 3.3.2 and aligns file path handling with the newer kernel behavior by no longer wrapping file paths with URI.create().

Sequence diagram for updated file path handling in DeltaSplitManager.getNextBatch

sequenceDiagram
    participant PrestoScheduler
    participant DeltaSplitManager
    participant DeltaKernelReader
    participant AddFileStatus
    participant HiveSplitCreator

    PrestoScheduler->>DeltaSplitManager: getNextBatch(partitionHandle, tableHandle, splitSchedulingContext)
    DeltaSplitManager->>DeltaKernelReader: listFilesForScan(tableHandle)
    DeltaKernelReader-->>DeltaSplitManager: AddFileStatus rows

    loop For each AddFileStatus
        DeltaSplitManager->>AddFileStatus: getPath()
        AddFileStatus-->>DeltaSplitManager: filePath
        Note right of DeltaSplitManager: filePath is now used directly
        DeltaSplitManager->>HiveSplitCreator: createSplit(filePath, startOffset, length)
        HiveSplitCreator-->>DeltaSplitManager: ConnectorSplit
    end

    DeltaSplitManager-->>PrestoScheduler: ConnectorSplitBatch
Loading

Sequence diagram for updated partition predicate evaluation file path handling

sequenceDiagram
    participant Planner
    participant DeltaExpressionUtils
    participant InternalScanFileUtils
    participant Row
    participant AddFileStatus

    Planner->>DeltaExpressionUtils: evaluatePartitionPredicate(predicate, partitionColumns, row)

    loop For each partitionColumn
        DeltaExpressionUtils->>InternalScanFileUtils: getPartitionValues(row)
        InternalScanFileUtils-->>DeltaExpressionUtils: partitionValues
        DeltaExpressionUtils->>partitionValues: get(columnName)
        partitionValues-->>DeltaExpressionUtils: partitionValue

        DeltaExpressionUtils->>InternalScanFileUtils: getAddFileStatus(row)
        InternalScanFileUtils-->>DeltaExpressionUtils: AddFileStatus
        DeltaExpressionUtils->>AddFileStatus: getPath()
        AddFileStatus-->>DeltaExpressionUtils: filePath
        Note right of DeltaExpressionUtils: filePath is used directly without URI.create

        DeltaExpressionUtils->>DeltaExpressionUtils: getDomain(partitionColumn, partitionValue, typeManager, filePath)
    end

    DeltaExpressionUtils-->>Planner: boolean result
Loading

Class diagram for updated path handling in presto-delta

classDiagram
    class DeltaSplitManager {
        +CompletableFuture getNextBatch(ConnectorPartitionHandle partitionHandle, ConnectorTableHandle tableHandle, SplitSchedulingContext splitSchedulingContext)
    }

    class DeltaExpressionUtils {
        -static boolean evaluatePartitionPredicate(ConnectorSession session, TupleDomain partitionPredicate, List partitionColumns, Object row, TypeManager typeManager)
        -static Domain getDomain(DeltaColumnHandle partitionColumn, String partitionValue, TypeManager typeManager, String filePath)
    }

    class InternalScanFileUtils {
        +Map getPartitionValues(Object row)
        +AddFileStatus getAddFileStatus(Object row)
    }

    class AddFileStatus {
        +String getPath()
        +long getSize()
    }

    class ConnectorSplitBatch
    class ConnectorPartitionHandle
    class ConnectorTableHandle
    class SplitSchedulingContext
    class Domain
    class DeltaColumnHandle {
        +String getName()
    }
    class TypeManager

    DeltaSplitManager --> AddFileStatus : uses
    DeltaSplitManager --> ConnectorSplitBatch : returns
    DeltaSplitManager --> ConnectorPartitionHandle : parameter
    DeltaSplitManager --> ConnectorTableHandle : parameter
    DeltaSplitManager --> SplitSchedulingContext : parameter

    DeltaExpressionUtils --> InternalScanFileUtils : uses
    DeltaExpressionUtils --> AddFileStatus : uses
    DeltaExpressionUtils --> Domain : creates
    DeltaExpressionUtils --> DeltaColumnHandle : uses
    DeltaExpressionUtils --> TypeManager : uses
    InternalScanFileUtils --> AddFileStatus : returns
Loading

File-Level Changes

Change Details Files
Upgrade delta-kernel-api dependency to version 3.3.2 for the presto-delta module.
  • Updated the io.delta.delta-kernel-api.version property from 3.2.0 to 3.3.2 in the Maven POM.
  • Kept other build properties and configuration unchanged.
presto-delta/pom.xml
Align Delta connector file path handling with the new delta-kernel behavior.
  • Stopped converting add file paths to URI and using getPath(); now use the path returned directly from getAddFileStatus(row).getPath().
  • Adjusted debug logging and domain computation to use the direct file path string.
  • Updated split creation to pass the direct file path to the split constructor.
presto-delta/src/main/java/com/facebook/presto/delta/DeltaExpressionUtils.java
presto-delta/src/main/java/com/facebook/presto/delta/DeltaSplitManager.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

## Description
Upgrade delta-kernel-api and delta-kernel-defaults library to version
3.3.2

## Motivation and Context
Upgrading those libraries will make us able to support future
improvements in the connector, like support for deletion vectors, type
widening, varian type... .

## Impact
Library bug fixes from 3.2.0 and ability to support new features.
In the 3.2.1 version, this bug was fixed:
delta-io/delta#3291
so this previous bugfix has been undone as it is not needed anymore:
prestodb#26397

## Test Plan
There already exist unit tests. Since this is only a library version
upgrade, passing the unit tests should be our target.

## Release Notes
```
== NO RELEASE NOTE ==
```
Copy link
Member

@agrawalreetika agrawalreetika left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @mblanco-denodo

It would be good to merge this, as this is reverting the changes done in https://github.com/prestodb/presto/pull/26814/changes, which got some issues with S3 FS related paths mentioned here

Copy link
Member

@hantangwangd hantangwangd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants