-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Fix reading specifying format for date partition projection in Hive connector #25817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Replace legacy formatting with `DateTimeFormatter` for better compatibility with the `java.time` package, providing more precise and predictable behavior This refactor follows AWS Athena's date-type partition projection format (https://docs.aws.amazon.com/athena/latest/ug/partition-projection-supported-types.html) in the description. Additionally, this commit replaces the error-prone Supplier<Instant> with a direct Instant type for `leftBound` and `rightBound`.
af8efa8
to
6ef1fa6
Compare
plugin/trino-hive/src/main/java/io/trino/plugin/hive/projection/DateProjection.java
Outdated
Show resolved
Hide resolved
6ef1fa6
to
bc09276
Compare
@@ -1719,12 +1722,13 @@ public void finishStatisticsCollection(ConnectorSession session, ConnectorTableH | |||
.map(HiveColumnHandle::getType) | |||
.collect(toImmutableList()); | |||
|
|||
Optional<PartitionProjection> partitionProjection = getPartitionProjectionFromHiveTable(handle); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Handling partition projection within this path seems incorrect, although I'm not sure I understand the implications of it since the code was already somewhat wrong in this regard.
The issue is that partition projection avoids storing the actual partitions in the metastore and computes the values in memory instead. That means that partition statistics don't exist on a per partition basis and the preferred approach is probably to handle these in the same way that unpartitioned tables are handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the "unpartitioned tables are handled" ?
It seems make no sense to handle in the "finish" stage, I am going to put the the Optional.empty here, as the partition will mismatch at the begining.
About the projection partition stats, I was thought the partition is mapping to the exactly a path
That means that partition statistics don't exist on a per partition basis and the preferred approach is probably to handle these in the same way that unpartitioned tables are handled.
I have no idea, how to make it to act as the unpartitioned table, could you please help to suggest?
Do we need to even update logic when we collecting the column statistics?
But it confuses me: let's say we have a table with partition projection enabled and with the location template: "s3://bucket/some_path1/{pt1}/some_path2/{pt2}-xxx/some_path3", then if we want to collect the statistics, but if want this act as the unpartitioned table, what is the "table path", is it should be something like :"s3//bucket/some_path1/" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to don't change the method canonicalizePartitionValues
and just pass the Optional.empty()
when call the method parsePartitionValue
inside it.
It's somehow already broken when collecting the partition projection table at the begining, we could just leave this part as it is, or: we could reject the collecting stats at the begin part?
@@ -52,6 +53,31 @@ public PartitionProjection(Optional<String> storageLocationTemplate, Map<String, | |||
this.columnProjections = ImmutableMap.copyOf(requireNonNull(columnProjections, "columnProjections is null")); | |||
} | |||
|
|||
// TODO: support writing partition projection | |||
public boolean allowWrite() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Control flow here doesn't handle the possibility of a storageLocationTemplate
being present that doesn't match Hive's path name assumptions (this can happen with any projection type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
plugin/trino-hive/src/main/java/io/trino/plugin/hive/projection/PartitionProjection.java
Outdated
Show resolved
Hide resolved
a8cabd2
to
e60ed0f
Compare
e60ed0f
to
6c9b3ea
Compare
Description
Closes #25642
This PR assumes that the underlying partition paths already follow the user-defined format, specifically for date projection.
If the projection
storage.location.template
isn't set, the partition paths follow the default Hive format:partName=${partValue}
. In this case,partValue
is escaped for special characters (e.g., / becomes %2F), same as the existing writing logic in Hive today.If the
storage.location.template
is set, the partition value is inserted into the path as is, without escaping special characters. For example:• With a custom date projection format and a partition column
dt
formatted asyyyy/MM/dd
, this PR allows reading the partition path:dt=2015%2F10%2F10
• If the
storage.location.template
is set to something like/aaa/bbb/${dt}-xxx
, this PR enables reading the partition path:/aaa/bbb/2015/10/10-xxx
This PR not support writing
DateProjection
with user-defined format that not compatible with Hive date/timestamp format.Additional context and related issues
Alternative for #25657
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text: