feat: Support sorted_by for data_rewrite_files procedure
#26804
Draft: hantangwangd wants to merge 2 commits into prestodb:master from hantangwangd:support_sort_order_for_rewrite
Reviewer's Guide
Adds support for an optional sorted_by argument to the Iceberg data_rewrite_files distributed procedure. The chosen sort order is passed through the distributed procedure handle, so rewritten files are written in a validated sort order: one that satisfies the table's existing sort order. Tests are extended to cover the sort-order behavior and the new argument wiring.
Sequence diagram for the begin flow of data_rewrite_files with sorted_by
sequenceDiagram
participant U as User
participant S as SQLParserPlanner
participant P as TableDataRewriteDistributedProcedure
participant R as RewriteDataFilesProcedure
participant Ctx as IcebergProcedureContext
participant L as IcebergTableLayoutHandle
participant T as Table
participant H as IcebergDistributedProcedureHandle
U->>S: CALL iceberg.system.data_rewrite_files(schema, table, filter, sorted_by, options)
S->>P: begin(session, procedureContext, tableLayoutHandle, arguments)
P->>P: locate schemaIndex, tableNameIndex, filterIndex, sortOrderIndex
P->>R: begin(session, procedureContext, tableLayoutHandle, arguments, sortOrderIndex)
R->>Ctx: getTable()
Ctx-->>R: Table
R->>L: getTable()
L-->>R: IcebergTableHandle
R->>T: sortOrder()
T-->>R: SortOrder tableSortOrder
alt sorted_by argument present
R->>R: read arguments[sortOrderIndex]
R->>R: parseSortFields(schema, sortFieldStrings)
R->>R: specifiedSortOrder.satisfies(tableSortOrder)
alt compatible sort order
R->>R: sortOrder = specifiedSortOrder
else incompatible
R->>R: throw PrestoException(NOT_SUPPORTED)
end
else no sorted_by argument
R->>R: sortOrder = tableSortOrder or empty
end
R->>R: getSupportedSortFields(schema, sortOrder)
R->>H: new IcebergDistributedProcedureHandle(..., sortFields, tableLayoutHandle, relevantData)
R-->>P: ConnectorDistributedProcedureHandle
P-->>S: ConnectorDistributedProcedureHandle
S-->>U: Distributed procedure handle for rewrite task execution
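The alt/else branches in the diagram above amount to a small decision procedure. The following is a simplified, self-contained sketch of that logic under stated assumptions: SortKey, the string format ("col", "col DESC", "col DESC NULLS LAST"), and the method names are hypothetical stand-ins rather than the Presto or Iceberg API, and `satisfies` is reduced to an exact prefix check (Iceberg's real `SortOrder.satisfies` also reasons about transforms).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Simplified model of the sort-order resolution step in the begin flow.
// SortKey, the string format, and all method names are illustrative
// assumptions, not the Presto or Iceberg API.
final class SortOrderResolution {
    // Hypothetical stand-in for an Iceberg SortField (identity transforms only).
    record SortKey(String column, boolean ascending, boolean nullsFirst) {}

    private SortOrderResolution() {}

    // Parses entries such as "col", "col DESC", or "col DESC NULLS LAST"
    // (assumed format, modeled on the Iceberg connector's sorted_by syntax).
    static List<SortKey> parseSortFields(List<String> specs) {
        List<SortKey> keys = new ArrayList<>();
        for (String spec : specs) {
            String[] parts = spec.trim().split("\\s+");
            if (parts[0].isEmpty()) {
                throw new IllegalArgumentException("Empty sort field");
            }
            boolean ascending = true;
            int i = 1;
            if (i < parts.length && (parts[i].equalsIgnoreCase("ASC") || parts[i].equalsIgnoreCase("DESC"))) {
                ascending = parts[i].equalsIgnoreCase("ASC");
                i++;
            }
            // Default null ordering mirrors Iceberg's builder: ASC -> NULLS FIRST, DESC -> NULLS LAST.
            boolean nullsFirst = ascending;
            if (i + 1 < parts.length && parts[i].equalsIgnoreCase("NULLS")
                    && (parts[i + 1].equalsIgnoreCase("FIRST") || parts[i + 1].equalsIgnoreCase("LAST"))) {
                nullsFirst = parts[i + 1].equalsIgnoreCase("FIRST");
                i += 2;
            }
            if (i != parts.length) {
                throw new IllegalArgumentException("Unable to parse sort field: " + spec);
            }
            keys.add(new SortKey(parts[0], ascending, nullsFirst));
        }
        return keys;
    }

    // Simplified analogue of SortOrder.satisfies: data sorted by `specified`
    // is also sorted by `table` when the table's keys are a prefix of it.
    static boolean satisfies(List<SortKey> specified, List<SortKey> table) {
        if (table.isEmpty()) {
            return true; // an unsorted table imposes no constraint
        }
        return specified.size() >= table.size()
                && specified.subList(0, table.size()).equals(table);
    }

    // Mirrors the alt/else branches of the sequence diagram.
    static List<SortKey> resolve(Optional<List<String>> sortedBy, List<SortKey> tableSortOrder) {
        if (sortedBy.isEmpty()) {
            return tableSortOrder; // no sorted_by: fall back to the table's order (possibly empty)
        }
        List<SortKey> specified = parseSortFields(sortedBy.get());
        if (!satisfies(specified, tableSortOrder)) {
            // The procedure described above throws PrestoException(NOT_SUPPORTED) at this point.
            throw new UnsupportedOperationException(
                    "Specified sort order is incompatible with the table's sort order");
        }
        return specified;
    }

    public static void main(String[] args) {
        List<SortKey> tableOrder = List.of(new SortKey("order_date", true, true));
        // Compatible: the table's order is a prefix of the requested one.
        System.out.println(resolve(Optional.of(List.of("order_date", "customer_id DESC")), tableOrder));
    }
}
```

The prefix rule is why a request such as `order_date, customer_id DESC` is accepted for a table sorted by `order_date`, while a request that starts with a different column is rejected: files sorted the requested way would no longer be ordered by the table's declared sort key.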
Class diagram for the updated data_rewrite_files distributed procedure and handle
classDiagram
class TableDataRewriteDistributedProcedure {
<<class>>
+static String SCHEMA
+static String TABLE_NAME
+static String FILTER
+static String SORT_ORDER
-BeginCallDistributedProcedure beginCallDistributedProcedure
-FinishCallDistributedProcedure finishCallDistributedProcedure
-int schemaIndex
-int tableNameIndex
-OptionalInt filterIndex
-OptionalInt sortOrderIndex
+TableDataRewriteDistributedProcedure(String schema, String name, List~Argument~ arguments, BeginCallDistributedProcedure beginCallDistributedProcedure, FinishCallDistributedProcedure finishCallDistributedProcedure)
+ConnectorDistributedProcedureHandle begin(ConnectorSession session, ConnectorProcedureContext procedureContext, ConnectorTableLayoutHandle tableLayoutHandle, Object[] arguments)
+String getSchema(Object[] parameters)
+String getTableName(Object[] parameters)
+String getFilter(Object[] parameters)
+OptionalInt getSortOrderIndex()
}
class BeginCallDistributedProcedure {
<<interface>>
+ConnectorDistributedProcedureHandle begin(ConnectorSession session, ConnectorProcedureContext procedureContext, ConnectorTableLayoutHandle tableLayoutHandle, Object[] arguments, OptionalInt sortOrderIndex)
}
class FinishCallDistributedProcedure {
<<interface>>
+void finish(ConnectorSession session, ConnectorProcedureContext procedureContext, ConnectorTableHandle tableHandle, Collection~ShardInfo~ fragments)
}
class RewriteDataFilesProcedure {
<<class>>
+DistributedProcedure get()
-ConnectorDistributedProcedureHandle beginCallDistributedProcedure(ConnectorSession session, IcebergProcedureContext procedureContext, IcebergTableLayoutHandle layoutHandle, Object[] arguments, OptionalInt sortOrderIndex)
}
class IcebergDistributedProcedureHandle {
<<class>>
-IcebergTableLayoutHandle tableLayoutHandle
-Map~String, String~ relevantData
+IcebergDistributedProcedureHandle(String schemaName, String tableName, String tableLocation, List~String~ dataColumns, List~String~ partitionColumns, String fileFormat, HiveCompressionCodec compressionCodec, Map~String, String~ storageProperties, List~SortField~ sortOrder, IcebergTableLayoutHandle tableLayoutHandle, Map~String, String~ relevantData)
}
class SortOrder {
<<class>>
+boolean satisfies(SortOrder other)
}
class SortField {
<<class>>
}
class IcebergProcedureContext {
<<class>>
+Table getTable()
}
class IcebergTableLayoutHandle {
<<class>>
+IcebergTableHandle getTable()
}
class IcebergTableHandle {
<<class>>
+String getSchemaName()
+String getIcebergTableName()
+String getTableLocation()
}
class Table {
<<class>>
+SortOrder sortOrder()
+Schema schema()
+Map~String, String~ properties()
}
TableDataRewriteDistributedProcedure ..> BeginCallDistributedProcedure : uses
TableDataRewriteDistributedProcedure ..> FinishCallDistributedProcedure : uses
BeginCallDistributedProcedure <|.. RewriteDataFilesProcedure : implements
RewriteDataFilesProcedure ..> IcebergProcedureContext : uses
RewriteDataFilesProcedure ..> IcebergTableLayoutHandle : uses
RewriteDataFilesProcedure ..> IcebergDistributedProcedureHandle : creates
RewriteDataFilesProcedure ..> SortOrder : uses
RewriteDataFilesProcedure ..> SortField : uses
RewriteDataFilesProcedure ..> Table : uses
IcebergProcedureContext ..> Table : returns
IcebergTableLayoutHandle ..> IcebergTableHandle : returns
Table ..> SortOrder : returns
IcebergDistributedProcedureHandle ..> IcebergTableLayoutHandle : has
IcebergDistributedProcedureHandle ..> SortField : has
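The class diagram shows TableDataRewriteDistributedProcedure holding fixed positions for its arguments (schemaIndex, tableNameIndex) plus optional ones (filterIndex, sortOrderIndex), and handing sortOrderIndex to the begin callback. The sketch below illustrates that wiring in isolation. It is hypothetical: Argument, BeginCall, and the argument-name strings are assumptions standing in for the real SPI types and the SCHEMA/TABLE_NAME/FILTER/SORT_ORDER constants, whose actual values are not shown here.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch: resolve argument positions once by name, then forward the
// optional sorted_by position to a connector callback. Argument, BeginCall, and the
// argument names are illustrative assumptions, not the Presto SPI.
final class RewriteArgumentIndexes {
    record Argument(String name) {}

    // Simplified stand-in for BeginCallDistributedProcedure: the callback receives the
    // raw argument array plus the index of sorted_by when that argument is declared.
    @FunctionalInterface
    interface BeginCall {
        String begin(Object[] arguments, OptionalInt sortOrderIndex);
    }

    // Assumed argument names; the real constant values may differ.
    static final String SCHEMA = "schema";
    static final String TABLE_NAME = "table_name";
    static final String FILTER = "filter";
    static final String SORT_ORDER = "sorted_by";

    private final int schemaIndex;
    private final int tableNameIndex;
    private final OptionalInt filterIndex;
    private final OptionalInt sortOrderIndex;
    private final BeginCall beginCall;

    RewriteArgumentIndexes(List<Argument> arguments, BeginCall beginCall) {
        // schema and table name are mandatory; filter and sorted_by are optional.
        this.schemaIndex = indexOf(arguments, SCHEMA)
                .orElseThrow(() -> new IllegalArgumentException("missing argument: " + SCHEMA));
        this.tableNameIndex = indexOf(arguments, TABLE_NAME)
                .orElseThrow(() -> new IllegalArgumentException("missing argument: " + TABLE_NAME));
        this.filterIndex = indexOf(arguments, FILTER);
        this.sortOrderIndex = indexOf(arguments, SORT_ORDER);
        this.beginCall = beginCall;
    }

    private static OptionalInt indexOf(List<Argument> arguments, String name) {
        for (int i = 0; i < arguments.size(); i++) {
            if (arguments.get(i).name().equalsIgnoreCase(name)) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    String begin(Object[] arguments) {
        return beginCall.begin(arguments, sortOrderIndex);
    }

    String getSchema(Object[] parameters) {
        return (String) parameters[schemaIndex];
    }

    String getTableName(Object[] parameters) {
        return (String) parameters[tableNameIndex];
    }

    // In this sketch an undeclared filter argument is reported as null.
    String getFilter(Object[] parameters) {
        return filterIndex.isPresent() ? (String) parameters[filterIndex.getAsInt()] : null;
    }

    OptionalInt getSortOrderIndex() {
        return sortOrderIndex;
    }

    public static void main(String[] args) {
        List<Argument> declared = List.of(
                new Argument(SCHEMA), new Argument(TABLE_NAME), new Argument(FILTER), new Argument(SORT_ORDER));
        RewriteArgumentIndexes procedure = new RewriteArgumentIndexes(declared, (arguments, sortIndex) ->
                sortIndex.isPresent() ? "sorted_by=" + arguments[sortIndex.getAsInt()] : "sorted_by=<table default>");
        Object[] call = {"tpch", "orders", "orderdate > DATE '1995-01-01'", List.of("orderdate", "orderkey DESC")};
        System.out.println(procedure.begin(call));
    }
}
```

Passing the index rather than the parsed value keeps the generic procedure free of connector-specific sort semantics; in this sketch, interpreting the sorted_by entries is left entirely to the callback.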
The branch was force-pushed from cf6071b to abbc90a, and later from abbc90a to 4ff8a33.
Description
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes