@CTTY CTTY commented Oct 20, 2025

Which issue does this PR close?

What changes are included in this PR?

  • Add TaskWriter, which uses RecordBatchPartitionSplitter and projected partition values
  • Add UnpartitionedWriter for writing unpartitioned data

Are these changes tested?

Added unit tests

@CTTY CTTY force-pushed the ctty/task-writer branch from e875e8e to f4b72ef Compare October 23, 2025 03:05
@liurenjie1024 liurenjie1024 left a comment

I think we are on the right track. I left some comments, and we need to split this into smaller PRs.

Comment on lines 85 to 87
Fanout(FanoutWriter<B>),
/// Writer for partitioned tables with sorted data (maintains single active writer)
Clustered(ClusteredWriter<B>),
We could simplify this to:

Partitioned {
    splitter: RecordBatchSplitter,
    partitioned_writer: Arc<dyn PartitionedWriter>,
}
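
A minimal sketch of how this shape could fit together, assuming simplified stand-in types. `Row`, `PartitionKey`, `FanoutWriter`, `RecordBatchSplitter::split`, and the `PartitionedWriter` trait shown here are hypothetical illustrations, not the actual iceberg-rust definitions. Because the writer sits behind `Arc<dyn PartitionedWriter>`, the sketch assumes `write` takes `&self` and uses interior mutability.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-ins: a row is (partition value, payload).
type Row = (String, i64);
type PartitionKey = String;

/// Hypothetical trait behind `Arc<dyn PartitionedWriter>`.
trait PartitionedWriter {
    fn write(&self, key: &PartitionKey, rows: &[Row]);
}

/// Fanout-style writer: keeps one buffer per partition key.
#[derive(Default)]
struct FanoutWriter {
    buffers: Mutex<HashMap<PartitionKey, Vec<Row>>>,
}

impl PartitionedWriter for FanoutWriter {
    fn write(&self, key: &PartitionKey, rows: &[Row]) {
        let mut buffers = self.buffers.lock().unwrap();
        buffers.entry(key.clone()).or_default().extend_from_slice(rows);
    }
}

/// Hypothetical splitter: groups rows by partition value, in first-seen order.
struct RecordBatchSplitter;

impl RecordBatchSplitter {
    fn split(&self, rows: &[Row]) -> Vec<(PartitionKey, Vec<Row>)> {
        let mut groups: Vec<(PartitionKey, Vec<Row>)> = Vec::new();
        for row in rows {
            match groups.iter().position(|(k, _)| *k == row.0) {
                Some(i) => groups[i].1.push(row.clone()),
                None => groups.push((row.0.clone(), vec![row.clone()])),
            }
        }
        groups
    }
}

/// One partitioned variant replaces the separate Fanout and Clustered variants.
enum TaskWriter {
    Unpartitioned(Vec<Row>),
    Partitioned {
        splitter: RecordBatchSplitter,
        partitioned_writer: Arc<dyn PartitionedWriter>,
    },
}

impl TaskWriter {
    fn write(&mut self, rows: &[Row]) {
        match self {
            TaskWriter::Unpartitioned(buffer) => buffer.extend_from_slice(rows),
            TaskWriter::Partitioned { splitter, partitioned_writer } => {
                // Split once, then delegate each group to the chosen strategy.
                for (key, group) in splitter.split(rows) {
                    partitioned_writer.write(&key, &group);
                }
            }
        }
    }
}
```

With this layout, the choice between a fanout and a clustered strategy would be made when the trait object is constructed, so `TaskWriter::write` needs only one partitioned code path.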

liurenjie1024 pushed a commit that referenced this pull request Oct 28, 2025
…atchPartitionSplitter (#1781)

## Which issue does this PR close?

- Closes #1786
- Covers some of the changes from the previous draft: #1769

## What changes are included in this PR?
- Move PartitionValueCalculator to core/arrow so it can be reused by
RecordBatchPartitionSplitter
- Allow skipping partition value calculation in partition splitter for
projected batches
- Return <PartitionKey, RecordBatch> rather than <Struct, RecordBatch>
pairs in RecordBatchPartitionSplitter::split
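
The second and third points can be illustrated with a rough sketch, assuming simplified stand-in types. `Row`, `PartitionSource`, the free function `split`, and the hour-to-day transform are hypothetical and do not reflect the real `RecordBatchPartitionSplitter` signature. The idea shown: either compute the partition value from a source column, or skip the computation and read a value the batch already carries, and in both cases return key/rows pairs.

```rust
use std::collections::BTreeMap;

/// Hypothetical partition key: a day number.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct PartitionKey(i64);

/// Hypothetical row. `projected_day` is set when the batch already
/// carries a precomputed partition value.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    event_hour: i64,
    projected_day: Option<i64>,
    payload: &'static str,
}

/// Where the partition value comes from.
enum PartitionSource {
    /// Compute it from the source column (assumed transform: hour -> day).
    Computed,
    /// Skip the computation and trust the projected value.
    Projected,
}

/// Groups rows by partition key and returns (key, rows) pairs sorted by key.
fn split(
    rows: &[Row],
    source: &PartitionSource,
) -> Result<Vec<(PartitionKey, Vec<Row>)>, String> {
    let mut groups: BTreeMap<PartitionKey, Vec<Row>> = BTreeMap::new();
    for row in rows {
        let day = match source {
            PartitionSource::Computed => row.event_hour.div_euclid(24),
            PartitionSource::Projected => row
                .projected_day
                .ok_or_else(|| "missing projected partition value".to_string())?,
        };
        groups.entry(PartitionKey(day)).or_default().push(row.clone());
    }
    Ok(groups.into_iter().collect())
}
```

Returning a key type rather than a bare struct of values lets the caller use the key directly for writer lookup, which is presumably the motivation for the change in return type.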


## Are these changes tested?
Added unit tests
@CTTY CTTY closed this Oct 29, 2025
@CTTY CTTY force-pushed the ctty/task-writer branch from c5061e2 to d3d3127 Compare October 29, 2025 09:30
@CTTY CTTY reopened this Oct 29, 2025
@CTTY CTTY changed the title feat(io): UnpartitionedWriter + TaskWriter feat(datafusion): Add TaskWriter for DataFusion Oct 29, 2025
@CTTY CTTY marked this pull request as ready for review October 29, 2025 09:37
@CTTY CTTY requested a review from liurenjie1024 October 29, 2025 14:12
Development

Successfully merging this pull request may close these issues.

Add TaskWriter