PiotrSrebrny
Contributor

Which issue does this PR close?

Are these changes tested?

Changes were not tested and should be covered by existing tests.

Are there any user-facing changes?

The patch removes BufWriter from TrackedWrite, so ArrowWriter no longer wraps the provided Write in it. If the underlying writer performs poorly with small, repeated writes (e.g., a TCP socket), users should wrap their writer in a BufWriter themselves for better performance.
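To illustrate why wrapping matters for writers that handle small writes poorly, here is a minimal std-only sketch. `CountingSink` and the helper functions are hypothetical names made up for this example and are not part of the parquet crate; the sink simply counts how many `write` calls reach it with and without a `BufWriter` in front.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical sink (not from the parquet crate) that counts `write` calls,
/// standing in for a writer where every call is expensive, such as a socket.
struct CountingSink {
    calls: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` four-byte chunks to `w`, then flushes.
fn write_chunks<W: Write>(w: &mut W, n: usize) -> io::Result<()> {
    for _ in 0..n {
        w.write_all(b"abcd")?;
    }
    w.flush()
}

/// Returns (write calls, bytes) seen by the sink when written to directly.
fn unbuffered_calls(n: usize) -> io::Result<(usize, usize)> {
    let mut sink = CountingSink { calls: 0, bytes: 0 };
    write_chunks(&mut sink, n)?;
    Ok((sink.calls, sink.bytes))
}

/// Returns (write calls, bytes) seen by the sink behind an 8 KiB BufWriter.
fn buffered_calls(n: usize) -> io::Result<(usize, usize)> {
    let sink = CountingSink { calls: 0, bytes: 0 };
    let mut writer = BufWriter::with_capacity(8 * 1024, sink);
    write_chunks(&mut writer, n)?;
    Ok((writer.get_ref().calls, writer.get_ref().bytes))
}
```

With 1000 chunks (4000 bytes, which fits in the 8 KiB buffer), the direct path issues one `write` per chunk, while the buffered path forwards everything in a single call on flush. A caller affected by this change would pass something like `BufWriter::new(stream)` as the writer instead of the bare stream.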

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 3, 2025
@tustvold
Contributor

tustvold commented Oct 6, 2025

I think I would prefer to:

  • Fix flush
  • Add a separate option to create an unbuffered writer

See #8534 (comment)

@PiotrSrebrny
Contributor Author

I am fine with this. I can add a new_unbuffered() function to TrackedWrite.
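A rough sketch of what a buffered default plus a `new_unbuffered()` constructor could look like, using only the standard library. `TrackedWriteSketch`, its enum-based layout, and the `bytes_before_flush` helper are assumptions made for illustration. This is not the parquet crate's actual `TrackedWrite` and not necessarily how the PR implements it.

```rust
use std::io::{self, BufWriter, Write};

// Hypothetical, simplified stand-in for the parquet crate's TrackedWrite.
// The enum-based layout is an assumption made for this sketch.
enum Sink<W: Write> {
    Buffered(BufWriter<W>),
    Unbuffered(W),
}

pub struct TrackedWriteSketch<W: Write> {
    sink: Sink<W>,
    bytes_written: usize,
}

impl<W: Write> TrackedWriteSketch<W> {
    /// Default constructor: wraps the writer in a BufWriter.
    pub fn new(inner: W) -> Self {
        Self { sink: Sink::Buffered(BufWriter::new(inner)), bytes_written: 0 }
    }

    /// Alternative constructor: writes go straight to the underlying writer.
    pub fn new_unbuffered(inner: W) -> Self {
        Self { sink: Sink::Unbuffered(inner), bytes_written: 0 }
    }

    /// Bytes accepted so far, whether or not they have reached the inner writer.
    pub fn bytes_written(&self) -> usize {
        self.bytes_written
    }

    /// Shared reference to the underlying writer.
    pub fn inner(&self) -> &W {
        match &self.sink {
            Sink::Buffered(w) => w.get_ref(),
            Sink::Unbuffered(w) => w,
        }
    }
}

impl<W: Write> Write for TrackedWriteSketch<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = match &mut self.sink {
            Sink::Buffered(w) => w.write(buf)?,
            Sink::Unbuffered(w) => w.write(buf)?,
        };
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        match &mut self.sink {
            Sink::Buffered(w) => w.flush(),
            Sink::Unbuffered(w) => w.flush(),
        }
    }
}

/// Writes four bytes without flushing and returns
/// (bytes tracked, bytes already visible in the underlying Vec).
pub fn bytes_before_flush(buffered: bool) -> io::Result<(usize, usize)> {
    let mut w = if buffered {
        TrackedWriteSketch::new(Vec::<u8>::new())
    } else {
        TrackedWriteSketch::new_unbuffered(Vec::<u8>::new())
    };
    w.write_all(b"PAR1")?;
    Ok((w.bytes_written(), w.inner().len()))
}
```

In this sketch the byte counter advances identically in both modes. The only difference is when bytes become visible to the underlying writer: immediately in the unbuffered case, and only after a flush (or once the buffer fills) in the buffered case.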

@PiotrSrebrny
Contributor Author

I will have to add an unbuffered option to WriterProperties that gets propagated from ArrowWriter to SerializedFileWriter to make it work.

@PiotrSrebrny PiotrSrebrny force-pushed the remove-BufWriter-from-TrackedWrite branch 2 times, most recently from 8a8361d to da97dd1 Compare October 6, 2025 10:17
@PiotrSrebrny PiotrSrebrny changed the title from "[Parquet] Remove BufWriter from TrackedWrite" to "[Parquet] Add Unbuffered writer to TrackedWrite" Oct 6, 2025
@PiotrSrebrny PiotrSrebrny force-pushed the remove-BufWriter-from-TrackedWrite branch from da97dd1 to 6496cb3 Compare October 6, 2025 11:11
@etseidl
Contributor

etseidl commented Oct 10, 2025

Marking this as draft since it's superseded by #8586.

@etseidl etseidl marked this pull request as draft October 10, 2025 23:23
Development

Successfully merging this pull request may close these issues.

[Parquet] ArrowWriter flush does not work

3 participants