GH-46971: [C++][Parquet] Use temporary buffers when decrypting Parquet data pages #46972
Conversation
Force-pushed from a1a6dd7 to 2fb6895.
Do we want to do the same for the decompression buffer? It would also be easier to benchmark, as compressed Parquet files are much more common than encrypted Parquet files.
I was assuming the decompression buffers remain referenced by the returned record batches, so doing the same for those wouldn't help, but I haven't verified that that's true. I'll test that too.
Only in the zero-copy cases, which are quite limited (fixed-width type, PLAIN encoding, no nulls, no encryption). And even then, we probably don't always do zero-copy.
Force-pushed from 2fb6895 to 7d639fd.
Right, OK, that makes sense. I'm testing with plain-encoded float columns so I hadn't noticed that, but yes, it might be a good idea to also change the decompression buffers then. I did some rough benchmarks with
The behaviour with mimalloc and jemalloc looks good, but the results with the system allocator are quite concerning. The max RSS decreases when using temporary decryption buffers, but actually increases quite significantly when also using temporary decompression buffers. I'm not sure why that would be; maybe this causes more memory fragmentation? (C++ heap memory management is not something I know a lot about...) There is also a noticeable slow-down in the temporary decryption buffer case with the system allocator. Maybe this is acceptable, given most users will be using mimalloc? I also tested with unencrypted data:
Based on these benchmarks alone, maybe only the decryption buffers should be temporary. But I've only tested with plain float data. I'll look into testing with more data types and encodings.
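For anyone wanting to reproduce this kind of allocator comparison, here is a rough sketch of reading the same file against Arrow's system, mimalloc and jemalloc memory pools. The file name is a placeholder, any decryption setup is omitted, and MemoryPool::max_memory() only reports the pool's logical high-water mark, so peak RSS still has to be measured externally (e.g. with /usr/bin/time -v):

```cpp
// Sketch: read the same Parquet file with different Arrow memory pools and
// report each pool's high-water mark. File name and reader setup are illustrative.
#include <iostream>
#include <memory>
#include <string>

#include "arrow/api.h"
#include "arrow/io/file.h"
#include "parquet/arrow/reader.h"

arrow::Status ReadWithPool(const std::string& path, arrow::MemoryPool* pool,
                           const std::string& label) {
  ARROW_ASSIGN_OR_RAISE(auto infile, arrow::io::ReadableFile::Open(path, pool));
  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(parquet::arrow::OpenFile(infile, pool, &reader));
  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->ReadTable(&table));
  // max_memory() is the pool's logical peak; RSS differences between
  // allocators have to be measured with an external tool.
  std::cout << label << ": peak pool usage = " << pool->max_memory() << " bytes\n";
  return arrow::Status::OK();
}

int main() {
  const std::string path = "test.parquet";  // placeholder file name

  // The system pool is always available; mimalloc/jemalloc depend on how
  // Arrow was built, hence the Status-returning factory functions.
  auto st = ReadWithPool(path, arrow::system_memory_pool(), "system");
  if (!st.ok()) std::cerr << st.ToString() << "\n";

  arrow::MemoryPool* mimalloc_pool = nullptr;
  if (arrow::mimalloc_memory_pool(&mimalloc_pool).ok()) {
    st = ReadWithPool(path, mimalloc_pool, "mimalloc");
    if (!st.ok()) std::cerr << st.ToString() << "\n";
  }

  arrow::MemoryPool* jemalloc_pool = nullptr;
  if (arrow::jemalloc_memory_pool(&jemalloc_pool).ok()) {
    st = ReadWithPool(path, jemalloc_pool, "jemalloc");
    if (!st.ok()) std::cerr << st.ToString() << "\n";
  }
  return 0;
}
```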
@ursabot please benchmark
Benchmark runs are scheduled for commit 7d639fd. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
By the way, how did you get or generate your test files? A 1 MiB page size sounds rather large.
I'm using ParquetSharp, but this is a wrapper of the C++ Parquet library, and 1 MiB is the default there: arrow/cpp/src/parquet/properties.h (line 157 at commit 0b34e6b).
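For reference, the page size is a writer-side setting, so it can be lowered when producing files. A minimal sketch, assuming the data_pagesize option on WriterProperties::Builder; the 64 KiB value is arbitrary:

```cpp
// Sketch: override the default data page size (kDefaultDataPageSize, 1 MiB)
// when building writer properties. 64 KiB is just an example value.
#include <memory>

#include "parquet/properties.h"

std::shared_ptr<parquet::WriterProperties> MakeWriterProperties() {
  return parquet::WriterProperties::Builder()
      .data_pagesize(64 * 1024)  // bytes per data page
      ->build();
}
```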
Hmm, I was under the impression that another factor limited the actual page size produced by Parquet C++, but I can't find again what it is. @wgtmac Could you enlighten me here? |
@pitrou Did you mean CDC? arrow/cpp/src/parquet/column_writer.cc (line 1418 at commit 3ebe7ee)
@wgtmac No, I mean other parameters. |
1024 * 1024 is also the default max row group size (in rows). Maybe for some integer or dictionary-encoded data this limit can be hit before the page size?
Ah I think you're thinking of the
Hmm, actually it looks like it shouldn't be specific to the Arrow API; I'll check what's happening there.
I had already started a discussion on the Parquet dev ML about this: https://lists.apache.org/thread/vsxmbvnx9gy5414cfo25mnwcj17h1xyp I do think we should revisit this default page size constant, even if in some cases other factors make it smaller. |
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 7d639fd. There were 50 benchmark results indicating a performance regression:
The full Conbench report has more details.
There doesn't seem to be any related regression on the benchmarks.
I looked a bit closer at the
FTR, we have a
Ok, I've opened #47030
I looked at benchmarking unencrypted int32 data with nulls and using a non-plain encoding (delta binary packed). Otherwise the data layout is the same as my previous tests. Making the decompression buffers temporary decreases the max RSS with the system allocator, which is a bit surprising to me. But there is a slight increase in RSS and time taken with mimalloc and jemalloc.
I also looked at the memory allocations with massif, and the peak heap size is exactly the same and is still dominated by the decompression buffers. Although my comment about them being referenced by the record batches isn't correct, the page buffers are still held in memory by the column reader, and batches of data are then decoded incrementally to Arrow arrays. So I don't think there's much reason to make the decompression buffers temporary, and performance is generally a bit better if only the decryption buffers are temporary. I'm going to revert this PR back to only changing the decryption buffers.
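For context on the incremental decoding mentioned above, this is roughly what a batch-wise read looks like through the Arrow reader API. The exact GetRecordBatchReader signature has varied between Arrow versions, and the processing step is left as a placeholder:

```cpp
// Sketch: read a Parquet file batch by batch. Column values are decoded into
// Arrow arrays one batch at a time, while the page buffers stay owned by the
// column readers. Error handling is kept minimal.
#include <memory>
#include <string>

#include "arrow/api.h"
#include "arrow/io/file.h"
#include "parquet/arrow/reader.h"

arrow::Status ReadInBatches(const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto infile, arrow::io::ReadableFile::Open(path));

  std::unique_ptr<parquet::arrow::FileReader> reader;
  ARROW_RETURN_NOT_OK(
      parquet::arrow::OpenFile(infile, arrow::default_memory_pool(), &reader));

  std::unique_ptr<arrow::RecordBatchReader> batch_reader;
  ARROW_RETURN_NOT_OK(reader->GetRecordBatchReader(&batch_reader));

  std::shared_ptr<arrow::RecordBatch> batch;
  while (true) {
    ARROW_RETURN_NOT_OK(batch_reader->ReadNext(&batch));
    if (batch == nullptr) break;  // end of stream
    // Process `batch` here; only one batch's worth of arrays is materialized.
  }
  return arrow::Status::OK();
}
```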
This reverts commit 7d639fd.
Force-pushed from 7d639fd to 91477fb.
I just realized that large
It definitely makes sense to me. I think that's how the Rust and Java implementations use it. (Also, shouldn't it be discussed in #47030?)
Rationale for this change
Reduces memory usage required when reading wide, encrypted Parquet files.
What changes are included in this PR?
Changes SerializedPageReader so that it doesn't hold a decryption buffer, but only allocates one as needed so that it can be freed after pages are decompressed.
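A minimal sketch of the idea, not the actual patch: the decryption buffer becomes local to the page-decoding step instead of a long-lived member of SerializedPageReader. PageDecryptor and DecryptPage below are illustrative stand-ins, not the library's internal API:

```cpp
// Illustrative sketch only: allocate the decryption scratch buffer per page
// so it can be released once the decompressed page no longer needs it.
#include <cstdint>
#include <memory>

#include "arrow/buffer.h"
#include "arrow/memory_pool.h"
#include "arrow/result.h"

// Stand-in for the library's decryptor interface (hypothetical).
struct PageDecryptor {
  virtual ~PageDecryptor() = default;
  virtual void Decrypt(const uint8_t* ciphertext, int64_t ciphertext_len,
                       uint8_t* plaintext) = 0;
};

arrow::Result<std::shared_ptr<arrow::Buffer>> DecryptPage(
    const uint8_t* ciphertext, int64_t ciphertext_len, int64_t plaintext_len,
    PageDecryptor* decryptor, arrow::MemoryPool* pool) {
  // Temporary buffer scoped to this call rather than to the page reader.
  ARROW_ASSIGN_OR_RAISE(auto decrypted, arrow::AllocateBuffer(plaintext_len, pool));
  decryptor->Decrypt(ciphertext, ciphertext_len, decrypted->mutable_data());
  // Released as soon as the decompressed page stops referencing it.
  return std::shared_ptr<arrow::Buffer>(std::move(decrypted));
}
```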
Are these changes tested?
This is only a performance improvement and doesn't change any behaviour, so it should be covered by existing tests.
The memory improvement has been verified manually (see #46971).
Are there any user-facing changes?
Are performance improvements considered user-facing?