
GH-46971: [C++][Parquet] Use temporary buffers when decrypting Parquet data pages #46972


Open
adamreeve wants to merge 4 commits into main from temp_decryption_buffers

Conversation

@adamreeve (Contributor) commented Jul 2, 2025

Rationale for this change

Reduces memory usage required when reading wide, encrypted Parquet files.

What changes are included in this PR?

Changes SerializedPageReader so that it no longer holds a long-lived decryption buffer; instead, a buffer is allocated only when needed and can be freed once the page has been decompressed.
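
For illustration, here is a minimal sketch of the idea (not the actual diff in this PR): the decryption buffer is allocated per page rather than held as a member of the reader. Only the Arrow buffer APIs (`arrow::AllocateResizableBuffer`, `arrow::MemoryPool`) are real; `PageDecryptor` and its methods are hypothetical stand-ins.

```cpp
#include <cstdint>
#include <memory>
#include <utility>

#include "arrow/buffer.h"
#include "arrow/memory_pool.h"
#include "arrow/result.h"
#include "arrow/status.h"

// Hypothetical decryptor interface, standing in for the real Parquet
// decryptor; only the Arrow buffer calls below are actual Arrow APIs.
class PageDecryptor {
 public:
  virtual ~PageDecryptor() = default;
  virtual int32_t PlaintextLength(int32_t ciphertext_len) const = 0;
  virtual int32_t Decrypt(const uint8_t* ciphertext, int32_t ciphertext_len,
                          uint8_t* plaintext) = 0;
};

arrow::Result<std::shared_ptr<arrow::Buffer>> DecryptPage(
    const uint8_t* ciphertext, int32_t ciphertext_len,
    PageDecryptor* decryptor, arrow::MemoryPool* pool) {
  // Allocate a buffer scoped to this call rather than reusing a long-lived
  // member buffer on the page reader, so the memory can be returned to the
  // pool as soon as the decompressed page no longer needs it.
  ARROW_ASSIGN_OR_RAISE(
      std::unique_ptr<arrow::ResizableBuffer> plaintext,
      arrow::AllocateResizableBuffer(
          decryptor->PlaintextLength(ciphertext_len), pool));
  const int32_t actual_len = decryptor->Decrypt(
      ciphertext, ciphertext_len, plaintext->mutable_data());
  ARROW_RETURN_NOT_OK(plaintext->Resize(actual_len, /*shrink_to_fit=*/false));
  return std::shared_ptr<arrow::Buffer>(std::move(plaintext));
}
```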

Are these changes tested?

This is only a performance improvement and doesn't change any behaviour, so it should be covered by existing tests.
The memory improvement has been verified manually (see #46971).

Are there any user-facing changes?

Are performance improvements considered user-facing?

github-actions bot commented Jul 2, 2025

⚠️ GitHub issue #46971 has been automatically assigned in GitHub to PR creator.

@adamreeve adamreeve changed the title GH-46971: [C++][Parquet] Use temporary decryption buffers in Parquet SerialiedPageReader GH-46971: [C++][Parquet] Use temporary buffers when decrypting Parquet data pages Jul 2, 2025
@adamreeve adamreeve force-pushed the temp_decryption_buffers branch from a1a6dd7 to 2fb6895 Compare July 2, 2025 04:42
@pitrou (Member) commented Jul 2, 2025

Do we want to do the same for the decompression buffer? It would also be easier to benchmark, as compressed Parquet files are much more common than encrypted Parquet files.

@adamreeve (Contributor, Author)

I was assuming the decompression buffers remain referenced by the returned record batches so doing the same for those wouldn't help, but I haven't verified that that's true. I'll test that too.

@pitrou (Member) commented Jul 2, 2025

I was assuming the decompression buffers remain referenced by the returned record batches so doing the same for those wouldn't help, but I haven't verified that that's true.

Only in the zero-copy cases, which are quite limited (fixed-width type, PLAIN encoding, no nulls, no encryption). And even then, we probably don't always do zero-copy.

@adamreeve adamreeve force-pushed the temp_decryption_buffers branch from 2fb6895 to 7d639fd Compare July 7, 2025 03:59
@adamreeve (Contributor, Author)

Right, OK, that makes sense. I'm testing with plain-encoded float columns so I hadn't noticed that, but yes, it might be a good idea to also change the decompression buffers then.

I did some rough benchmarks with /usr/bin/time -v and got the following results for my test case (the scenario described in #46971, but reading all row groups), taking the best of three runs:

| | System allocator | mimalloc | jemalloc |
|---|---|---|---|
| Baseline time (s) | 6.92 | 6.89 | 6.68 |
| Time with temp decrypt buffers (s) | 8.23 | 6.00 | 6.42 |
| Time with temp decrypt and decompress buffers (s) | 6.50 | 6.08 | 6.59 |

| | System allocator | mimalloc | jemalloc |
|---|---|---|---|
| Baseline max RSS (MB) | 1,556 | 1,550 | 1,128 |
| Max RSS with temp decrypt buffers (MB) | 894 | 891 | 627 |
| Max RSS with temp decrypt and decompress buffers (MB) | 1,823 | 890 | 629 |

The behaviour with mimalloc and jemalloc looks good, but the results with the system allocator are quite concerning. The max RSS decreases when using temporary decryption buffers, but actually increases quite significantly when also using temporary decompression buffers. I'm not sure why that would be; maybe this causes more memory fragmentation? (C++ heap memory management is not something I know a lot about...) There is also a noticeable slow-down in the temporary decryption buffer case with the system allocator.

Maybe this is acceptable, given most users will be using mimalloc?

I also tested with unencrypted data:

| | System allocator | mimalloc | jemalloc |
|---|---|---|---|
| Baseline time (s) | 4.07 | 4.09 | 3.93 |
| Time with temp decompress buffers (s) | 4.08 | 4.25 | 3.98 |

| | System allocator | mimalloc | jemalloc |
|---|---|---|---|
| Baseline max RSS (MB) | 884 | 895 | 627 |
| Max RSS with temp decompress buffers (MB) | 954 | 913 | 660 |

Based on these benchmarks alone, maybe only the decryption buffers should be temporary. But I've only tested with plain float data. I'll look into testing with more data types and encodings.

@pitrou (Member) commented Jul 7, 2025

@ursabot please benchmark

@ursabot commented Jul 7, 2025

Benchmark runs are scheduled for commit 7d639fd. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@pitrou (Member) commented Jul 7, 2025

By the way, how did you get or generate your test files? A 1 MiB page size sounds rather large.

@adamreeve (Contributor, Author)

I'm using ParquetSharp but this is a wrapper of the C++ Parquet library, and 1 MiB is the default there:

static constexpr int64_t kDefaultDataPageSize = 1024 * 1024;
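
For context, the page size is configurable when writing; a usage sketch of the C++ parquet::WriterProperties builder (the 64 KiB value here is just an example, not a recommendation):

```cpp
#include <memory>

#include "parquet/properties.h"

// Example: override the 1 MiB default data page size when writing.
std::shared_ptr<parquet::WriterProperties> MakeWriterProperties() {
  parquet::WriterProperties::Builder builder;
  builder.data_pagesize(64 * 1024);  // bytes, instead of kDefaultDataPageSize
  return builder.build();
}
```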

@pitrou (Member) commented Jul 7, 2025

I'm using ParquetSharp but this is a wrapper of the C++ Parquet library, and 1 MiB is the default there:

Hmm, I was under the impression that another factor limited the actual page size produced by Parquet C++, but I can't find again what it is. @wgtmac Could you enlighten me here?

@wgtmac (Member) commented Jul 7, 2025

@pitrou Did you mean CDC?

AddDataPage();

@pitrou (Member) commented Jul 7, 2025

@wgtmac No, I mean other parameters.

@adamreeve (Contributor, Author)

1024 * 1024 is also the default max row group size. Maybe for some integer or dictionary encoded data this limit can be hit before the page size?

@adamreeve (Contributor, Author) commented Jul 7, 2025

Ah I think you're thinking of the write_batch_size parameter that's used by the Arrow API. This is a number of rows and defaults to 1024. I used the column writer based API rather than the Arrow API though.

@adamreeve (Contributor, Author)

Hmm, actually it looks like it shouldn't be specific to the Arrow API; I'll check what's happening there.

@pitrou (Member) commented Jul 7, 2025

I had already started a discussion on the Parquet dev ML about this: https://lists.apache.org/thread/vsxmbvnx9gy5414cfo25mnwcj17h1xyp

I do think we should revisit this default page size constant, even if in some cases other factors make it smaller.

@ursabot

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 7d639fd.

There were 50 benchmark results indicating a performance regression:

The full Conbench report has more details.

@pitrou (Member) commented Jul 7, 2025

There doesn't seem to be any related regression on the benchmarks.
I've also run this PR locally on a couple Parquet files I have lying around, and could not see any concerning performance drop.

@adamreeve (Contributor, Author)

I looked a bit closer at the write_batch_size parameter. It doesn't actually control how many values are written to a page, just how many are written at once before checking whether the page has reached the configured size limit and, if so, writing out the page. From that mailing list thread, it sounds like other implementations have a byte-based limit and a separate row-count limit, but it doesn't look like there's a row limit in the C++ implementation.
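
To make that concrete, here is a rough sketch of my reading of the write loop. It is illustrative only, not the real ColumnWriter code:

```cpp
#include <algorithm>
#include <cstdint>

// Rough sketch of the batching logic described above; SketchColumnWriter is
// an illustrative stand-in, not the real parquet::ColumnWriter.
class SketchColumnWriter {
 public:
  SketchColumnWriter(int64_t write_batch_size, int64_t data_page_size)
      : write_batch_size_(write_batch_size), data_page_size_(data_page_size) {}

  // Write num_values fixed-width (4-byte) values to the column chunk.
  void WriteColumn(int64_t num_values) {
    int64_t remaining = num_values;
    while (remaining > 0) {
      // Values are encoded write_batch_size at a time...
      const int64_t batch = std::min(write_batch_size_, remaining);
      current_page_bytes_ += batch * 4;  // pretend each value encodes to 4 bytes
      remaining -= batch;
      // ...and only between batches is the accumulated page size compared
      // against the byte limit, so a page can overshoot data_page_size_ by
      // up to one batch of encoded data, and there is no row-count cap.
      if (current_page_bytes_ >= data_page_size_) {
        FlushPage();
      }
    }
    if (current_page_bytes_ > 0) FlushPage();
  }

 private:
  void FlushPage() { current_page_bytes_ = 0; }

  int64_t write_batch_size_;
  int64_t data_page_size_;
  int64_t current_page_bytes_ = 0;
};
```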

@wgtmac (Member) commented Jul 8, 2025

FTR, we have max_row_group_length to control the number of rows in a single row group.
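
For example, a usage sketch of that existing property (the values are arbitrary):

```cpp
#include <memory>

#include "parquet/properties.h"

// Example: cap row groups at 100,000 rows; the data page size limit
// itself remains byte-based (1 MiB by default).
std::shared_ptr<parquet::WriterProperties> MakeProps() {
  parquet::WriterProperties::Builder builder;
  builder.max_row_group_length(100000);
  return builder.build();
}
```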

@pitrou (Member) commented Jul 8, 2025

Ok, I've opened #47030

@adamreeve (Contributor, Author)

I benchmarked unencrypted int32 data with nulls, using a non-plain encoding (delta binary packed); otherwise the data layout is the same as in my previous tests. Making the decompression buffers temporary decreases the max RSS with the system allocator, which is a bit surprising to me, but there is a slight increase in RSS and time taken with mimalloc and jemalloc.

| | System allocator | mimalloc | jemalloc |
|---|---|---|---|
| Baseline time (s) | 10.82 | 11.02 | 10.62 |
| Time with temp decompress buffers (s) | 10.93 | 11.53 | 10.96 |

| | System allocator | mimalloc | jemalloc |
|---|---|---|---|
| Baseline max RSS (MB) | 1,235 | 1,047 | 891 |
| Max RSS with temp decompress buffers (MB) | 1,065 | 1,085 | 902 |

I also looked at the memory allocations with massif: the peak heap size is exactly the same and is still dominated by the decompression buffers. Although my earlier comment about them being referenced by the record batches isn't correct, the page buffers are still held in memory by the column reader, and batches of data are then decoded incrementally into Arrow arrays.

So I don't think there's much reason to make the decompression buffers temporary, and performance is generally a bit better if only the decryption buffers are temporary. I'm going to revert this PR back to only changing the decryption buffers.

@adamreeve adamreeve force-pushed the temp_decryption_buffers branch from 7d639fd to 91477fb Compare July 9, 2025 04:57
@adamreeve adamreeve marked this pull request as ready for review July 9, 2025 05:33
@adamreeve adamreeve requested a review from wgtmac as a code owner July 9, 2025 05:33
@wgtmac (Member) commented Jul 10, 2025

Ah I think you're thinking of the write_batch_size parameter that's used by the Arrow API. This is a number of rows and defaults to 1024. I used the column writer based API rather than the Arrow API though.

I just realized that a large properties_->write_batch_size() makes it difficult to precisely split data pages based on properties_->data_pagesize(). To implement #47030, we have to adjust the batch size to satisfy the new properties_->max_rows_per_data_page(). Perhaps we need to slightly change the meaning of properties_->write_batch_size() to be the maximum number of values in a batch to write to a ColumnWriter. Does it make sense? @adamreeve @pitrou
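
To illustrate the proposed reinterpretation (hypothetical, since properties_->max_rows_per_data_page() from #47030 does not exist yet): write_batch_size would become an upper bound on each batch, clamped by how many rows the current page can still accept.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the batch-size clamping described above;
// max_rows_per_data_page is the property proposed in GH-47030 and the
// other names are illustrative, not real Parquet C++ API.
int64_t NextBatchSize(int64_t remaining_values, int64_t write_batch_size,
                      int64_t rows_in_current_page,
                      int64_t max_rows_per_data_page) {
  // write_batch_size becomes an upper bound on the batch handed to the
  // ColumnWriter, further limited by the rows the current page may still take.
  const int64_t page_room = max_rows_per_data_page - rows_in_current_page;
  return std::min({remaining_values, write_batch_size, page_room});
}
```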

@pitrou (Member) commented Jul 10, 2025

Perhaps we need to slightly change the meaning of properties_->write_batch_size() to be the maximum number of values in a batch to write to a ColumnWriter. Does it make sense? @adamreeve @pitrou

It definitely makes sense to me. I think that's how the Rust and Java implementations use it.

(also, shouldn't it be discussed in #47030 ?)
