@alamb
Contributor

@alamb alamb commented Jan 9, 2026

Which issue does this PR close?

Rationale for this change

I noticed on #9061 that there is non-trivial overhead when creating struct arrays. I am trying to improve make_array in parallel, but @tustvold had an even better idea in #9058 (comment):

My 2 cents is it would be better to move the codepaths relying on ArrayData over to using the typed arrays directly, this should not only cut down on allocations but unnecessary validation and dispatch overheads.

What changes are included in this PR?

Update the parquet StructArray reader (used for the top level RecordBatch) to directly construct StructArray rather than using ArrayData

Are these changes tested?

By existing CI

Benchmarks show a small, repeatable improvement of a few percent. For example (this branch on the left, main on the right):

arrow_reader_clickbench/async/Q10    1.00     12.7±0.35ms        ? ?/sec    1.02     12.9±0.44ms        ? ?/sec

I am pretty sure this is because the ClickBench dataset has more than 100 columns. Creating such a struct array requires cloning 100 ArrayData (one for each child), each of which has a Vec. So this saves (at least) 100 allocations per batch.

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jan 9, 2026
@alamb alamb force-pushed the alamb/less_parquet_allocations branch from cdceb7e to 75ea40b Compare January 9, 2026 15:02
@alamb
Contributor Author

alamb commented Jan 9, 2026

run benchmark arrow_reader arrow_reader_clickbench

@alamb alamb marked this pull request as ready for review January 9, 2026 15:03
@alamb
Contributor Author

alamb commented Jan 9, 2026

I suspect this is the biggest offender in terms of overhead. The same thing can be done to the other readers, which I think will also reduce some overhead (a single allocation)

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_parquet_allocations (75ea40b) to 96637fc diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_parquet_allocations
Results will be posted here when complete

@alamb alamb changed the title from "[Parquet] perf: Create StructArrays directly rather than use ArrayData" to "[Parquet] perf: Create StructArrays directly rather than via ArrayData" Jan 9, 2026
@alamb-ghbot

🤖: Benchmark completed

Details

group                                alamb_less_parquet_allocations         main
-----                                ------------------------------         ----
arrow_reader_clickbench/async/Q1     1.00      2.3±0.07ms        ? ?/sec    1.02      2.3±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     12.8±0.42ms        ? ?/sec    1.05     13.3±0.76ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     14.6±0.55ms        ? ?/sec    1.02     14.9±0.57ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     25.9±0.58ms        ? ?/sec    1.02     26.4±0.70ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.02     31.5±0.79ms        ? ?/sec    1.00     30.9±0.52ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.01     28.8±0.65ms        ? ?/sec    1.00     28.6±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.2±0.08ms        ? ?/sec    1.03      5.4±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.21    149.2±1.53ms        ? ?/sec    1.00    123.3±0.69ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.08    170.0±1.53ms        ? ?/sec    1.00    157.7±2.81ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00   308.4±36.14ms        ? ?/sec    1.01   312.9±12.38ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.01    410.2±4.15ms        ? ?/sec    1.00    408.1±3.52ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     34.6±0.94ms        ? ?/sec    1.02     35.1±0.71ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.01    101.2±0.65ms        ? ?/sec    1.00    100.5±0.75ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.01     99.2±1.48ms        ? ?/sec    1.00     98.5±0.63ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     31.1±0.99ms        ? ?/sec    1.00     31.0±0.63ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    109.9±0.99ms        ? ?/sec    1.00    110.4±4.08ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     86.0±0.74ms        ? ?/sec    1.00     85.8±0.76ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     33.3±0.64ms        ? ?/sec    1.00     33.2±0.53ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     46.1±0.46ms        ? ?/sec    1.01     46.4±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     27.3±0.69ms        ? ?/sec    1.02     27.8±0.73ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     22.5±0.52ms        ? ?/sec    1.00     22.6±0.41ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     11.0±0.20ms        ? ?/sec    1.01     11.1±0.16ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.01ms        ? ?/sec    1.04      2.1±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     10.0±0.18ms        ? ?/sec    1.00     10.0±0.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.4±0.25ms        ? ?/sec    1.01     11.6±0.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.07     36.9±2.00ms        ? ?/sec    1.00     34.4±0.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     48.0±1.21ms        ? ?/sec    1.02     48.7±1.27ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     44.2±1.11ms        ? ?/sec    1.02     45.0±1.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.03ms        ? ?/sec    1.02      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    178.7±1.50ms        ? ?/sec    1.00    178.4±1.57ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.01    237.9±2.33ms        ? ?/sec    1.00    236.5±2.75ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    478.2±4.24ms        ? ?/sec    1.01    484.9±4.94ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.02   444.7±18.71ms        ? ?/sec    1.00   437.1±13.75ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     46.4±0.90ms        ? ?/sec    1.01     46.9±0.79ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    154.8±1.48ms        ? ?/sec    1.00    155.4±1.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    149.8±1.36ms        ? ?/sec    1.01    151.7±2.33ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     31.4±0.73ms        ? ?/sec    1.01     31.6±1.18ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    154.4±2.27ms        ? ?/sec    1.02    156.8±1.59ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     88.7±0.96ms        ? ?/sec    1.02     90.6±1.31ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     29.4±0.64ms        ? ?/sec    1.01     29.7±0.81ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     34.1±0.59ms        ? ?/sec    1.01     34.5±0.61ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     26.1±0.99ms        ? ?/sec    1.02     26.6±0.60ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     28.7±0.38ms        ? ?/sec    1.04     29.8±0.90ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.4±0.09ms        ? ?/sec    1.04     12.8±0.21ms        ? ?/sec
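
For reading the tables above: the leading number in each column appears to be the timing divided by the fastest timing in that row, so the faster side shows 1.00. A minimal sketch of that calculation, assuming this interpretation of the bot output (the helper name `relative_to_fastest` is hypothetical and not part of any benchmark tool):

```rust
/// Ratio of each timing to the fastest timing in the row (the fastest gets 1.00).
/// Hypothetical helper for reading the comparison table; not a real tool API.
fn relative_to_fastest(times_ms: &[f64]) -> Vec<f64> {
    // Find the smallest timing in the row.
    let fastest = times_ms.iter().copied().fold(f64::INFINITY, f64::min);
    // Express every timing as a multiple of the fastest one.
    times_ms.iter().map(|t| t / fastest).collect()
}
```

Applied to the async/Q20 row (149.2 ms on the branch, 123.3 ms on main), this gives roughly 1.21 and 1.00, matching the row.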

@alamb-ghbot

🤖: Benchmark completed

Details

group                                alamb_less_parquet_allocations         main
-----                                ------------------------------         ----
arrow_reader_clickbench/async/Q1     1.00      2.3±0.05ms        ? ?/sec    1.01      2.3±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     12.7±0.35ms        ? ?/sec    1.02     12.9±0.44ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     14.3±0.33ms        ? ?/sec    1.04     14.8±0.66ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     25.5±0.53ms        ? ?/sec    1.02     26.1±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     31.0±0.60ms        ? ?/sec    1.00     31.0±0.70ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     28.2±0.68ms        ? ?/sec    1.01     28.3±0.68ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.2±0.13ms        ? ?/sec    1.03      5.3±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.22    149.2±1.37ms        ? ?/sec    1.00    122.6±1.57ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.08    169.9±2.38ms        ? ?/sec    1.00    156.6±2.79ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    241.6±2.10ms        ? ?/sec    1.29   311.7±10.07ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00    408.7±2.83ms        ? ?/sec    1.00    406.7±4.25ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     34.5±0.51ms        ? ?/sec    1.00     34.7±0.86ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    100.4±1.00ms        ? ?/sec    1.00    100.4±1.43ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.01     99.0±0.91ms        ? ?/sec    1.00     98.4±0.66ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     30.7±0.58ms        ? ?/sec    1.01     30.9±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    108.8±0.91ms        ? ?/sec    1.00    108.9±1.48ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     84.9±0.61ms        ? ?/sec    1.01     85.6±1.61ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     32.7±0.50ms        ? ?/sec    1.01     33.0±0.48ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     46.4±0.60ms        ? ?/sec    1.00     46.3±0.71ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     27.5±0.60ms        ? ?/sec    1.00     27.5±0.65ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     21.8±0.51ms        ? ?/sec    1.05     23.0±0.64ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     10.8±0.14ms        ? ?/sec    1.04     11.2±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.01ms        ? ?/sec    1.03      2.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00      9.9±0.16ms        ? ?/sec    1.00      9.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.4±0.17ms        ? ?/sec    1.00     11.4±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.06     36.6±1.80ms        ? ?/sec    1.00     34.4±1.12ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     47.9±0.97ms        ? ?/sec    1.00     47.8±1.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     36.4±0.71ms        ? ?/sec    1.23     44.6±1.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.03ms        ? ?/sec    1.03      4.3±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    177.9±1.11ms        ? ?/sec    1.00    177.8±1.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.01    237.8±2.19ms        ? ?/sec    1.00    235.0±3.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    480.8±4.14ms        ? ?/sec    1.00    481.5±4.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.02   442.3±17.97ms        ? ?/sec    1.00   433.4±12.76ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     45.6±1.34ms        ? ?/sec    1.00     45.8±0.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    154.3±1.96ms        ? ?/sec    1.01    155.2±2.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    148.8±1.83ms        ? ?/sec    1.01    149.8±2.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     30.9±0.55ms        ? ?/sec    1.02     31.5±0.97ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    153.3±1.69ms        ? ?/sec    1.01    154.1±1.67ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     88.4±0.86ms        ? ?/sec    1.01     89.3±1.68ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     29.1±0.49ms        ? ?/sec    1.02     29.6±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     34.0±0.57ms        ? ?/sec    1.00     33.9±0.46ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     25.9±0.37ms        ? ?/sec    1.00     26.0±0.28ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.02     29.3±0.60ms        ? ?/sec    1.00     28.7±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.3±0.12ms        ? ?/sec    1.03     12.7±0.07ms        ? ?/sec

@alamb
Contributor Author

alamb commented Jan 9, 2026

My analysis of the results: not a huge win, but some improvement.

@alamb
Contributor Author

alamb commented Jan 9, 2026

The reason I think this will actually help ClickBench is that the ClickBench dataset has 105 columns.

I could probably make a benchmark that shows this helping for reading really wide tables

Read batch with 8192 rows and 105 columns

}

array_data_builder = array_data_builder.null_bit_buffer(Some(bitmap_builder.into()));
nulls = Some(NullBuffer::new(bitmap_builder.finish()));
Contributor Author

NullBuffer::new counts the set bits, but so do the existing code paths
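
To make the cost mentioned above concrete, here is a minimal std-only sketch of deriving a null count from a packed validity bitmap by counting set bits. It assumes the Arrow convention that a set bit means "valid" and that bits are numbered from the least significant bit of each byte; the function names are hypothetical and this is not the arrow-rs implementation.

```rust
/// Counts the set bits among the first `len` bits of an LSB-first packed bitmap.
/// Hypothetical helper, not an arrow-rs API.
fn count_set_bits(bitmap: &[u8], len: usize) -> usize {
    assert!(len <= bitmap.len() * 8, "bitmap is too short for len");
    let full_bytes = len / 8;
    // Whole bytes can be counted with the hardware popcount.
    let mut set: usize = bitmap[..full_bytes]
        .iter()
        .map(|b| b.count_ones() as usize)
        .sum();
    // The trailing partial byte is masked so padding bits are ignored.
    let rem = len % 8;
    if rem > 0 {
        let mask = (1u8 << rem) - 1;
        set += (bitmap[full_bytes] & mask).count_ones() as usize;
    }
    set
}

/// Null count = total slots minus valid (set) bits.
fn null_count(validity: &[u8], len: usize) -> usize {
    len - count_set_bits(validity, len)
}
```

The work is a single linear pass over the bitmap, which is why it costs the same whichever code path performs it.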

.child_data(
children_array
.into_iter()
.map(|x| x.into_data())
Contributor Author

@alamb alamb Jan 9, 2026

converting the child arrays into ArrayData is wasteful for at least 2 reasons:

  1. They are just converted back to ArrayRefs below
  2. Each child ArrayData has at least one new allocation (the Vec of buffers)
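
The two reasons above can be illustrated with a std-only model. This is a sketch under stated assumptions: `TypedChild` and `UntypedData` are hypothetical stand-ins for a typed array and for ArrayData, not the arrow-rs types, and the only property modeled is that the untyped form keeps its buffers in a `Vec`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a typed child array whose values live in a shared buffer.
#[derive(Clone)]
struct TypedChild {
    values: Arc<Vec<u8>>,
}

/// Hypothetical stand-in for the untyped form, which keeps its buffers in a `Vec`.
struct UntypedData {
    buffers: Vec<Arc<Vec<u8>>>,
}

impl TypedChild {
    /// Typed -> untyped: the bytes stay shared, but the buffer list is a new `Vec`.
    fn to_data(&self) -> UntypedData {
        UntypedData { buffers: vec![Arc::clone(&self.values)] }
    }

    /// Untyped -> typed: the conversion back that the round trip requires.
    fn from_data(data: UntypedData) -> TypedChild {
        TypedChild { values: Arc::clone(&data.buffers[0]) }
    }
}

/// Round trip through the untyped form. Returns the rebuilt children and the
/// number of buffer-list `Vec`s that were allocated along the way.
fn children_via_data(children: &[TypedChild]) -> (Vec<TypedChild>, usize) {
    let mut vec_allocations = 0;
    let rebuilt: Vec<TypedChild> = children
        .iter()
        .map(|child| {
            let data = child.to_data();
            // A Vec with non-zero capacity of non-zero-sized elements owns heap memory.
            if data.buffers.capacity() > 0 {
                vec_allocations += 1;
            }
            TypedChild::from_data(data)
        })
        .collect();
    (rebuilt, vec_allocations)
}

/// Direct path: only reference counts are bumped, no per-child `Vec` is created.
fn children_direct(children: &[TypedChild]) -> Vec<TypedChild> {
    children.to_vec()
}
```

In this model both paths end up with children that share the same underlying buffers, but the round trip performs one extra `Vec` allocation per child, which for a 105-column batch is 105 allocations.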

@apache apache deleted a comment from alamb-ghbot Jan 10, 2026
@apache apache deleted a comment from alamb-ghbot Jan 10, 2026
@apache apache deleted a comment from alamb-ghbot Jan 10, 2026
@apache apache deleted a comment from alamb-ghbot Jan 10, 2026
@apache apache deleted a comment from alamb-ghbot Jan 10, 2026
@alamb alamb changed the title from "[Parquet] perf: Create StructArrays directly rather than via ArrayData" to "[Parquet] perf: Create StructArrays directly rather than via ArrayData (1% improvement)" Jan 10, 2026
@Dandandan
Contributor

arrow_reader_clickbench/async/Q20 1.21 149.2±1.53ms ? ?/sec 1.00 123.3±0.69ms ? ?/sec

This "regression" comes up twice in a row?

Contributor

@scovich scovich left a comment

LGTM. I'm struggling to see how this change could have regressed performance on any benchmark?

@alamb
Contributor Author

alamb commented Jan 11, 2026

arrow_reader_clickbench/async/Q20 1.21 149.2±1.53ms ? ?/sec 1.00 123.3±0.69ms ? ?/sec

This "regression" comes up twice in a row?

I was able to reproduce a smaller regression locally (about 0.5% for sync and 2% for async) -- I'll see what I can find

cargo bench --features="arrow async" --bench arrow_reader_clickbench -- Q20

Here it is with git merge-base HEAD apache/main (aka where the branch diverged from main):

  • sync: 89.911 ms / async: 59.136 ms

With this PR:

  • sync: 90.376 ms / async: 60.336 ms
Details

arrow_reader_clickbench/sync/Q20
                        time:   [89.676 ms 89.911 ms 90.206 ms]
                        change: [−0.9129% −0.4274% +0.0319%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) high mild
  6 (6.00%) high severe

Benchmarking arrow_reader_clickbench/async/Q20: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, or reduce sample count to 80.
arrow_reader_clickbench/async/Q20
                        time:   [59.034 ms 59.136 ms 59.246 ms]
                        change: [−0.3050% −0.0199% +0.2437%] (p = 0.89 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

Here it is on this branch

arrow_reader_clickbench/sync/Q20
                        time:   [90.004 ms 90.376 ms 90.785 ms]
                        change: [+0.0164% +0.5169% +1.0574%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

Benchmarking arrow_reader_clickbench/async/Q20: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.0s, or reduce sample count to 80.
arrow_reader_clickbench/async/Q20
                        time:   [60.147 ms 60.336 ms 60.542 ms]
                        change: [+1.6210% +2.0286% +2.4391%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
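
As a sanity check on the local numbers quoted in this comment, the relative change between the merge base and this branch can be computed from the reported point estimates. A minimal sketch (the helper name `percent_change` is hypothetical; Criterion derives its own `change` figures from the underlying samples, so small differences in the last digits are expected):

```rust
/// Relative change of `new_ms` versus `baseline_ms`, in percent.
/// A positive result means the new measurement is slower.
fn percent_change(baseline_ms: f64, new_ms: f64) -> f64 {
    (new_ms - baseline_ms) / baseline_ms * 100.0
}
```

With the values above, `percent_change(89.911, 90.376)` is about 0.52 for sync/Q20 and `percent_change(59.136, 60.336)` is about 2.03 for async/Q20, in line with the `change` lines in the Criterion output.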

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jan 11, 2026
@alamb
Contributor Author

alamb commented Jan 11, 2026

It seems like BooleanBufferBuilder.finish() has some small overhead (it makes a new MutableBuffer) that the previous code did not incur, so I updated the code to avoid that in d8426fa.

I'll rerun the benchmarks and hopefully we'll see an improvement

@alamb
Contributor Author

alamb commented Jan 11, 2026

run benchmark arrow_reader_clickbench

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_parquet_allocations (6c3f6e8) to 601be25 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_parquet_allocations
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                alamb_less_parquet_allocations         main
-----                                ------------------------------         ----
arrow_reader_clickbench/async/Q1     1.00      2.3±0.04ms        ? ?/sec    1.02      2.3±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     13.2±0.25ms        ? ?/sec    1.02     13.5±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     14.9±0.36ms        ? ?/sec    1.02     15.3±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     25.3±0.38ms        ? ?/sec    1.04     26.2±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     30.6±0.47ms        ? ?/sec    1.03     31.4±0.53ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     27.7±0.30ms        ? ?/sec    1.05     29.0±0.52ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.3±0.06ms        ? ?/sec    1.01      5.3±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    113.6±0.52ms        ? ?/sec    1.00    113.9±0.93ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.16    153.1±2.72ms        ? ?/sec    1.00    132.1±0.84ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.08   301.0±11.00ms        ? ?/sec    1.00    279.1±6.36ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00    402.8±2.59ms        ? ?/sec    1.00    404.3±4.62ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     33.7±0.35ms        ? ?/sec    1.03     34.8±0.63ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00     98.3±0.82ms        ? ?/sec    1.00     98.8±0.72ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00     97.1±0.36ms        ? ?/sec    1.01     97.8±0.68ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     30.1±0.54ms        ? ?/sec    1.04     31.3±0.74ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    106.7±0.81ms        ? ?/sec    1.01    107.6±1.00ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     83.5±0.51ms        ? ?/sec    1.01     84.4±1.16ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     32.4±0.51ms        ? ?/sec    1.02     32.9±0.45ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     45.6±0.74ms        ? ?/sec    1.01     46.0±0.28ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     26.8±0.34ms        ? ?/sec    1.02     27.2±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     21.6±0.22ms        ? ?/sec    1.05     22.7±0.56ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     10.8±0.13ms        ? ?/sec    1.04     11.2±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.03ms        ? ?/sec    1.03      2.1±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     10.1±0.05ms        ? ?/sec    1.02     10.2±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.7±0.12ms        ? ?/sec    1.01     11.8±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     35.6±1.80ms        ? ?/sec    1.07     38.1±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     46.6±1.02ms        ? ?/sec    1.03     47.7±0.72ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.15     41.5±0.86ms        ? ?/sec    1.00     36.1±0.32ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.06ms        ? ?/sec    1.02      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    174.9±1.25ms        ? ?/sec    1.00    174.3±0.71ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    232.8±1.24ms        ? ?/sec    1.00    232.6±3.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    470.7±3.24ms        ? ?/sec    1.00    469.4±3.54ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00   436.4±17.10ms        ? ?/sec    1.01   439.7±14.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     43.6±1.03ms        ? ?/sec    1.04     45.2±0.51ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    150.3±1.29ms        ? ?/sec    1.01    151.2±1.23ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    146.7±1.47ms        ? ?/sec    1.00    146.3±0.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     30.6±0.45ms        ? ?/sec    1.03     31.4±0.75ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    150.7±1.07ms        ? ?/sec    1.00    151.3±1.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     87.2±0.77ms        ? ?/sec    1.01     88.2±0.93ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     28.7±0.76ms        ? ?/sec    1.02     29.2±0.62ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     33.1±0.32ms        ? ?/sec    1.02     33.7±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     25.9±0.29ms        ? ?/sec    1.03     26.6±0.21ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     28.7±0.46ms        ? ?/sec    1.01     29.1±0.68ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.4±0.23ms        ? ?/sec    1.02     12.7±0.17ms        ? ?/sec

@Dandandan
Contributor

run benchmark arrow_reader_clickbench

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_parquet_allocations (caf0f2c) to 601be25 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_parquet_allocations
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                alamb_less_parquet_allocations         main
-----                                ------------------------------         ----
arrow_reader_clickbench/async/Q1     1.00      2.3±0.03ms        ? ?/sec    1.04      2.4±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     13.0±0.32ms        ? ?/sec    1.03     13.4±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     14.8±0.29ms        ? ?/sec    1.02     15.1±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     25.1±0.24ms        ? ?/sec    1.04     26.0±0.75ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     30.4±0.53ms        ? ?/sec    1.03     31.2±0.38ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     27.5±0.33ms        ? ?/sec    1.04     28.6±0.36ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.2±0.06ms        ? ?/sec    1.03      5.3±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.00    113.5±1.19ms        ? ?/sec    1.01    114.1±0.97ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.15    152.5±2.36ms        ? ?/sec    1.00    132.2±0.59ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.08    302.5±9.13ms        ? ?/sec    1.00    279.1±9.12ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.00    402.9±3.28ms        ? ?/sec    1.01    405.9±1.97ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.00     33.7±0.48ms        ? ?/sec    1.02     34.4±0.37ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00     98.6±1.03ms        ? ?/sec    1.01     99.8±0.90ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00     96.9±0.51ms        ? ?/sec    1.01     98.2±1.22ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     30.5±0.32ms        ? ?/sec    1.02     31.0±0.38ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    106.9±1.12ms        ? ?/sec    1.01    107.8±0.57ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00     83.5±0.68ms        ? ?/sec    1.01     84.4±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     32.6±0.57ms        ? ?/sec    1.01     32.8±0.38ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.00     45.6±0.41ms        ? ?/sec    1.01     45.8±0.78ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.00     27.2±0.99ms        ? ?/sec    1.01     27.4±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     21.6±0.24ms        ? ?/sec    1.03     22.3±0.36ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     10.8±0.11ms        ? ?/sec    1.03     11.1±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.04ms        ? ?/sec    1.03      2.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     10.0±0.07ms        ? ?/sec    1.02     10.2±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.6±0.13ms        ? ?/sec    1.02     11.8±0.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     35.5±1.97ms        ? ?/sec    1.07     37.9±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     46.5±0.77ms        ? ?/sec    1.03     47.6±1.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.14     41.2±0.44ms        ? ?/sec    1.00     36.2±0.82ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.02ms        ? ?/sec    1.03      4.3±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    174.7±0.82ms        ? ?/sec    1.00    175.1±1.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    233.0±2.32ms        ? ?/sec    1.00    232.9±2.87ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    471.1±3.94ms        ? ?/sec    1.00    470.4±3.70ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00   435.9±17.34ms        ? ?/sec    1.01   439.4±14.71ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     42.5±0.57ms        ? ?/sec    1.07     45.3±1.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    149.7±0.94ms        ? ?/sec    1.01    150.9±1.20ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    146.0±1.35ms        ? ?/sec    1.00    145.8±1.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     30.3±0.30ms        ? ?/sec    1.03     31.4±0.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    150.3±1.51ms        ? ?/sec    1.00    150.8±2.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     87.0±1.09ms        ? ?/sec    1.01     87.6±0.89ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     28.6±0.68ms        ? ?/sec    1.01     28.9±0.36ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     33.0±0.40ms        ? ?/sec    1.01     33.2±0.32ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     25.8±0.39ms        ? ?/sec    1.02     26.5±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     28.5±0.50ms        ? ?/sec    1.01     28.7±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.3±0.08ms        ? ?/sec    1.02     12.6±0.07ms        ? ?/sec

@Dandandan
Contributor

Now q14/q21 show a reproducible slowdown on the VM

@alamb
Contributor Author

alamb commented Jan 12, 2026

Now q14/q21 show a reproducible slowdown on the VM

🤔 Given my experience with allocation-related performance in the boolean kernels, e.g. what led to

I think we should conclude the difference is related to the allocator state.

I will take one more look at the code paths involved and run the benchmarks again, including locally.

@alamb
Contributor Author

alamb commented Jan 12, 2026

run benchmark arrow_reader_clickbench

@alamb-ghbot

🤖 ./gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_parquet_allocations (caf0f2c) to 601be25 diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_parquet_allocations
Results will be posted here when complete

@alamb
Contributor Author

alamb commented Jan 12, 2026

Here are my measurements on my machine

cargo bench --features="arrow async" --bench  arrow_reader_clickbench -- Q21

Merge base

arrow_reader_clickbench/sync/Q21
                        time:   [119.27 ms 119.51 ms 119.77 ms]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking arrow_reader_clickbench/async/Q21: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.9s, or reduce sample count to 70.
arrow_reader_clickbench/async/Q21
                        time:   [69.156 ms 69.354 ms 69.559 ms]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

This branch

arrow_reader_clickbench/sync/Q21
                        time:   [118.80 ms 119.31 ms 119.89 ms]
                        change: [−0.6606% −0.1639% +0.3667%] (p = 0.53 > 0.05)
                        No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  6 (6.00%) high mild
  15 (15.00%) high severe

Benchmarking arrow_reader_clickbench/async/Q21: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.8s, or reduce sample count to 70.
arrow_reader_clickbench/async/Q21
                        time:   [68.356 ms 68.459 ms 68.577 ms]
                        change: [−1.6200% −1.2901% −0.9603%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) high mild
  4 (4.00%) high severe

@alamb
Contributor Author

alamb commented Jan 12, 2026

Same measurements on Q14:

Merge base

arrow_reader_clickbench/sync/Q14
                        time:   [23.212 ms 23.847 ms 24.528 ms]
Found 16 outliers among 100 measurements (16.00%)
  16 (16.00%) high severe

arrow_reader_clickbench/async/Q14
                        time:   [14.286 ms 14.316 ms 14.349 ms]
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe

This branch

arrow_reader_clickbench/sync/Q14
                        time:   [23.287 ms 23.937 ms 24.636 ms]
                        change: [−3.5557% +0.3793% +4.4540%] (p = 0.85 > 0.05)
                        No change in performance detected.
Found 21 outliers among 100 measurements (21.00%)
  2 (2.00%) high mild
  19 (19.00%) high severe

arrow_reader_clickbench/async/Q14
                        time:   [14.420 ms 14.464 ms 14.515 ms]
                        change: [+0.6410% +1.0344% +1.4332%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  5 (5.00%) high mild
  8 (8.00%) high severe

@alamb-ghbot

🤖: Benchmark completed

Details

group                                alamb_less_parquet_allocations         main
-----                                ------------------------------         ----
arrow_reader_clickbench/async/Q1     1.00      2.3±0.04ms        ? ?/sec    1.02      2.3±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     13.3±0.29ms        ? ?/sec    1.00     13.3±0.66ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.01     15.1±0.50ms        ? ?/sec    1.00     14.9±0.58ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.00     25.8±0.54ms        ? ?/sec    1.02     26.3±0.81ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.00     31.2±0.61ms        ? ?/sec    1.03     32.1±1.08ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.00     28.2±0.65ms        ? ?/sec    1.01     28.6±0.56ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.4±0.17ms        ? ?/sec    1.02      5.5±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.11    128.9±0.76ms        ? ?/sec    1.00    116.3±1.02ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.08    165.2±1.73ms        ? ?/sec    1.00    153.2±1.98ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.11   327.3±16.53ms        ? ?/sec    1.00   295.2±13.35ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.01    413.2±5.60ms        ? ?/sec    1.00    407.3±2.16ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.05     36.1±0.84ms        ? ?/sec    1.00     34.3±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.00    100.9±1.01ms        ? ?/sec    1.00    100.9±0.89ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.00     98.1±1.01ms        ? ?/sec    1.01     99.5±0.46ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.07     33.0±1.15ms        ? ?/sec    1.00     30.9±0.28ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.00    108.6±1.10ms        ? ?/sec    1.01    109.6±0.92ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.01     86.3±0.94ms        ? ?/sec    1.00     85.5±0.58ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.05     34.6±0.60ms        ? ?/sec    1.00     33.0±0.34ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.05     47.9±0.63ms        ? ?/sec    1.00     45.8±0.37ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.13     30.6±0.58ms        ? ?/sec    1.00     27.1±0.41ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.11     24.7±0.58ms        ? ?/sec    1.00     22.2±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.04     11.5±0.19ms        ? ?/sec    1.00     11.0±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.1±0.04ms        ? ?/sec    1.02      2.1±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00     10.0±0.24ms        ? ?/sec    1.00     10.0±0.20ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.4±0.13ms        ? ?/sec    1.02     11.6±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     34.9±2.36ms        ? ?/sec    1.01     35.4±2.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     47.0±0.88ms        ? ?/sec    1.05     49.2±0.90ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.15     42.2±0.62ms        ? ?/sec    1.00     36.8±0.51ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.00      4.2±0.06ms        ? ?/sec    1.03      4.3±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    172.5±1.56ms        ? ?/sec    1.04    179.0±2.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    230.3±1.27ms        ? ?/sec    1.03    237.4±2.26ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    465.2±2.60ms        ? ?/sec    1.03    481.5±3.84ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.00   434.5±17.46ms        ? ?/sec    1.02   445.2±12.47ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     44.2±0.61ms        ? ?/sec    1.06     46.7±0.69ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    149.5±1.26ms        ? ?/sec    1.05    156.3±1.74ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    144.9±0.96ms        ? ?/sec    1.04    150.1±1.47ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     31.1±0.60ms        ? ?/sec    1.02     31.7±0.74ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    150.6±1.01ms        ? ?/sec    1.03    155.1±1.41ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.00     88.2±1.27ms        ? ?/sec    1.02     89.7±1.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     29.3±0.67ms        ? ?/sec    1.01     29.7±0.56ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     33.5±0.62ms        ? ?/sec    1.02     34.2±0.74ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     26.2±0.37ms        ? ?/sec    1.01     26.6±0.44ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     29.2±0.37ms        ? ?/sec    1.00     29.3±0.49ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     12.5±0.13ms        ? ?/sec    1.01     12.6±0.11ms        ? ?/sec
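
For reading the table above: the leading number in each column appears to be that branch's time divided by the fastest time in the row, so 1.00 marks the faster branch. A minimal sketch of that arithmetic follows, assuming this interpretation; `relative_to_fastest` is a hypothetical helper and not part of the benchmark tooling.

```rust
/// Hypothetical helper: ratio of each timing to the fastest timing in a row.
/// An empty slice yields an empty vector.
pub fn relative_to_fastest(times_ms: &[f64]) -> Vec<f64> {
    // Find the smallest timing; starts at infinity so any real value wins.
    let fastest = times_ms.iter().copied().fold(f64::INFINITY, f64::min);
    times_ms.iter().map(|t| t / fastest).collect()
}
```

Applied to the async/Q20 row (128.9 ms on the branch, 116.3 ms on main), this gives roughly 1.11 and 1.00, which matches the printed columns.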

@alamb
Contributor Author

alamb commented Jan 12, 2026

I re-reviewed the code paths involved (and had Codex do the same) and we found no additional allocations or work, so I am going to call this an improvement by inspection and discount the measurements.

Thank you @scovich and @Dandandan for the reviews

@scovich
Contributor

scovich commented Jan 12, 2026

It seems like BooleanBufferBuilder.finish() has some small overhead (make a new MutableBuffer) which wasn't done previously so I updated the code to avoid doing that in d8426fa

This seems... odd. A new impl From<BooleanBufferBuilder> for NullBuffer would seem to overlap strongly with BooleanBufferBuilder::finish. Tho actually, impl From<BooleanBufferBuilder> for BooleanBuffer is the weird one. What is finish required to do that impl From (producing the same type!) does not need to do?

@scovich
Contributor

scovich commented Jan 12, 2026

Oh... BooleanBufferBuilder::finish takes &mut self and must allocate internal replacements for the consumed state, while From (as always) consumes self. Maybe it's worth documenting that pitfall on the finish method? I didn't even realize that impl From was a cheap/final alternative to finish, but it does make sense in retrospect.
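
The difference between the two signatures can be illustrated with a toy builder. This is only a sketch: `ToyBoolBuilder` is a hypothetical stand-in backed by a `Vec<bool>`, not the arrow-rs type, and its "replacement state" is an empty `Vec`, which happens to be free in this toy.

```rust
use std::mem;

/// Hypothetical stand-in for a bitmap builder; not the arrow-rs type.
#[derive(Default)]
pub struct ToyBoolBuilder {
    bits: Vec<bool>,
}

impl ToyBoolBuilder {
    pub fn append(&mut self, v: bool) {
        self.bits.push(v);
    }

    /// Only borrows the builder, so it has to leave a valid (empty)
    /// buffer behind in case the caller keeps appending afterwards.
    pub fn finish(&mut self) -> Vec<bool> {
        mem::take(&mut self.bits)
    }
}

/// Consumes the builder: the buffer is moved out and no replacement
/// state is constructed.
impl From<ToyBoolBuilder> for Vec<bool> {
    fn from(builder: ToyBoolBuilder) -> Self {
        builder.bits
    }
}
```

After `finish` the toy builder can be reused for another batch, whereas `Vec::from(builder)` ends its life. That is the trade-off described above: the `&mut self` form pays for reusability even when the caller never reuses the builder.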

@scovich
Contributor

scovich commented Jan 12, 2026

Maybe it's worth considering an alternative approach to the builder construction/finish protocol? Instead of allocating capacity on creation (and re-allocating on finish), allocate on first touch. Every append to a builder anyway has to do capacity checks, so the compiler should inline and optimize the branching so that the first-touch check is basically free in the common case where the allocation is already large enough?
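
A rough sketch of what "allocate on first touch" could look like, assuming a byte-oriented builder; `LazyByteBuilder` and its methods are hypothetical names for illustration and do not exist in arrow-rs.

```rust
/// Hypothetical builder that records a capacity hint but defers the
/// actual allocation until the first append.
pub struct LazyByteBuilder {
    capacity_hint: usize,
    buf: Option<Vec<u8>>,
}

impl LazyByteBuilder {
    /// Stores the hint only; no buffer is created here.
    pub fn with_capacity(capacity_hint: usize) -> Self {
        Self { capacity_hint, buf: None }
    }

    /// Creates the buffer on first use, then appends.
    pub fn append(&mut self, byte: u8) {
        let hint = self.capacity_hint;
        self.buf
            .get_or_insert_with(|| Vec::with_capacity(hint))
            .push(byte);
    }

    /// True once the backing buffer has been created.
    pub fn has_buffer(&self) -> bool {
        self.buf.is_some()
    }

    /// Hands out the buffer and returns to the unallocated state, so
    /// nothing is re-allocated unless the builder is used again.
    pub fn finish(&mut self) -> Vec<u8> {
        self.buf.take().unwrap_or_default()
    }
}
```

In this sketch `finish` leaves `None` behind, so a builder that is finished once and then dropped never pays for a second buffer.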

But looking at the code, MutableBuffer::new(0) calls MutableBuffer::with_capacity which doesn't actually allocate any memory? Is the mutex allocation somehow expensive?

    pub fn with_capacity(capacity: usize) -> Self {
        let capacity = bit_util::round_upto_multiple_of_64(capacity);
        let layout = Layout::from_size_align(capacity, ALIGNMENT)
            .expect("failed to create layout for MutableBuffer");
        let data = match layout.size() {
            0 => dangling_ptr(),
            _ => {
                // Safety: Verified size != 0
                let raw_ptr = unsafe { std::alloc::alloc(layout) };
                NonNull::new(raw_ptr).unwrap_or_else(|| handle_alloc_error(layout))
            }
        };
        Self {
            data,
            len: 0,
            layout,
            #[cfg(feature = "pool")]
            reservation: std::sync::Mutex::new(None),
        }
    }
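
The zero-capacity branch in the snippet above can be checked in isolation. The sketch below mirrors only the size computation; `round_up_to_multiple_of_64` and `would_allocate` are local, hypothetical helpers, and the alignment of 64 is an assumption rather than the crate's actual `ALIGNMENT` constant.

```rust
use std::alloc::Layout;

/// Local stand-in for the rounding step: round `n` up to a multiple of 64.
pub fn round_up_to_multiple_of_64(n: usize) -> usize {
    (n + 63) / 64 * 64
}

/// Returns true if a request for `capacity` bytes would reach the
/// allocating branch, i.e. the rounded layout size is non-zero.
pub fn would_allocate(capacity: usize) -> bool {
    let size = round_up_to_multiple_of_64(capacity);
    // Alignment of 64 is assumed here for illustration.
    let layout = Layout::from_size_align(size, 64).expect("valid layout");
    layout.size() != 0
}
```

Under these assumptions a capacity of 0 rounds to 0 and skips the allocator, while any non-zero capacity rounds up to at least 64 bytes and allocates.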
