perf: pre-allocate Vecs and HashSets when size is known #1676

fvaleye · 2026-01-24T15:48:44Z

What changes are proposed in this pull request?

This PR improves memory allocation efficiency by pre-allocating Vec and HashSet collections when the size is known or can be reasonably estimated. This avoids repeated reallocations as collections grow.
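As a minimal sketch of the pattern (the function names below are hypothetical and not taken from this PR), the following compares growing a `Vec` from empty with reserving its capacity up front. It counts how often the capacity changes while pushing, as a rough proxy for allocations and reallocations.

```rust
/// Pushes `n` values into `v` and counts how many times its capacity changed,
/// which approximates the number of (re)allocations performed while pushing.
fn push_counting_reallocs(mut v: Vec<u64>, n: usize) -> (Vec<u64>, usize) {
    let mut reallocs = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i as u64);
        if v.capacity() != cap {
            reallocs += 1;
            cap = v.capacity();
        }
    }
    (v, reallocs)
}

/// Starts from an empty Vec: the buffer is regrown as it fills up.
fn build_growing(n: usize) -> (Vec<u64>, usize) {
    push_counting_reallocs(Vec::new(), n)
}

/// Reserves room for `n` elements up front: pushing never regrows the buffer.
fn build_preallocated(n: usize) -> (Vec<u64>, usize) {
    push_counting_reallocs(Vec::with_capacity(n), n)
}
```

Both builders produce the same contents; only the number of buffer growths differs, which is why the change is expected to be behavior-neutral.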

How was this change tested?

Existing test suite passes
Changes are allocation-only optimizations with no behavioral changes

Local benchmark results per change:

Row group filtering (ordinals Vec): micro-benchmark shows a 1.5x-4x speedup for the allocation pattern alone (10-1000 rows per group). The real-world improvement is smaller, since predicate evaluation dominates.

RowIndexBuilder initialization: 2-4x speedup.

List/map materialization: 1.3-1.5x speedup.

Ordinal deduplication in RowIndexBuilder::build: 1.8-3x speedup.

Column extraction setup in visit_rows: 2-5x speedup.

output_cols Vec in evaluate_transform_expression: 1.1-1.3x speedup.

codecov · 2026-01-24T15:54:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.64%. Comparing base (bbaef1a) to head (f51d98b).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1676   +/-   ##
=======================================
  Coverage   84.63%   84.64%           
=======================================
  Files         125      125           
  Lines       34724    34733    +9     
  Branches    34724    34733    +9     
=======================================
+ Hits        29390    29399    +9     
  Misses       3983     3983           
  Partials     1351     1351

rtyler

I think this is generally good hygiene and worth incorporating.

In the test transaction log that I use for validating memory performance, I didn't see any noticeable change in runtime or memory consumption. Still 👍 🚀

fvaleye added 9 commits January 24, 2026 16:42

perf(engine): pre-allocate ordinals Vec in row group filtering

991b454

Micro-benchmark shows a 1.5x-4x speedup for the allocation pattern alone (10-1000 rows per group). The real-world improvement is smaller, since predicate evaluation dominates.
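A rough sketch of what pre-allocating with an upper bound can look like when filtering. The function `matching_ordinals`, its slice input, and the closure predicate are assumptions made for illustration; the PR's actual row group filtering code is not shown here.

```rust
/// Returns the ordinals (positions) of the row groups that pass `keep`.
/// `row_group_sizes` and `keep` are hypothetical stand-ins for real row
/// group metadata and predicate evaluation.
fn matching_ordinals(row_group_sizes: &[u64], keep: impl Fn(u64) -> bool) -> Vec<usize> {
    // At most every row group matches, so the input length is a safe upper
    // bound for the capacity.
    let mut ordinals = Vec::with_capacity(row_group_sizes.len());
    for (ordinal, &size) in row_group_sizes.iter().enumerate() {
        if keep(size) {
            ordinals.push(ordinal);
        }
    }
    ordinals
}
```

The trade-off is that an upper bound can over-allocate when few row groups match, in exchange for never regrowing the buffer inside the loop.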

perf(engine): pre-allocate Vec when initializing RowIndexBuilder

6ae50a3

Local benchmark shows 2-4x speedup for RowIndexBuilder initialization.

perf(engine): pre-allocate Vec and HashMap in materialize methods

7906182

Local benchmark: 1.3-1.5x speedup for list/map materialization
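For the map case, the same idea applies through `HashMap::with_capacity`. The sketch below assumes parallel key and value slices as input, which is an illustrative simplification and not the signature of the PR's materialize methods.

```rust
use std::collections::HashMap;

/// Builds a map from parallel key and value slices, reserving space for all
/// entries before inserting. The slice-based input is an assumption made for
/// this sketch.
fn materialize_map(keys: &[&str], values: &[&str]) -> HashMap<String, String> {
    // `zip` stops at the shorter slice, so that length bounds the entry count.
    let len = keys.len().min(values.len());
    let mut map = HashMap::with_capacity(len);
    for (&k, &v) in keys.iter().zip(values) {
        map.insert(k.to_string(), v.to_string());
    }
    map
}
```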

perf(engine): pre-allocate seen_ordinals HashSet in RowIndexBuilder::build

942e99c

Local benchmark shows 1.8-3x speedup for ordinal deduplication
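Deduplication with a pre-sized `HashSet` might look roughly like the following. `first_duplicate` is a hypothetical helper written for this sketch; it does not reproduce the logic in `RowIndexBuilder`.

```rust
use std::collections::HashSet;

/// Returns the first ordinal that appears more than once, if any.
/// Hypothetical helper: the PR's actual deduplication logic is not shown.
fn first_duplicate(ordinals: &[usize]) -> Option<usize> {
    // Every ordinal is inserted at most once, so the slice length bounds the
    // number of entries the set will hold.
    let mut seen = HashSet::with_capacity(ordinals.len());
    for &ordinal in ordinals {
        // `insert` returns false when the value was already present.
        if !seen.insert(ordinal) {
            return Some(ordinal);
        }
    }
    None
}
```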

perf(engine): pre-allocate mask HashSet and getters Vec in visit_rows

01f2be6

Local benchmark: 2-5x speedup for column extraction setup.

perf(engine): pre-allocate output_cols Vec in evaluate_transform_expression

4d01ed2

Local benchmark: 1.1-1.3x speedup

perf(engine): pre-allocate combined_fields and combined_columns in append_columns

9174bde
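When two collections are concatenated, the final length is known exactly, so the result can be allocated once. The sketch below is illustrative only: `append_all` and the use of `String` as the column type are assumptions, not the PR's code.

```rust
/// Concatenates two column lists into one, allocating the result once.
/// `String` stands in for whatever column type the real code uses.
fn append_all(existing: Vec<String>, extra: Vec<String>) -> Vec<String> {
    // The final length is known exactly: the sum of both inputs.
    let mut combined = Vec::with_capacity(existing.len() + extra.len());
    combined.extend(existing);
    combined.extend(extra);
    combined
}
```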

perf(transaction): pre-allocate HashSets and Vecs in transaction validation

c231d38

perf(log): pre-allocate files Vec in log listing

945eea4

github-actions bot assigned fvaleye Jan 24, 2026

perf(scan): pre-allocate transform_spec Vec in StateInfo

f51d98b

fvaleye force-pushed the perf/pre-allocate-when-size-is-known branch from b2cb564 to f51d98b Compare January 24, 2026 15:50

rtyler approved these changes Jan 25, 2026
