@fvaleye (Collaborator) commented Jan 24, 2026

What changes are proposed in this pull request?

This PR improves memory allocation efficiency by pre-allocating Vec and HashSet collections when the size is known or can be reasonably estimated. This avoids repeated reallocations as collections grow.
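As a minimal sketch of the pattern described above (not the code changed in this PR), the two hypothetical helpers below size a `Vec` and a `HashSet` up front using the standard library's `with_capacity` constructors. The function names and the `u64` element type are assumptions chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical helper: the output length equals the input length,
/// so the Vec can be allocated once instead of growing on push.
fn squares_preallocated(values: &[u64]) -> Vec<u64> {
    let mut out = Vec::with_capacity(values.len());
    for v in values {
        out.push(v * v);
    }
    out
}

/// Hypothetical helper: the number of distinct values cannot exceed
/// the input length, so that length is a safe upper-bound estimate.
fn unique_preallocated(values: &[u64]) -> HashSet<u64> {
    let mut seen = HashSet::with_capacity(values.len());
    for v in values {
        seen.insert(*v);
    }
    seen
}
```

Using the input length as the `HashSet` capacity may over-allocate when the input contains many duplicates, which is the trade-off of estimating rather than knowing the final size.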

How was this change tested?

  • Existing test suite passes
  • Changes are allocation-only optimizations with no behavioral changes
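The claim that pre-allocation does not change behavior can be illustrated with a small sketch: two hypothetical functions that differ only in how the `Vec` is constructed should return equal contents. These functions are illustrative assumptions, not code from the PR.

```rust
/// Hypothetical baseline: the Vec starts empty and reallocates as it grows.
fn collect_growing(n: usize) -> Vec<usize> {
    let mut v = Vec::new();
    for i in 0..n {
        v.push(i * 2);
    }
    v
}

/// Hypothetical optimized variant: identical logic, but the final
/// length is known, so the buffer is reserved once up front.
fn collect_preallocated(n: usize) -> Vec<usize> {
    let mut v = Vec::with_capacity(n);
    for i in 0..n {
        v.push(i * 2);
    }
    v
}
```

Comparing the two results with `assert_eq!` in a unit test is one way to check that only the allocation strategy differs.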

Micro-benchmark shows a 1.5x-4x speedup for the allocation pattern alone (groups of 10-1000 rows). Real-world improvement is smaller since predicate evaluation dominates.
Local benchmark shows a 2-4x speedup for RowIndexBuilder initialization.
Local benchmark shows a 1.3-1.5x speedup for list/map materialization.
…build

Local benchmark shows a 1.8-3x speedup for ordinal deduplication.
Local benchmark shows a 2-5x speedup for column extraction setup.
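For readers unfamiliar with the ordinal deduplication case mentioned above, the following is a rough sketch of how such a step might pre-allocate its working collections. The function `dedup_ordinals` and its `usize` ordinals are hypothetical and are not taken from the PR.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: removes duplicate ordinals while keeping
/// first-seen order. Both collections are bounded by the input
/// length, so they are sized once up front.
fn dedup_ordinals(ordinals: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::with_capacity(ordinals.len());
    let mut unique = Vec::with_capacity(ordinals.len());
    for &ord in ordinals {
        // `insert` returns true only the first time a value is added.
        if seen.insert(ord) {
            unique.push(ord);
        }
    }
    unique
}
```

For example, `dedup_ordinals(&[3, 1, 3, 2, 1])` yields `[3, 1, 2]`.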
@fvaleye fvaleye force-pushed the perf/pre-allocate-when-size-is-known branch from b2cb564 to f51d98b Compare January 24, 2026 15:50
codecov bot commented Jan 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.64%. Comparing base (bbaef1a) to head (f51d98b).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1676   +/-   ##
=======================================
  Coverage   84.63%   84.64%           
=======================================
  Files         125      125           
  Lines       34724    34733    +9     
  Branches    34724    34733    +9     
=======================================
+ Hits        29390    29399    +9     
  Misses       3983     3983           
  Partials     1351     1351           

@rtyler (Member) left a comment

I think this is generally good hygiene and worth incorporating.

In the test transaction log that I use for validating memory performance, I didn't see any noticeable change in runtime or memory consumption. Still 👍 🚀
