Skip to content

Release hash aggregation memory on output #25879

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

pettyjamesm
Copy link
Member

@pettyjamesm pettyjamesm commented May 28, 2025

Description

Incrementally releases memory from FlatGroupByHash when HashAggregationOperator starts producing output. Previously, the entire hash table contents were kept in memory until output data was produced from hash aggregations.

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## General
* Reduce memory usage of aggregations by incrementally releasing memory as output rows are are produced ({issue}`25879`)

@pettyjamesm pettyjamesm requested a review from dain May 28, 2025 15:15
@cla-bot cla-bot bot added the cla-signed label May 28, 2025
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch 6 times, most recently from 9c34efb to 8676e2f Compare May 28, 2025 20:54
@pettyjamesm pettyjamesm marked this pull request as ready for review May 28, 2025 23:35
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch 2 times, most recently from a9c503f to 547ef13 Compare May 29, 2025 14:17
@pettyjamesm pettyjamesm requested a review from sopel39 June 4, 2025 18:59
@sopel39 sopel39 requested a review from Copilot June 5, 2025 12:35
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces incremental memory release for hash aggregations by freeing portions of the hash table as output is produced. Key changes include:

  • New test (testReleaseMemoryOnOutput) verifying memory reduction and restricted operations after output release.
  • Implementation of startReleasingOutput in FlatHash and FlatGroupByHash, which updates internal state to release memory.
  • Adjustments in InMemoryHashAggregationBuilder and other GroupByHash implementations to support and integrate the new memory release behavior.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
core/trino-main/src/test/java/io/trino/operator/TestGroupByHash.java Added tests for incremental memory release during output production.
core/trino-main/src/test/java/io/trino/operator/CyclingGroupByHash.java Introduced a stub for startReleasingOutput throwing an UnsupportedOperationException.
core/trino-main/src/main/java/io/trino/operator/aggregation/builder/InMemoryHashAggregationBuilder.java Updated aggregation builder to utilize incremental memory release.
core/trino-main/src/main/java/io/trino/operator/FlatHash.java Modified memory handling by nullifying arrays to free memory and added extra checks for the releasing state.
core/trino-main/src/main/java/io/trino/operator/FlatGroupByHash.java Updated to delegate startReleasingOutput to FlatHash.
core/trino-main/src/main/java/io/trino/operator/BigintGroupByHash.java Implemented a no-operation for startReleasingOutput.
core/trino-main/src/main/java/io/trino/operator/AppendOnlyVariableWidthData.java Enhanced chunk-freeing logic with safe null checks.
core/trino-main/src/main/java/io/trino/operator/GroupByHash.java Updated interface to include startReleasingOutput.
Comments suppressed due to low confidence (2)

core/trino-main/src/test/java/io/trino/operator/CyclingGroupByHash.java:58

  • Consider adding a comment explaining that startReleasingOutput is not supported in CyclingGroupByHash and will be implemented later, so that readers understand the rationale behind the UnsupportedOperationException.
public void startReleasingOutput()

core/trino-main/src/main/java/io/trino/operator/GroupByHash.java:112

  • Consider adding Javadoc for the startReleasingOutput method to clarify the expected behavior and usage for implementations, helping future maintainers understand the contract behind memory release.
void startReleasingOutput();

@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch from 547ef13 to 50c4a9a Compare June 5, 2025 14:03
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch 2 times, most recently from a074935 to fa66dd7 Compare June 6, 2025 15:02
Incrementally releases memory from FlatGroupByHash when
HashAggregationOperator starts producing output.
@pettyjamesm pettyjamesm force-pushed the release-hash-aggregation-memory branch from cb7aa29 to 18cb110 Compare June 6, 2025 15:25
@raunaqmorarka raunaqmorarka requested a review from lukasz-stec June 9, 2025 13:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

1 participant