feat(noisy_avg): Add support for all numeric input types #13709

Open: wants to merge 14 commits into base: main

oliver6782

Summary

This diff adds support for all numeric input types to the noisy_avg aggregation function.

Code Changes

The diff modifies two files:

  1. NoisyAvgGaussianAggregationTest.cpp: Adds new test cases for bigint, decimal, and real input types.
  2. NoisyAvgGaussianAggregate.cpp: Updates the update method to handle different input types using VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH.

Impact

This diff allows the noisy_avg aggregation function to work with a wider range of input types, making it more versatile and useful for various use cases.

Differential Revision: D76209005
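
The type handling described above can be pictured with a minimal, standalone C++17 sketch. It does not use Velox or `VELOX_DYNAMIC_SCALAR_TYPE_DISPATCH`; `InputKind`, `AvgState`, `updateTyped`, and `update` are hypothetical names chosen for illustration, and decimal handling is left out. The idea is only that a runtime type tag selects a template instantiation, and each instantiation widens its values to `double` before accumulating.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a runtime type tag; this is not Velox's TypeKind.
enum class InputKind { kBigint, kReal, kDouble };

// Running state for an average: inputs widened to double, plus a row count.
struct AvgState {
  double sum = 0.0;
  std::uint64_t count = 0;
};

// Typed update: reads `size` values of type T and folds them into the state.
template <typename T>
void updateTyped(AvgState& state, const void* data, std::size_t size) {
  const T* values = static_cast<const T*>(data);
  for (std::size_t i = 0; i < size; ++i) {
    state.sum += static_cast<double>(values[i]);
    ++state.count;
  }
}

// Maps the runtime tag to the matching compile-time instantiation.
void update(AvgState& state, InputKind kind, const void* data,
            std::size_t size) {
  switch (kind) {
    case InputKind::kBigint:
      updateTyped<std::int64_t>(state, data, size);
      return;
    case InputKind::kReal:
      updateTyped<float>(state, data, size);
      return;
    case InputKind::kDouble:
      updateTyped<double>(state, data, size);
      return;
  }
  assert(false && "unsupported input kind");
}
```

Supporting another numeric type in this sketch means adding one enum value and one `case`; the accumulation logic itself stays in a single template.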

netlify bot commented Jun 10, 2025

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 0749a4c
🔍 Latest deploy log https://app.netlify.com/projects/meta-velox/deploys/68491970c4474a000871c21d

@facebook-github-bot added the CLA Signed label Jun 10, 2025
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D76209005

oliver6782 pushed a commit to oliver6782/velox that referenced this pull request Jun 10, 2025
…ubator#13709)

oliver6782 pushed a commit to oliver6782/velox that referenced this pull request Jun 11, 2025
…ubator#13709)

oliverxu added 14 commits June 10, 2025 22:45
…se_scale) (facebookincubator#13651)

Summary:
Pull Request resolved: facebookincubator#13651

**Summary**
This diff adds more unit tests so that the noisy function is more robust.

**Changes**
* Added tests that aggregate over non-scalar inputs such as arrays and maps.
* Added an edge case where the input is empty.

Differential Revision: D75916564
…gaussian(col, noise_scale) (facebookincubator#13652)

Summary:
Pull Request resolved: facebookincubator#13652

Add support for noise_scale of BIGINT type.

Differential Revision: D75925753
…andom_seed) (facebookincubator#13653)

Summary:
Pull Request resolved: facebookincubator#13653

### Summary
This diff introduces a `random_seed` parameter to the `noisy_count_gaussian` function to seed the random number generator.

### Changes
* Added new test cases to `NoisyCountGaussianAggregationTest.cpp`.
* Updated the `noisy_count_gaussian` function documentation to include the `random_seed` parameter.
* Modified the `NoisyCountGaussianAggregate.cpp` file to decode the `random_seed` argument and use it to seed the random number generator.

### Impact
This diff enhances the `noisy_count_gaussian` function by providing an optional `random_seed` parameter, allowing users to reproduce the same noisy count results. The new test case ensures that the function works correctly with multiple aggregates and groups.

Differential Revision: D75926651
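
As a rough illustration of why a seed makes results reproducible, here is a standalone C++17 sketch. `noisyCount` is a hypothetical helper, not the function in `NoisyCountGaussianAggregate.cpp`; the use of `std::mt19937`, the `double` return type, and the fallback to `std::random_device` when no seed is given are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical helper: returns count plus N(0, noiseScale) noise.
// With a seed, the generator starts from a fixed state, so repeated
// calls with the same arguments return the same value.
double noisyCount(std::uint64_t count, double noiseScale,
                  std::optional<std::uint32_t> seed) {
  assert(noiseScale >= 0.0);
  if (noiseScale == 0.0) {
    // std::normal_distribution requires a positive standard deviation.
    return static_cast<double>(count);
  }
  // Assumption: fall back to a non-deterministic seed when none is given.
  std::mt19937 gen(seed.has_value() ? *seed : std::random_device{}());
  std::normal_distribution<double> noise(0.0, noiseScale);
  return static_cast<double>(count) + noise(gen);
}
```

Calling `noisyCount(100, 2.0, 42u)` twice yields identical values, while passing `std::nullopt` generally does not.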
…e functions. (facebookincubator#13661)

Summary:
Pull Request resolved: facebookincubator#13661

### Summary

This diff refactors duplicate blocks of code into composable functions in the `NoisyCountGaussianAggregate` class.

### Key Changes

* Removed duplicate code blocks and replaced them with calls to new composable functions

### Files Changed

* `fbcode/velox/functions/prestosql/aggregates/NoisyCountGaussianAggregate.cpp`

### Overall Impact

This refactoring improves the maintainability and readability of the code by eliminating duplicates and simplifying the logic.

Differential Revision: D76084043
…bookincubator#13701)

Summary:
Pull Request resolved: facebookincubator#13701

### This Diff

This diff implements a new aggregate function `noisy_sum_gaussian` which calculates the sum over the input values and adds a normally distributed random double value with 0 mean and standard deviation of `noise_scale`.

### File Changes

The following files were changed:

* `RegisterAggregateFunctions.cpp` - registered the new aggregate function
* `NoisySumAccumulator` - new file to support aggregation.
* `NoisySumGaussianAggregate` - new file for the aggregate function implementation.
* `NoisySumGaussianAggregationTest` - new file with simple unit tests.
* `FacebookPrestoExpressionFuzzerTest.cpp` - added the new function to the list of non-deterministic functions to skip
* `aggregate.rst` - added documentation for the new function
* `FacebookAggregationFuzzerTest.cpp` - added the new function to the list of non-deterministic functions to skip

### Impact

This new function adds a way to calculate a noisy sum of values, which can be useful in statistical analysis.

Differential Revision: D75928375
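
The behavior described above (sum the inputs, then add zero-mean Gaussian noise with standard deviation `noise_scale`) can be sketched as a small standalone accumulator. `SketchSumAccumulator` is a hypothetical type and not the `NoisySumAccumulator` added in this commit; adding the noise once at finalization, rather than per row, is an assumption of the sketch.

```cpp
#include <cassert>
#include <random>

// Hypothetical accumulator for illustration only.
struct SketchSumAccumulator {
  double sum = 0.0;
  double noiseScale = 0.0;

  // Folds one input value into the running sum.
  void update(double value) { sum += value; }

  // Combines a partial accumulator, as when aggregation runs in stages.
  void merge(const SketchSumAccumulator& other) { sum += other.sum; }

  // Returns the sum plus one draw from N(0, noiseScale).
  template <typename Rng>
  double finalize(Rng& rng) const {
    assert(noiseScale >= 0.0);
    if (noiseScale == 0.0) {
      // std::normal_distribution requires a positive standard deviation.
      return sum;
    }
    std::normal_distribution<double> noise(0.0, noiseScale);
    return sum + noise(rng);
  }
};
```

Keeping the noise out of `update` and `merge` means partial results combine exactly, and the noise is drawn a single time per output value.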
…ian(col, noise_scale)

Summary:
## This Diff

This diff adds support for all numeric types in the `noisy_sum_gaussian` function.

### Key changes

**Testing**
New test cases have been added to cover the additional numeric types.

**Implementation**
The implementation of the `noisy_sum_gaussian` function has been updated to handle the new numeric types. This includes dispatching to the corresponding type for the update sum operation.

**Impact**
This diff extends the functionality of the `noisy_sum_gaussian` function to support more numeric types, making it more versatile and useful in a wider range of applications.

Differential Revision: D75964171
…m_seed))

Summary:
## This Diff

#### Overview

This diff updates the `noisy_sum_gaussian` aggregate function to include an optional `random_seed` parameter. The function now takes three arguments: `col`, `noise_scale`, and `random_seed`. The `random_seed` parameter allows users to specify a seed for the random number generator, enabling reproducibility.

#### Code Changes

The following files have been modified:

* `fbcode/velox/functions/prestosql/aggregates/NoisySumGaussianAggregate.cpp`: Updated the function to accept the `random_seed` parameter and added logic to handle it.
* `fbcode/velox/functions/prestosql/aggregates/tests/NoisySumGaussianAggregationTest.cpp`: Added test cases to cover the new functionality.
* `fbcode/velox/docs/functions/presto/aggregate.rst`: Updated the documentation to reflect the changes to the function signature.
* `fbcode/velox/functions/lib/aggregates/noisy_aggregation/NoisySumAccumulator.h`: Modified the `NoisySumAccumulator` class to accept the `random_seed` parameter.

#### Impact

This update enables users to specify a random seed for the `noisy_sum_gaussian` aggregate function, making it possible to reproduce the results. This is particularly useful for testing and debugging purposes. The updated function remains backward compatible, and existing use cases will not be affected.

Differential Revision: D75976973
…, upper, random_seed)

Summary:
**Added Noisy Sum Gaussian Aggregation with Bounds**
===========================================================

This diff implements the noisy sum gaussian aggregation with bounds, which allows users to specify a lower and upper bound for the aggregated value.

**Changes**
-----------

*   Added `lowerBound` and `upperBound` parameters to the `NoisySumAccumulator` constructor.
*   Updated the `NoisySumGaussianAggregation` function to clip the aggregated value to the specified bounds.
*   Added test cases to verify the correctness of the new functionality.
*   Updated the documentation to reflect the changes.

**Example Use Case**
--------------------

The new aggregation function can be used as follows:

```sql
SELECT noisy_sum_gaussian(value, noise_scale, lower_bound, upper_bound, random_seed)
FROM table_name;
```

This will aggregate the `value` column with the specified `noise_scale` and clip the result to the range `[lower_bound, upper_bound]`. The `random_seed` parameter can be used to seed the random number generator. If not provided, a secure random seed will be used.

Differential Revision: D75982248
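
A minimal standalone C++17 sketch of the clipping step follows. `noisySumWithBounds` is a hypothetical helper, not the Velox implementation. Reading "clip the result" as clamping the final noisy value is an assumption; an implementation could instead clamp each input value or the sum before noise is added.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: adds N(0, noiseScale) noise to an already computed
// sum, then clamps the noisy value to [lowerBound, upperBound].
double noisySumWithBounds(double sum, double noiseScale, double lowerBound,
                          double upperBound, std::uint32_t seed) {
  assert(noiseScale >= 0.0);
  assert(lowerBound <= upperBound);
  double noisy = sum;
  if (noiseScale > 0.0) {
    std::mt19937 gen(seed);
    std::normal_distribution<double> noise(0.0, noiseScale);
    noisy += noise(gen);
  }
  // std::clamp (C++17) returns the nearest value inside the closed range.
  return std::clamp(noisy, lowerBound, upperBound);
}
```

Under this reading the output always lies inside the closed range, whatever noise is drawn.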
…, lower, upper, random_seed)

Summary:
### Diff Summary

#### Test: Add Unit Test for `noisy_sum_gaussian` Function

This diff adds a unit test for the `noisy_sum_gaussian` function, specifically to test its behavior with no noise. The test is designed to randomly generate input vectors, noise scales, bounds, and random seeds, and then compare the results of the aggregation function with DuckDB.

#### Affected Files

* `fbcode/velox/functions/prestosql/aggregates/tests/NoisySumGaussianAggregationTest.cpp`

#### Notable Changes

* Added a new test `fuzzerTestNoNoise` to `NoisySumGaussianAggregationTest.cpp` to test the `noisy_sum_gaussian` function with no noise.
* The test uses randomly generated input vectors, noise scales, bounds, and random seeds to test the function's behavior.
* The results of the aggregation function are compared with DuckDB to ensure accuracy.

Differential Revision: D75988485
…functions (facebookincubator#13662)

Summary:
Pull Request resolved: facebookincubator#13662

#### Refactor Duplicate Code in Noisy Sum Gaussian Aggregate

The diff refactors the `NoisySumGaussianAggregate.cpp` file to reduce code duplication by introducing composable functions. The changes extract the logic for updating noise scale and accumulator checks into separate, reusable functions.

**Key Changes:**

* Removed 54 lines of duplicate code and replaced with a 9-line composable function call
* Introduced `checkAndSetNoiseScale` and `updateAccumulator` functions to encapsulate logic
* Improved code readability and maintainability

#### Files Changed:

* `fbcode/velox/functions/prestosql/aggregates/NoisySumGaussianAggregate.cpp`

#### Purpose:
The refactoring aims to simplify the codebase, reduce duplication, and enhance the overall quality of the `NoisySumGaussianAggregate` implementation.

Differential Revision: D76087461
…bookincubator#13706)

Summary:
Pull Request resolved: facebookincubator#13706

This diff implements a new aggregate function, `noisy_avg_gaussian`, which calculates the average of a column and adds Gaussian noise to the result. The function takes two arguments: `col` (the column to aggregate) and `noise_scale` (the standard deviation of the noise).

The implementation includes:

* A new C++ class, `NoisyAvgGaussianAggregate`, which implements the aggregate function.
* A new header file, `NoisyAvgGaussianAggregate.h`, which declares the `registerNoisyAvgGaussianAggregate` function.
* Modifications to the `FacebookWindowFuzzerTest` and `FacebookPrestoExpressionFuzzerTest` to include the new function in the list of non-deterministic functions.
* A new section in the Presto aggregate functions documentation, `aggregate.rst`, which describes the `noisy_avg_gaussian` function.

### Example Use Case

```sql
SELECT noisy_avg_gaussian(x, 1.0) AS noisy_avg FROM table;
```

This query calculates the average of column `x` and adds Gaussian noise with a standard deviation of 1.0 to the result.

Differential Revision: D76107397
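
For readers who want the arithmetic spelled out, here is a standalone C++17 sketch of "average, then add Gaussian noise". `noisyAvg` is a hypothetical helper, not `NoisyAvgGaussianAggregate`; returning an empty optional for empty input (roughly, SQL NULL) is an assumption of the sketch.

```cpp
#include <cassert>
#include <optional>
#include <random>
#include <vector>

// Hypothetical helper: mean of `values` plus one draw from N(0, noiseScale).
std::optional<double> noisyAvg(const std::vector<double>& values,
                               double noiseScale, std::mt19937& rng) {
  assert(noiseScale >= 0.0);
  if (values.empty()) {
    // Assumption: no rows means no result.
    return std::nullopt;
  }
  double sum = 0.0;
  for (double v : values) {
    sum += v;
  }
  const double avg = sum / static_cast<double>(values.size());
  if (noiseScale == 0.0) {
    // std::normal_distribution requires a positive standard deviation.
    return avg;
  }
  std::normal_distribution<double> noise(0.0, noiseScale);
  return avg + noise(rng);
}
```

With `noiseScale` set to 0 the helper returns the exact mean, which is a convenient way to compare against a reference implementation in tests.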
…ubator#13709)

Labels: CLA Signed, fb-exported