feat(noisy_avg): Add support for BIGINT noise_scale #13707
Open
oliver6782 wants to merge 12 commits into facebookincubator:main from oliver6782:export-D76108450
✅ Deploy Preview for meta-velox canceled.
This pull request was exported from Phabricator. Differential Revision: D76108450
oliver6782 pushed a commit to oliver6782/velox that referenced this pull request Jun 10, 2025

…r#13707)

Summary:
Pull Request resolved: facebookincubator#13707

**feat(noisy_avg): Add support for BIGINT noise_scale**

This diff adds support for `BIGINT` type noise scale in the `NoisyAvgGaussianAggregate` function.

#### Key Changes
* **Updated Noise Scale Handling**: Modified the `checkAndSetNoiseScale` function to handle both `DOUBLE` and `BIGINT` types for noise scale.
* **Added Type Kind Check**: Added a type kind check to determine whether the noise scale is `DOUBLE` or `BIGINT` and perform the appropriate decoding.
* **BIGINT Support**: Decodes `BIGINT` values to `DOUBLE` for noise scale calculations.

#### Files Affected
* `fbcode/velox/functions/prestosql/aggregates/NoisyAvgGaussianAggregate.cpp`

Differential Revision: D76108450
650ad3b to 9453f2c
9453f2c to 5f7b9c9
oliver6782 pushed a commit to oliver6782/velox that referenced this pull request Jun 11, 2025

…r#13707)

Summary:
Pull Request resolved: facebookincubator#13707

**feat(noisy_avg): Add support for BIGINT noise_scale**

This diff adds support for `BIGINT` type noise scale in the `NoisyAvgGaussianAggregate` function.

#### Key Changes
* **Updated Noise Scale Handling**: Modified the `checkAndSetNoiseScale` function to handle both `DOUBLE` and `BIGINT` types for noise scale.
* **Added Type Kind Check**: Added a type kind check to determine whether the noise scale is `DOUBLE` or `BIGINT` and perform the appropriate decoding.
* **BIGINT Support**: Decodes `BIGINT` values to `DOUBLE` for noise scale calculations.

#### Files Affected
* `fbcode/velox/functions/prestosql/aggregates/NoisyAvgGaussianAggregate.cpp`

Differential Revision: D76108450

…se_scale) (facebookincubator#13651)

Summary:
Pull Request resolved: facebookincubator#13651

**Summary**

This diff adds more unit tests so that the noisy function is more robust.

**Changes**
* Added tests that aggregate on non-scalar inputs such as array and map.
* Added an edge case where the input is empty.

Differential Revision: D75916564

…gaussian(col, noise_scale) (facebookincubator#13652)

Summary:
Pull Request resolved: facebookincubator#13652

Add support for noise_scale of BIGINT type.

Differential Revision: D75925753

…andom_seed) (facebookincubator#13653)

Summary:
Pull Request resolved: facebookincubator#13653

### Summary
This diff introduces a random seed to the `noisy_count_gaussian` function to seed the random number generator.

### Changes
* Added new test cases to `NoisyCountGaussianAggregationTest.cpp`.
* Updated the `noisy_count_gaussian` function documentation to include the `random_seed` parameter.
* Modified the `NoisyCountGaussianAggregate.cpp` file to decode the `random_seed` argument and use it to seed the random number generator.

### Impact
This diff enhances the `noisy_count_gaussian` function by providing an optional `random_seed` parameter, allowing users to reproduce the same noisy count results. The new test cases ensure that the function works correctly with multiple aggregates and groups.

Differential Revision: D75926651
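
A minimal sketch of how an optional seed can make Gaussian noise reproducible. This is not the Velox implementation: `noisyCount`, its parameters, and the choice of `std::mt19937` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical helper (not Velox code): adds zero-mean Gaussian noise with
// standard deviation `noiseScale` to a count. With a seed, repeated calls
// produce the same result; without one, std::random_device supplies the seed.
inline double noisyCount(
    int64_t count,
    double noiseScale,
    std::optional<uint32_t> randomSeed = std::nullopt) {
  assert(noiseScale >= 0.0); // assumed precondition
  if (noiseScale == 0.0) {
    // std::normal_distribution requires a positive standard deviation.
    return static_cast<double>(count);
  }
  std::mt19937 gen(
      randomSeed.has_value() ? *randomSeed : std::random_device{}());
  std::normal_distribution<double> noise(0.0, noiseScale);
  return static_cast<double>(count) + noise(gen);
}
```

Calling `noisyCount(10, 1.0, 42)` twice yields the same value, which is the reproducibility property the summary describes.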

…e functions. (facebookincubator#13661)

Summary:
Pull Request resolved: facebookincubator#13661

### Summary
This diff refactors duplicate blocks of code to composable functions in the `NoisyCountGaussianAggregate` class.

### Key Changes
* Removed duplicate code blocks and replaced them with calls to new composable functions

### Files Changed
* `fbcode/velox/functions/prestosql/aggregates/NoisyCountGaussianAggregate.cpp`

### Overall Impact
This refactoring improves the maintainability and readability of the code by eliminating duplicates and simplifying the logic.

Differential Revision: D76084043

…bookincubator#13701)

Summary:
Pull Request resolved: facebookincubator#13701

### This Diff
This diff implements a new aggregate function, `noisy_sum_gaussian`, which calculates the sum over the input values and adds a normally distributed random double value with mean 0 and standard deviation `noise_scale`.

### File Changes
The following files were changed:
* `RegisterAggregateFunctions.cpp` - registered the new aggregate function
* `NoisySumAccumulator` - new file to support aggregation
* `NoisySumGaussianAggregate` - new file for the aggregate function implementation
* `NoisySumGaussianAggregationTest` - new file with a simple unit test
* `FacebookPrestoExpressionFuzzerTest.cpp` - added the new function to the list of non-deterministic functions to skip
* `aggregate.rst` - added documentation for the new function
* `FacebookAggregationFuzzerTest.cpp` - added the new function to the list of non-deterministic functions to skip

### Impact
This new function adds a way to calculate a noisy sum of values, which can be useful in statistical analysis.

Differential Revision: D75928375
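
The sum-plus-noise behavior described above can be sketched as a small accumulator. `NoisySumSketch` and its members are hypothetical names for illustration and are not the `NoisySumAccumulator` from this diff; the sketch assumes the noise is added once, when the final value is produced.

```cpp
#include <cassert>
#include <random>

// Hypothetical accumulator (not the Velox NoisySumAccumulator).
struct NoisySumSketch {
  double sum{0.0};
  double noiseScale{0.0};

  // Adds one input value to the running sum.
  void update(double value) {
    sum += value;
  }

  // Combines a partial sum from another accumulator.
  void merge(const NoisySumSketch& other) {
    sum += other.sum;
  }

  // Returns the sum plus one draw from N(0, noiseScale^2).
  template <typename Rng>
  double finalize(Rng& rng) const {
    assert(noiseScale >= 0.0); // assumed precondition
    if (noiseScale == 0.0) {
      // std::normal_distribution requires a positive standard deviation.
      return sum;
    }
    std::normal_distribution<double> noise(0.0, noiseScale);
    return sum + noise(rng);
  }
};
```

With `noiseScale` left at zero, `finalize` returns the exact sum, which is a convenient way to check the aggregation logic separately from the noise.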

…ian(col, noise_scale)

Summary:

## This Diff
This diff adds support for all numeric types in the `noisy_sum_gaussian` function.

### Key changes

**Testing**
New test cases have been added to cover the additional numeric types.

**Implementation**
The implementation of the `noisy_sum_gaussian` function has been updated to handle the new numeric types. This includes dispatching to the corresponding type for the update sum operation.

**Impact**
This diff extends the functionality of the `noisy_sum_gaussian` function to support more numeric types, making it more versatile and useful in a wider range of applications.

Differential Revision: D75964171

…m_seed))

Summary:

## This Diff

#### Overview
This diff updates the `noisy_sum_gaussian` aggregate function to include an optional `random_seed` parameter. The function now takes three arguments: `col`, `noise_scale`, and `random_seed`. The `random_seed` parameter allows users to specify a seed for the random number generator, enabling reproducibility.

#### Code Changes
The following files have been modified:
* `fbcode/velox/functions/prestosql/aggregates/NoisySumGaussianAggregate.cpp`: Updated the function to accept the `random_seed` parameter and added logic to handle it.
* `fbcode/velox/functions/prestosql/aggregates/tests/NoisySumGaussianAggregationTest.cpp`: Added test cases to cover the new functionality.
* `fbcode/velox/docs/functions/presto/aggregate.rst`: Updated the documentation to reflect the changes to the function signature.
* `fbcode/velox/functions/lib/aggregates/noisy_aggregation/NoisySumAccumulator.h`: Modified the `NoisySumAccumulator` class to accept the `random_seed` parameter.

#### Impact
This update enables users to specify a random seed for the `noisy_sum_gaussian` aggregate function, making it possible to reproduce the results. This is particularly useful for testing and debugging purposes. The updated function remains backward compatible, and existing use cases will not be affected.

Differential Revision: D75976973

…, upper, random_seed)

Summary:

**Added Noisy Sum Gaussian Aggregation with Bounds**
===========================================================

This diff implements the noisy sum gaussian aggregation with bounds, which allows users to specify a lower and upper bound for the aggregated value.

**Changes**
-----------
* Added `lowerBound` and `upperBound` parameters to the `NoisySumAccumulator` constructor.
* Updated the `NoisySumGaussianAggregation` function to clip the aggregated value to the specified bounds.
* Added test cases to verify the correctness of the new functionality.
* Updated the documentation to reflect the changes.

**Example Use Case**
--------------------
The new aggregation function can be used as follows:

```sql
SELECT noisy_sum_gaussian(value, noise_scale, lower_bound, upper_bound, random_seed) FROM table_name;
```

This will aggregate the `value` column with the specified `noise_scale` and clip the result to the range `[lower_bound, upper_bound]`. The `random_seed` parameter can be used to seed the random number generator. If not provided, a secure random seed will be used.

Differential Revision: D75982248
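
The clipping step can be sketched in isolation. `clipToBounds` is a hypothetical helper, not a function from this diff, and the sketch does not settle which value the real implementation clips; it only shows clamping a value into `[lowerBound, upperBound]`, assuming the bounds are ordered.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not Velox code): restricts `value` to the closed
// range [lowerBound, upperBound].
inline double clipToBounds(double value, double lowerBound, double upperBound) {
  assert(lowerBound <= upperBound); // assumed precondition
  return std::clamp(value, lowerBound, upperBound);
}
```

For example, with bounds `[0.0, 10.0]`, an input of `12.5` is clipped to `10.0` and an input of `4.0` is returned unchanged.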

…, lower, upper, random_seed)

Summary:

### Diff Summary

#### Test: Add Unit Test for `noisy_sum_gaussian` Function
This diff adds a unit test for the `noisy_sum_gaussian` function, specifically to test its behavior with no noise. The test is designed to randomly generate input vectors, noise scales, bounds, and random seeds, and then compare the results of the aggregation function with DuckDB.

#### Affected Files
* `fbcode/velox/functions/prestosql/aggregates/tests/NoisySumGaussianAggregationTest.cpp`

#### Notable Changes
* Added a new test `fuzzerTestNoNoise` to `NoisySumGaussianAggregationTest.cpp` to test the `noisy_sum_gaussian` function with no noise.
* The test uses randomly generated input vectors, noise scales, bounds, and random seeds to test the function's behavior.
* The results of the aggregation function are compared with DuckDB to ensure accuracy.

Differential Revision: D75988485

…functions (facebookincubator#13662)

Summary:
Pull Request resolved: facebookincubator#13662

#### Refactor Duplicate Code in Noisy Sum Gaussian Aggregate
The diff refactors the `NoisySumGaussianAggregate.cpp` file to reduce code duplication by introducing composable functions. The changes extract the logic for updating noise scale and accumulator checks into separate, reusable functions.

**Key Changes:**
* Removed 54 lines of duplicate code and replaced them with a 9-line composable function call
* Introduced `checkAndSetNoiseScale` and `updateAccumulator` functions to encapsulate logic
* Improved code readability and maintainability

#### Files Changed:
* `fbcode/velox/functions/prestosql/aggregates/NoisySumGaussianAggregate.cpp`

#### Purpose:
The refactoring aims to simplify the codebase, reduce duplication, and enhance the overall quality of the `NoisySumGaussianAggregate` implementation.

Differential Revision: D76087461

…bookincubator#13706)

Summary:
Pull Request resolved: facebookincubator#13706

This diff implements a new aggregate function, `noisy_avg_gaussian`, which calculates the average of a column and adds Gaussian noise to the result. The function takes two arguments: `col` (the column to aggregate) and `noise_scale` (the standard deviation of the noise).

The implementation includes:
* A new C++ class, `NoisyAvgGaussianAggregate`, which implements the aggregate function.
* A new header file, `NoisyAvgGaussianAggregate.h`, which declares the `registerNoisyAvgGaussianAggregate` function.
* Modifications to the `FacebookWindowFuzzerTest` and `FacebookPrestoExpressionFuzzerTest` to include the new function in the list of non-deterministic functions.
* A new section in the Presto aggregate functions documentation, `aggregate.rst`, which describes the `noisy_avg_gaussian` function.

### Example Use Case

```sql
SELECT noisy_avg_gaussian(x, 1.0) AS noisy_avg FROM table;
```

This query calculates the average of column `x` and adds Gaussian noise with a standard deviation of 1.0 to the result.

Differential Revision: D76107397

…r#13707)

Summary:
Pull Request resolved: facebookincubator#13707

**feat(noisy_avg): Add support for BIGINT noise_scale**

This diff adds support for `BIGINT` type noise scale in the `NoisyAvgGaussianAggregate` function.

#### Key Changes
* **Updated Noise Scale Handling**: Modified the `checkAndSetNoiseScale` function to handle both `DOUBLE` and `BIGINT` types for noise scale.
* **Added Type Kind Check**: Added a type kind check to determine whether the noise scale is `DOUBLE` or `BIGINT` and perform the appropriate decoding.
* **BIGINT Support**: Decodes `BIGINT` values to `DOUBLE` for noise scale calculations.

#### Files Affected
* `fbcode/velox/functions/prestosql/aggregates/NoisyAvgGaussianAggregate.cpp`

Differential Revision: D76108450
5f7b9c9 to 942da57
Labels
CLA Signed
This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
fb-exported
Summary:

**feat(noisy_avg): Add support for BIGINT noise_scale**

This diff adds support for `BIGINT` type noise scale in the `NoisyAvgGaussianAggregate` function.

#### Key Changes
* **Updated Noise Scale Handling**: Modified the `checkAndSetNoiseScale` function to handle both `DOUBLE` and `BIGINT` types for noise scale.
* **Added Type Kind Check**: Added a type kind check to determine whether the noise scale is `DOUBLE` or `BIGINT` and perform the appropriate decoding.
* **BIGINT Support**: Decodes `BIGINT` values to `DOUBLE` for noise scale calculations.

#### Files Affected
* `fbcode/velox/functions/prestosql/aggregates/NoisyAvgGaussianAggregate.cpp`

Differential Revision: D76108450
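
A minimal sketch of the type-dependent decoding described in the key changes. It does not use Velox types: the `std::variant` argument stands in for a decoded `BIGINT` or `DOUBLE` column value, and `NoiseScaleArg`, `decodeNoiseScale`, and the non-negativity check are assumptions for illustration, not the actual `checkAndSetNoiseScale` code.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical stand-in for a noise_scale argument that may arrive as
// BIGINT (int64_t) or DOUBLE (double).
using NoiseScaleArg = std::variant<int64_t, double>;

// Hypothetical helper (not Velox code): inspects which type is held and
// converts BIGINT to double so downstream noise calculations see one type.
inline double decodeNoiseScale(const NoiseScaleArg& arg) {
  const double scale = std::holds_alternative<int64_t>(arg)
      ? static_cast<double>(std::get<int64_t>(arg))
      : std::get<double>(arg);
  assert(scale >= 0.0); // assumed validation; not stated in this diff
  return scale;
}
```

Both `decodeNoiseScale(NoiseScaleArg{int64_t{3}})` and `decodeNoiseScale(NoiseScaleArg{3.0})` produce the same `double`, so callers need no further type-specific handling.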