CSHARP-3222: Add LINQ support for median and percentile accumulators/window functions #1743

Open: adelinowona wants to merge 14 commits into base: main

adelinowona
Contributor

This PR introduces the capability to calculate the median and percentile of numeric values in the MongoDB aggregation pipeline for $group and $setWindowFields stages.

@adelinowona requested review from rstam and sanych-sun July 29, 2025 11:03
@adelinowona requested a review from a team as a code owner July 29, 2025 11:03

Contributor

@rstam left a comment

Initial quick review. Will review more thoroughly after next commit.

var method = expression.Method;
var arguments = expression.Arguments;

if (IsMedianMethod(method))

Contributor

Normally we do if (method.IsOneOf(__medianMethods)).

Do we want to do it differently here?

Contributor Author

I noticed that pattern, but I felt it would add too much boilerplate when there's an easier way to do it. Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.

Contributor

There is some boilerplate in setting up the __medianMethods static field.

There isn't any boilerplate in the if statement. It's roughly the same.

One advantage of using __medianMethods is that it is VERY precise. There is no danger of false hits like there is with the IsMedianMethod approach.

> Plus I noticed the StandardDeviationMethodsToAggregationExpressionTranslator follows a similar pattern already.

Yes. That's older code that predates the newer practice of being more precise.
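The difference in precision between the two checks can be sketched as follows. This is a minimal illustration, not the driver's code: `SketchExtensions`, `UnrelatedExtensions`, `MethodMatching`, and the `IsMedianByName` helper are hypothetical names, and the `IsOneOf` shown here is a simplified stand-in for the driver's extension method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins for illustration; method bodies are irrelevant here.
public static class SketchExtensions
{
    public static double Median(this IEnumerable<double> source) => 0.0;
    public static double Median(this IEnumerable<int> source) => 0.0;
}

public static class UnrelatedExtensions
{
    // Same name, unrelated meaning: a name-based check would match this too.
    public static string Median(this string text) => text;
}

public static class MethodMatching
{
    // The exact set of supported overloads, resolved once via reflection.
    private static readonly MethodInfo[] __medianMethods =
    {
        typeof(SketchExtensions).GetMethod(nameof(SketchExtensions.Median), new[] { typeof(IEnumerable<double>) }),
        typeof(SketchExtensions).GetMethod(nameof(SketchExtensions.Median), new[] { typeof(IEnumerable<int>) }),
    };

    // Simplified stand-in: true only when the method is one of the listed MethodInfo instances.
    public static bool IsOneOf(this MethodInfo method, params MethodInfo[] candidates)
        => candidates.Contains(method);

    // Precise check against the known overloads.
    public static bool IsKnownMedian(MethodInfo method) => method.IsOneOf(__medianMethods);

    // Loose check: matches any method named "Median", wherever it is declared.
    public static bool IsMedianByName(MethodInfo method) => method.Name == "Median";
}
```

In this sketch, `IsMedianByName` accepts `UnrelatedExtensions.Median` while `IsKnownMedian` rejects it, which is the kind of false hit the static field avoids.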

Contributor

@rstam left a comment

Changes requested

@adelinowona requested review from rstam and sanych-sun July 29, 2025 23:38

namespace MongoDB.Driver.Linq.Linq3Implementation.Ast.Expressions
{
internal sealed class AstComplexAccumulatorExpression : AstAccumulatorExpression

Contributor

This is clever but I don't like the idea of a Dictionary<string, AstExpression>. A class that can represent anything in this way is not easy to work with.

I would prefer you just create two new classes AstMedianAccumulatorExpression and AstPercentileAccumulatorExpression.

That would mirror AstMedianExpression and AstPercentileExpression.

It would also avoid the messiness in the VisitComplexAccumulatorExpression method. Instead we could just have two simple methods VisitMedianAccumulatorExpression and VisitPercentileAccumulatorExpression.

The whole point of the Ast classes is to have type-safe representations of MQL, and the use of a Dictionary<string, AstExpression> throws type-safety out the window.
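The type-safety argument can be illustrated with a small sketch. Every type below is a hypothetical, heavily simplified stand-in (including the `Render` method and the `AstFieldPath` class), not the driver's actual AST; only the `$median` operator shape with `input` and `method` follows the MQL syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the driver's AST types.
public abstract class AstExpression
{
    public abstract string Render();
}

public sealed class AstFieldPath : AstExpression
{
    public AstFieldPath(string path) { Path = path; }
    public string Path { get; }
    public override string Render() => "\"$" + Path + "\"";
}

public abstract class AstAccumulatorExpression : AstExpression { }

// Dedicated class: the required argument is a constructor parameter,
// so a missing or misnamed argument cannot be expressed.
public sealed class AstMedianAccumulatorExpression : AstAccumulatorExpression
{
    public AstMedianAccumulatorExpression(AstExpression input)
    {
        Input = input ?? throw new ArgumentNullException(nameof(input));
    }

    public AstExpression Input { get; }

    public override string Render()
        => "{ $median : { input : " + Input.Render() + ", method : \"approximate\" } }";
}

// Dictionary-based class: any key compiles, including a misspelled one.
public sealed class AstComplexAccumulatorExpression : AstAccumulatorExpression
{
    public AstComplexAccumulatorExpression(string operatorName, IReadOnlyDictionary<string, AstExpression> args)
    {
        OperatorName = operatorName;
        Args = args;
    }

    public string OperatorName { get; }
    public IReadOnlyDictionary<string, AstExpression> Args { get; }

    public override string Render()
        => "{ " + OperatorName + " : { "
            + string.Join(", ", Args.Select(a => a.Key + " : " + a.Value.Render()))
            + " } }";
}
```

With the dictionary version, a typo such as `"inptu"` is only discovered when the server rejects the pipeline; with the dedicated class it cannot be written at all.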

Contributor Author

That's fair. I'll revert this then.

/// </summary>
/// <param name="source">The sequence of values.</param>
/// <returns>The median value.</returns>
public static double Median(this IEnumerable<decimal> source)

Contributor

I'm still concerned that the return type is double instead of decimal.

Same below.

Contributor Author

Discussed on Slack. The Server team currently has $median return doubles regardless of input type, but there is a ticket to improve accuracy in the future by returning the correct types. So I'll change the return type here to decimal.

/// </summary>
/// <param name="source">The sequence of values.</param>
/// <returns>The median value.</returns>
public static double Median(this IEnumerable<float> source)

Contributor

Enumerable.Average(IEnumerable<float>) returns float.

Should we also?
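For reference, the return types of the built-in `Enumerable.Average` overloads mentioned here can be checked with a short sketch. The `AverageReturnTypes` class and its method names are made up for illustration; the `Average` overloads themselves are the standard LINQ ones.

```csharp
using System;
using System.Linq;

public static class AverageReturnTypes
{
    // float in, float out
    public static float OfFloats() => new[] { 1f, 2f }.Average();

    // decimal in, decimal out
    public static decimal OfDecimals() => new[] { 1m, 2m }.Average();

    // int in, double out (integral inputs are widened to double)
    public static double OfInts() => new[] { 1, 2 }.Average();

    // double in, double out
    public static double OfDoubles() => new[] { 1.0, 2.0 }.Average();
}
```

Mirroring this convention would mean `float` and `decimal` sources keep their own type, while integral sources produce `double`.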

/// <param name="source">A sequence of values to calculate the percentiles of.</param>
/// <param name="percentiles">The percentiles to compute (each between 0.0 and 1.0).</param>
/// <returns>The percentiles of the sequence of values.</returns>
public static double[] Percentile(this IEnumerable<decimal> source, IEnumerable<double> percentiles)

Contributor

Should we return decimal[]?

Contributor

Should the percentiles be decimal also to match?

/// <param name="source">A sequence of values to calculate the percentiles of.</param>
/// <param name="percentiles">The percentiles to compute (each between 0.0 and 1.0).</param>
/// <returns>The percentiles of the sequence of values.</returns>
public static double[] Percentile(this IEnumerable<float> source, IEnumerable<double> percentiles)

Contributor

Should we return float[]?

Should the percentiles be IEnumerable<float> to match?

/// <param name="selector">The selector that selects a value from the input document.</param>
/// <param name="window">The window boundaries.</param>
/// <returns>The median of the selected values.</returns>
public static double Median<TInput>(this ISetWindowFieldsPartition<TInput> partition, Func<TInput, decimal> selector, SetWindowFieldsWindow window = null)

Contributor

Should we return decimal?

@adelinowona requested review from rstam and sanych-sun August 4, 2025 21:46

Contributor

@rstam left a comment

Very close to done!

return new AstMedianExpression(input);
}

public static AstMedianAccumulatorExpression MedianAccumulator(AstExpression input)

Contributor

Consider a return type of AstAccumulatorExpression.

In most (all?) methods in this class we return the most general class possible. This allows the factory method to sometimes return different types depending on the parameters.
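A small sketch of why the general return type matters. The `Expr`, `ConstantExpr`, `NotExpr`, and `ExprFactory` types are hypothetical and exist only for this example; they are not taken from the driver.

```csharp
using System;

// Hypothetical expression types for illustration only.
public abstract class Expr { }

public sealed class ConstantExpr : Expr
{
    public ConstantExpr(bool value) { Value = value; }
    public bool Value { get; }
}

public sealed class NotExpr : Expr
{
    public NotExpr(Expr arg) { Arg = arg; }
    public Expr Arg { get; }
}

public static class ExprFactory
{
    // Declared as Expr rather than NotExpr, so the factory is free
    // to return a simpler node when the argument allows it.
    public static Expr Not(Expr arg)
    {
        if (arg is ConstantExpr constant)
        {
            // Fold the negation of a constant at build time.
            return new ConstantExpr(!constant.Value);
        }

        return new NotExpr(arg);
    }
}
```

Had `Not` been declared to return `NotExpr`, adding the constant-folding branch later would be a breaking change for every caller.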

return new AstMedianAccumulatorExpression(input);
}

public static AstMedianWindowExpression MedianWindowExpression(AstExpression input, AstWindow window)

Contributor

Consider a return type of AstWindowExpression.

In most (all?) methods in this class we return the most general class possible. This allows the factory method to sometimes return different types depending on the parameters.

@@ -597,6 +597,21 @@ public static AstExpression Max(AstExpression arg1, AstExpression arg2)
return new AstNaryExpression(AstNaryOperator.Max, [arg1, arg2]);
}

public static AstExpression Median(AstExpression input)

Contributor

Consider a return type of AstExpression.

In most (all?) methods in this class we return the most general class possible. This allows the factory method to sometimes return different types depending on the parameters.

@@ -653,6 +668,21 @@ public static AstExpression Or(params AstExpression[] args)
return new AstNaryExpression(AstNaryOperator.Or, flattenedArgs);
}

public static AstPercentileExpression Percentile(AstExpression input, AstExpression percentiles)

Contributor

Consider a return type of AstExpression.

In most (all?) methods in this class we return the most general class possible. This allows the factory method to sometimes return different types depending on the parameters.

return new AstPercentileExpression(input, percentiles);
}

public static AstPercentileAccumulatorExpression PercentileAccumulator(AstExpression input, AstExpression percentiles)

Contributor

Consider a return type of AstAccumulatorExpression.

In most (all?) methods in this class we return the most general class possible. This allows the factory method to sometimes return different types depending on the parameters.

bool TryOptimizeAccumulatorOfElements(out AstExpression optimizedExpression)
private bool IsMappedElementsField(AstExpression expression, out AstExpression rewrittenArg)
{
if (expression is AstMapExpression map && IsElementsField(map.Input))

Contributor

Rename map to mapExpression.

optimizedExpression = null;
return false;
}
private AstExpression CreateGetAccumulatorFieldExpression(AstAccumulatorExpression accumulator)

Contributor

Rename accumulator to accumulatorExpression.

@@ -112,6 +112,26 @@ internal static class EnumerableMethod
private static readonly MethodInfo __maxSingle;
private static readonly MethodInfo __maxSingleWithSelector;
private static readonly MethodInfo __maxWithSelector;
private static readonly MethodInfo __medianDecimal;

Contributor

All these new fields and properties should be in MongoEnumerableMethod.cs, not here.

{
var selectorLambda = (LambdaExpression)arguments[1];
var selectorParameter = selectorLambda.Parameters[0];
var selectorParameterSerializer = ArraySerializerHelper.GetItemSerializer(sourceTranslation.Serializer);

Contributor

Rename selectorParameterSerializer to sourceItemSerializer.

I think it's better to name the variable for what it IS, rather than where it is used.

Consider reordering lines 80-85 like this:

var sourceItemSerializer = ArraySerializerHelper.GetItemSerializer(sourceTranslation.Serializer);

var selectorLambda = (LambdaExpression)arguments[1];
var selectorParameter = selectorLambda.Parameters[0];
var selectorParameterSymbol = context.CreateSymbol(selectorParameter, sourceItemSerializer);
var selectorContext = context.WithSymbol(selectorParameterSymbol);
var selectorTranslation = ExpressionToAggregationExpressionTranslator.Translate(selectorContext, selectorLambda.Body);

to extract the sourceItemSerializer from the source serializer earlier.


if (method.IsOneOf(__percentileWithSelectorMethods))
{
var selectorLambda = (LambdaExpression)arguments[1];

Contributor

Rename selectorParameterSerializer to sourceItemSerializer.

And reorder lines as suggested above.

3 participants