Skip to content

[ES|QL] Support implicit casting of aggregate_metric_double #129108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jul 17, 2025

Conversation

limotova
Copy link
Contributor

@limotova limotova commented Jun 7, 2025

This commit adds support for implicit casting of aggregate_metric_double
when present with other numerics for a limited set of aggregation
functions:

  • Max / MaxOverTime
  • Min / MinOverTime
  • Sum / SumOverTime
  • Count / CountOverTime
  • Avg / AvgOverTime

Attempting to use fields mapped to aggregate_metric_double in one index
but some other numeric in another index in any other context will still
require explicit casting with ToAggregateMetricDouble

Known bugs:

  • TS my-data-stream | EVAL m = my-mixed-metric | STATS max(m) does not work
  • Has Been Fixed for implicit casting only--> The following does not work:
TS my-data-stream |
STATS sum(avg_over_time(my-metric)), 
     sum(count_over_time(my-metric)) 
BY keyword_field, time_bucket = bucket(@timestamp, 1 hour)

In other words, avg_over_time plus sum/count_over_time aggregation, both on an aggregate_metric_double field in the same way (both implicitly casted (FIXED), or both explicitly casted with to_aggregate_metric_double (still broken)), grouped by a keyword field

@limotova limotova force-pushed the implicit-casting-agg-metric branch 2 times, most recently from bfb2043 to 707c465 Compare June 9, 2025 18:08
@limotova limotova force-pushed the implicit-casting-agg-metric branch from c4cc58e to da37671 Compare June 23, 2025 13:45
@limotova limotova force-pushed the implicit-casting-agg-metric branch 2 times, most recently from 025e131 to cc2645c Compare June 27, 2025 00:08
This commit adds support for implicit casting of aggregate_metric_double
when present with other numerics for a limited set of aggregation
functions:
- Max / MaxOverTime
- Min / MinOverTime
- Sum / SumOverTime
- Count / CountOverTime
- Avg / AvgOverTime

Attempting to use fields mapped to aggregate_metric_double in one index
but some other numeric in another index in any other context will still
require explicit casting with ToAggregateMetricDouble
@limotova limotova force-pushed the implicit-casting-agg-metric branch from cc2645c to b6b041c Compare June 27, 2025 04:02
)
)
);
// TODO: verify the numbers are accurate
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've manually verified that the values produced make sense, but I'm not sure how to go about verifying within the IT test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah tricky.. maybe it's easy to track and min of mins and max of maxes, and verify just that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm depending on the query I'm not sure that would work, anything involving sum/_over_time would go out of that range, and then throwing negative numbers into the mix I think would throw everything off quite a bit

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, so only (min|max)(min|max_over_time) would work. One option is to track the min and and max value overall, and verify that they get returned if we have one of those combinations.

var resp = esqlCommand(
"TS "
+ dataStreamName
+ " | STATS min = sum(min_over_time(cpu)), max = sum(max_over_time(cpu)) by cluster, bucket(@timestamp, 1 hour)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cool to see this working.. well done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider randomizing the inner and outer agg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the current integration test I decided to have it loop through some known combinations for a single aggregation, right now the 2nd bug I listed would cause some combinations (avg_over_time + sum/count_over_time) to fail so I decided to not randomize it.
But it might be good to have some random testing that generates a random number of these aggregations to run in the same query once that bug gets figured out

@limotova limotova requested review from kkrik-es, martijnvg and dnhatn July 1, 2025 00:59

// test _over_time commands with implicit casting of aggregate_metric_double
for (String innerCommand : List.of("min_over_time", "max_over_time", "avg_over_time", "count_over_time")) {
for (String outerCommand : List.of("min", "max", "sum", "count")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using randomFrom instead of checking all combinations in each run.

Copy link
Contributor

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, deferring to Nhat to approve who understands the planner much better.

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments but the approach looks good. Thanks Larisa!

int alreadyAddedUnionFieldAttributes = unionFieldAttributes.size();
plan = plan.transformExpressionsOnly(e -> {
if (e instanceof Avg avg) {
return substituteSurrogates(avg);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid this special rule? Also, now that we have SumOverTime and CountOverTime, can we implement SurrogateExpression for AvgOverTime so it can be treated similarly to Avg?

Copy link
Contributor Author

@limotova limotova Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to substitute surrogates here/before this rule for Avg/AvgOverTime because they get split into 2 new aggregations that both require different convert functions, and I wasn't able to figure out how to get the engine to recognize that it needs to make both of these conversions individually without establishing them this early.

@limotova limotova requested a review from dnhatn July 4, 2025 05:34
Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor comments, but LGTM. Great work, thanks Larisa!

@@ -28,7 +28,7 @@ private TypeConverter(String evaluatorName, ExpressionEvaluator convertEvaluator
this.convertEvaluator = convertEvaluator;
}

public static TypeConverter fromConvertFunction(AbstractConvertFunction convertFunction) {
public static TypeConverter fromScalarFunction(EsqlScalarFunction convertFunction) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we revert this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can actually because FromAggregateMetricDouble implements ConvertFunction but it doesn't extend AbstractConvertFunction - and the reason I didn't change FromAggregateMetricDouble to extend it was because it requires 2 inputs but AbstractConvertFunction extends UnaryScalarFunction

@limotova limotova marked this pull request as ready for review July 8, 2025 02:57
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jul 8, 2025
@limotova limotova added >non-issue :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Jul 8, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 8, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Expression fn = (Expression) convert;
List<Expression> children = new ArrayList<>(fn.children());
children.set(0, resolvedAttr);
Expression e = ((Expression) convert).replaceChildren(children);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we make this change? AbstractConvertFunction contains only a single child, the field. Or do we have something that does not extend that class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I see FromAggregateMetricDouble does not extend AbstractConvertFunction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToDouble from aggregate_metric_double has two parameters: field and metric.

Copy link
Member

@fang-xing-esql fang-xing-esql left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @limotova!

I’m wondering where I can find some aggregate function tests that take union-typed fields as input, that involves mixed aggregate_metric_double and the other regular numeric typed fields (such as int, long, double, etc.)? I was looking for some examples that show what didn’t work without this PR and what works with it, to help me better understand the change. Having some positive AnalyzerTests or CsvTests will be really helpful for debugging purposes.

aborted.set(Boolean.TRUE);
return aggFunc;
}
if (aggFunc instanceof Avg || aggFunc instanceof AvgOverTime) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance this surrogate() could be handled by SubstituteSurrogateAggregations in LogicalPlanOptimizer? As it seems like it could be separated from implicit casting, these two functions are not conversion functions.

Copy link
Contributor Author

@limotova limotova Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to the 2nd bug in the PR description, I am hoping to remove it once a better fix is found, but for now unless you substitute surrogates at this stage, a query containing both avg_over_time + sum_over_time or count_over_time will fail

count_over_time becomes sum_over_time(agg_metric.count_metric) when fed an aggregate_metric_double,
And similarly avg_over_time becomes div(sum_over_time(agg_metric), count_over_time(agg_metric)), which then becomes div(sum_over_time(agg_metric.sum_metric), sum_over_time(agg_metric.count_metric))
And at the InsertFieldExtraction phase (specifically here), it drops one of the sum_over_time(agg_metric.count_metric), (which seems correct? Though I don't fully understand the code here)
But then it seems something else later on expects the reference to not be dropped, because the error that appears is a block casting error, where it tries to cast one block as another, presumably because it expected some of the blocks in the page to be shifted over by one

But if this substitution is made sooner, then it's possible to give different NameIds when implicitly casting, and then the references aren't dropped (I understand that this is maybe not ideal though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment explaining why it's there..!

Copy link
Member

@fang-xing-esql fang-xing-esql Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @limotova! If there is a (negative)test(that we can debug into) for this particular case, we can take a closer look together to see if there is anything we can do to make it work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll reach out on slack!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the surrogate() here thanks to @fang-xing-esql figuring out that it actually was unnecessary (the key for making the queries work with Avg and AvgOverTime was making unique NameIds from Sum/Count/OverTime)

@limotova
Copy link
Contributor Author

limotova commented Jul 9, 2025

Thank you for taking a look! @fang-xing-esql
Most of the tests for aggregate metric double have been in the yaml tests (46_downsampling and some in 40_tsdb), I'm currently also working on adding some CSV tests that will include aggregate metric doubles, though we don't have any that test it with aggregations outside of the yaml tests right now.
This test in particular runs on 2 indices, one with k8s.pod.network.rx and .tx mapped to aggregate metric double (test-downsample) and one with them mapped as longs (test-2)

I also have a commit in a separate branch that works off of this one with 3 tests related to the 2nd bug; they should all produce the same result in theory, though only the Working one works

@limotova
Copy link
Contributor Author

I'm currently working on adding a dataset with AggregateMetricDouble to the CSV tests and once that is available I'll add in some CSV tests and AnalyzerTests related to this PR and link them back, but for now I'll merge this in how it is

@limotova limotova merged commit 9fa0624 into elastic:main Jul 17, 2025
33 checks passed
@limotova limotova deleted the implicit-casting-agg-metric branch July 17, 2025 18:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:StorageEngine v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants