Fix correctness issue in predict_linear with step invariant #527

Open
wants to merge 3 commits into
base: main

harry671003
Contributor

@harry671003 harry671003 commented Mar 17, 2025

Issue

Our continuous correctness tests found an issue with predict_linear when its argument is a step-invariant matrix selector.

E.g.: predict_linear({__name__="http_requests_total", pod!~"nginx-1"}[5m] @ start(), -0.37690610678629094)

This PR addresses the problem by letting the matrixScanner operate in a step-invariant way, similar to the Prometheus engine.
See: https://github.com/prometheus/prometheus/blob/2a5ed8b8a55fecaa79236ef4adb9f0b82b34587c/promql/engine.go#L1788

Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
@harry671003 harry671003 changed the title Fix failure in predict_linear Fix correctness issue in predict_linear with step invariant Mar 18, 2025
Signed-off-by: 🌲 Harry 🌊 John 🏔 <[email protected]>
@harry671003 harry671003 force-pushed the predict_linear_failure branch from 2ec4376 to a43f047 Compare March 19, 2025 00:15
@harry671003 harry671003 marked this pull request as ready for review March 19, 2025 00:34
query: `predict_linear({__name__="http_requests_total",pod!~"nginx-1"}[5m] @ start(), -0.37690610678629094)`,
end: time.Unix(600, 0),
start: time.Unix(300, 0),
},
Contributor

Is this an issue for predict_linear only, or can it affect any function that takes a step-invariant matrix selector?

Contributor Author

It only affects functions that are at-modifier unsafe and take a matrix selector as an argument.

https://github.com/prometheus/prometheus/blob/308c8c48c15c74a929c430447df3d3c3a3d4001f/promql/functions.go#L1914

So far this only applies to predict_linear.
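
To illustrate why predict_linear is at-modifier unsafe, here is a minimal sketch in Go. It assumes, as I read the linked Prometheus functions.go, that the regression intercept is taken at the evaluation timestamp of each step. The `sample` type and the `predictLinear` helper are hypothetical names for illustration, not engine code, and timestamps are plain seconds.

```go
package main

import "fmt"

// sample is a hypothetical (timestamp, value) pair; timestamps are in seconds.
type sample struct {
	t float64
	v float64
}

// predictLinear fits a least-squares line through samples, takes the
// intercept at evalTime, and extrapolates offset seconds from there.
// Assumption: this mirrors how predict_linear uses the step timestamp.
func predictLinear(samples []sample, evalTime, offset float64) float64 {
	var n, sumX, sumY, sumXY, sumX2 float64
	for _, s := range samples {
		x := s.t - evalTime // time relative to the evaluation step
		n++
		sumX += x
		sumY += s.v
		sumXY += x * s.v
		sumX2 += x * x
	}
	covXY := sumXY - sumX*sumY/n
	varX := sumX2 - sumX*sumX/n
	slope := covXY / varX
	intercept := sumY/n - slope*sumX/n
	return intercept + slope*offset
}

func main() {
	// With "@ start()" the selected window is the same for every step.
	window := []sample{{0, 0}, {100, 100}, {200, 200}, {300, 300}}

	// The result still changes per step because evalTime changes.
	fmt.Println(predictLinear(window, 300, -0.5)) // 299.5
	fmt.Println(predictLinear(window, 600, -0.5)) // 599.5
}
```

The samples are identical at both steps, yet the outputs differ, so evaluating the function once and copying the result to every step would be incorrect.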

@MichaHoffmann
Contributor

Can we just wrap the matrix function in the step invariant operator? https://github.com/thanos-io/promql-engine/blob/main/execution/step_invariant/step_invariant.go#L47

@MichaHoffmann
Contributor

We could just push step invariance up in a preprocessor, like here: https://github.com/thanos-io/promql-engine/blob/main/logicalplan/plan.go#L335

@harry671003
Contributor Author

harry671003 commented Mar 19, 2025

We could just push step invariance up in a preprocessor, like here

Can we just wrap the matrix function in the step invariant operator?

The PromQL parser parses the query into:
Screenshot 2025-03-19 at 10 51 22 AM

After calling promql.PreprocessExpr() in plan.go this becomes:
Screenshot 2025-03-19 at 10 52 11 AM

In Prometheus, predict_linear is marked as at-modifier unsafe, so it cannot be wrapped with StepInvariance:
https://github.com/prometheus/prometheus/blob/308c8c48c15c74a929c430447df3d3c3a3d4001f/promql/functions.go#L1914
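
As a rough sketch of that rule (not the actual Prometheus or promql-engine code), the decision can be thought of as a lookup against a set of unsafe function names. The `atModifierUnsafe` map below is a hypothetical, trimmed-down stand-in for the set in the linked file, and `canWrapStepInvariant` is an illustrative helper name.

```go
package main

import "fmt"

// atModifierUnsafe is a hypothetical, trimmed-down stand-in for the set of
// functions whose result depends on the step timestamp even when their
// arguments are pinned with the @ modifier.
var atModifierUnsafe = map[string]struct{}{
	"predict_linear": {},
	"timestamp":      {},
	"time":           {},
}

// canWrapStepInvariant reports whether a function call can be evaluated once
// and reused for every step. argsPinned means every argument is already
// step invariant (for example, a matrix selector with "@ start()").
func canWrapStepInvariant(funcName string, argsPinned bool) bool {
	if !argsPinned {
		return false
	}
	_, isUnsafe := atModifierUnsafe[funcName]
	return !isUnsafe
}

func main() {
	fmt.Println(canWrapStepInvariant("rate", true))           // true
	fmt.Println(canWrapStepInvariant("predict_linear", true)) // false
}
```

Under this rule the pinned matrix selector is still step invariant, but the predict_linear call around it has to be evaluated at every step.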

@MichaHoffmann
Contributor

I see. I wonder if we should just fall back for now, for correctness' sake. This seems to be a fairly niche use case that we must weigh against the added complexity.

@harry671003
Contributor Author

I see. I wonder if we should just fall back for now, for correctness' sake. This seems to be a fairly niche use case that we must weigh against the added complexity.

I'm okay with doing the fallback. One concern: we'll have to exclude the functions.test file from the acceptance tests.
If that is okay, I can create a PR.

3 participants