Skip to content

Add optimizer for extracting projection labels #550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

fpetkovski
Copy link
Collaborator

This commit adds a logical optimizer for extracting projection labels from a query.

This commit adds a logical optimizer for extracting projection labels
from a query.

Signed-off-by: Filip Petkovski <[email protected]>
projection = Projection{
Labels: slices.Clone(e.Grouping),
Include: !e.Without,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I know how your scanner is implemented when it is a without? Does it return series hash label in this case?

https://github.com/thanos-io/promql-engine/pull/549/files#diff-2d378c97867270a9cf6883930a4a229b3928d637e4c5c558abf5f4bd913f2f8cR41

This is what I implemented as a mock storage which does projection and always return series hash when projection is available. So maybe there is some difference here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it is without, it returns the hash over all labels, and the labels which are not in the Labels slice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to include series hash in the grouping label for without? So that it can be removed after the aggregation.

Or you have another place to remove series hash label?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll need to take a look at the details, will post a bit later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we dont need to include because the hash will be just an implementation detail of the storage. We use it for things like: partitioning, map series to iterators, etc.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We include the hash label only when Include is true. We also have an explicit select for the series hash label for some binary joins as shown in tests.

Copy link
Contributor

@yeya24 yeya24 Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't have hash label when without then how do we pass the duplicate labels check in the engine? Do we have to disable it.

I am trying to run some correctness tests and saw some issues based on my setup. I assume something wrong in my mock storage implementation. Maybe you can try to add a similar correctness test using a mock storage and see if there is any correctness issue.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We disabled those duplicate checks, and afaik prometheus has a feature to do that now as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. But I can still see some correctness issues after disabling duplicate check so maybe you can take a closer look

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's disable them here too - with a hash column we won't get issues though since we never drop it i think

slices.Sort(projection.Labels)
projection.Labels = slices.Compact(projection.Labels)
e.Projection = projection
projection = Projection{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you also need to change MergeSelect or just modify the scanner to get it work with Projection? Like the selects you merged might have different projection labels and you need to fetch both and project only required labels for each select

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this is a good question, we do not use MergeSelect anymore, but I think we should modify it to be a union of all selectors that it merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants