StalePersistentVolumeClaim duplicate time series in Victoria Metrics #105

Open
dhess opened this issue Nov 25, 2023 · 3 comments · May be fixed by #124
Labels
help wanted Extra attention is needed

Comments

dhess commented Nov 25, 2023

Hi,

The StalePersistentVolumeClaim query here breaks Victoria Metrics in some configurations:

"expr": "kube_persistentvolumeclaim_info unless (kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_left kube_pod_spec_volumes_persistentvolumeclaims_info) == 1",

422: error when executing query="kube_persistentvolumeclaim_info unless (kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_right kube_pod_spec_volumes_persistentvolumeclaims_info) == 1" on the time range (start=1700931915000, end=1700932215000, step=15000): cannot execute "kube_persistentvolumeclaim_info unless ((kube_persistentvolumeclaim_info * on(persistentvolumeclaim) group_right() kube_pod_spec_volumes_persistentvolumeclaims_info) == 1)": cannot execute "(kube_persistentvolumeclaim_info{persistentvolumeclaim=~\"audit-vault-0|audit-vault-1|audit-vault-2|... duplicate time series on the left side of `* on(persistentvolumeclaim) group_right()`: ...

In our particular case, this happens with any ReadWriteMany PVC that is mounted by more than one pod in the same namespace.

avishnu (Member) commented Sep 13, 2024

Requesting help from the community here.

avishnu added the help wanted label Sep 13, 2024

pschichtel commented Dec 9, 2024

@avishnu I think I'm seeing the exact same issue when using Renovate with a persistent cache. I think the problem is that Renovate's CronJob produces several pods, each referencing the same PVC, which leads to a many-to-many situation that Prometheus doesn't support.

I've changed the rule to this:

kube_persistentvolumeclaim_info{namespace!="renovate"} unless (kube_persistentvolumeclaim_info * on (persistentvolumeclaim) group_left () (max by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))) == 1

The main change is that I replaced group_left () kube_pod_spec_volumes_persistentvolumeclaims_info) with group_left () (max by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info))). The max by (persistentvolumeclaim) aggregation collapses all pod series into one per PVC name, so each PVC matches at most one series on the right-hand side. I'm not sure that's a good way to do it, but it does work.

@dhess can you confirm that you have multiple pods referencing the same PVC? (count by (persistentvolumeclaim) (kube_pod_spec_volumes_persistentvolumeclaims_info) > 1)

This also breaks when using RWX volumes with scaled-up applications.
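
For anyone who wants to see why the aggregation helps, here is a minimal Python sketch that models the matching rather than running real PromQL. Everything in it is an assumption made for illustration: the sample label sets, the helper names (join_group_left, max_by_pvc, stale_pvcs), and the simplification that every series has the value 1. It only mimics the rule that the "one" side of a group_left match must be unique per persistentvolumeclaim.

```python
from collections import defaultdict

# Hypothetical label sets standing in for series with value 1.
pvc_info = [
    {"namespace": "renovate", "persistentvolumeclaim": "cache"},
    {"namespace": "vault", "persistentvolumeclaim": "audit-vault-0"},
    {"namespace": "old", "persistentvolumeclaim": "orphan"},
]
pod_volumes = [
    # Two pods share the "cache" PVC, as with a CronJob or an RWX volume.
    {"namespace": "renovate", "pod": "renovate-1", "persistentvolumeclaim": "cache"},
    {"namespace": "renovate", "pod": "renovate-2", "persistentvolumeclaim": "cache"},
    {"namespace": "vault", "pod": "vault-0", "persistentvolumeclaim": "audit-vault-0"},
]


def group_by_pvc(series):
    """Group label sets by their persistentvolumeclaim label."""
    groups = defaultdict(list)
    for labels in series:
        groups[labels["persistentvolumeclaim"]].append(labels)
    return groups


def join_group_left(many, one):
    """Model `many * on(persistentvolumeclaim) group_left one`.

    The 'one' side must hold at most one series per PVC name,
    otherwise the match is ambiguous and the query is rejected.
    """
    one_groups = group_by_pvc(one)
    dupes = sorted(k for k, v in one_groups.items() if len(v) > 1)
    if dupes:
        raise ValueError(f"duplicate series on the 'one' side for: {dupes}")
    # The result keeps the label sets of the 'many' side.
    return [m for m in many if m["persistentvolumeclaim"] in one_groups]


def max_by_pvc(series):
    """Model `max by (persistentvolumeclaim) (...)`: one series per PVC name."""
    return [{"persistentvolumeclaim": k} for k in group_by_pvc(series)]


def stale_pvcs(info, pods):
    """Model `info unless (info * on(...) group_left () max by (...) (pods))`."""
    in_use = join_group_left(info, max_by_pvc(pods))
    return [s["persistentvolumeclaim"] for s in info if s not in in_use]
```

Calling join_group_left(pvc_info, pod_volumes) directly raises the ValueError because "cache" appears twice on the pod side, while stale_pvcs(pvc_info, pod_volumes) goes through the aggregation first and reports only the PVC that no pod references. Note that, like the rule in this thread, the sketch matches on the PVC name alone, so PVCs with the same name in different namespaces are treated as the same key.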

pschichtel commented

I created #124 with the change I suggested in my previous comment.
