Running `argocd sync app --dry-run` commands affect metrics #21899

jsolana · 2025-02-18T15:08:48Z

Checklist:

I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
I've included steps to reproduce the bug.
I've pasted the output of argocd version.

Describe the bug

Related to #21661

Currently, executing the command argocd app sync --dry-run affects both the application’s state and the internal metrics exposed by ArgoCD (eg: argocd_app_info).

The main issue is that if there are alerts based on these metrics, and the dry-run execution identifies an error (e.g., a change that violates a Kyverno policy or an invalid CRD schema), the application state changes to SyncErr. This also updates the metrics, which can potentially trigger alerts based on these metrics.

For example:

# alert manager alert definition
- alert: ArgoCDApplicationUnknown
  expr: sum by (cluster) (argocd_app_info{sync_status="Unknown"}) > 0
  for: 15m
...

Since the definition of a dry-run is to execute requests without persistence, I wonder if it makes sense to handle it in a way that ensures no changes are made.

Alternatively, adding a dry_run label to differentiate operational requests from dry-run requests could also be an option (a similar change has been proposed for kyverno link).

To Reproduce

Run argocd app sync --dry-run command.

Expected behavior

There are different proposals:

Add a dryrun label to distinguish dryrun activity from real ones.
Ignore dryrun executions (not affecting metrics).

Initially I fond of the 1 because dryrun has a cost associated in terms of resources consumption / performance. Ignoring activity related to dryrun make extremely hard to identify the reason of performance issues.

Version

It is affecting whatever version because currently dryrun executions are not distinguished.

The text was updated successfully, but these errors were encountered:

harshitSkyscanner · 2025-02-25T14:54:14Z

I am trying to get more exposure to argocd codebase and develop some understanding. It looks like an easy fix if it is just to add a label.
I can help contribute to it.

jsolana · 2025-02-25T19:37:38Z

Thanks mate, I ve just create this MR to fix it. Let me know wdyt

agaudreault · 2025-02-25T20:15:11Z

Have you investigated why the sync status changes for dry runs? This seems more like it should be fixed by not modifying the persisted sync status on dry runs rather than adding it to the metrics.

harshitSkyscanner · 2025-02-25T20:20:41Z

Thanks mate, I ve just create this MR to fix it. Let me know wdyt

Ahh, I just saw this. I created the PR too. I did more or less the same thing but also updated some other metrics as well. I wasn't sure about my approach so I didn't added tests just yet.

#22014

harshitSkyscanner · 2025-02-25T20:21:40Z

Have you investigated why the sync status changes for dry runs? This seems more like it should be fixed by not modifying the persisted sync status on dry runs rather than adding it to the metrics.

@agaudreault - That is something that can be investigated in parallel but should not affect this PR. Metrics should have an indicator if they were generated via dry run or a normal operation.

jsolana · 2025-02-25T22:10:23Z

Thanks mate, I ve just create this MR to fix it. Let me know wdyt

Ahh, I just saw this. I created the PR too. I did more or less the same thing but also updated some other metrics as well. I wasn't sure about my approach so I didn't added tests just yet.

#22014

Taking in count this comment not sure if this is the way to go.

Have you investigated why the sync status changes for dry runs? This seems more like it should be fixed by not modifying the persisted sync status on dry runs rather than adding it to the metrics.

The idea is to split this issue in two different ones:

This one related to metrics
Another one related to change and persist the app status running a dryrun op

Wdyt?

Want to know more about how repo server persists app status, any idea where to look?
Thanks!

harshitSkyscanner · 2025-02-27T18:38:49Z

Want to know more about how repo server persists app status, any idea where to look?
Thanks!

I have been looking into this today. Shouldn't this be somewhere in controller/sync.go?
So maybe a fix like this - #22064

I am happy to add more stuff to it if I am even remotely on the right track here.

Also, just to clarify what happens here is during dry run sync, it is that updates the application sync state incorrectly right?
(Any logs here would be helpful as well to see the flow of code)

crenshaw-dev · 2025-02-28T17:07:33Z

I agree that it's weird to update the sync state for a dry run... but if we don't do that, does the controller have a way to communicate the success/failure of the dry run up to the UI/CLI? If not, maybe instead of not persisting the state, we just need to persist it to a different field so that we preserve both the real sync state and the dry run sync state?

harshitSkyscanner · 2025-02-28T17:16:47Z

I agree that it's weird to update the sync state for a dry run... but if we don't do that, does the controller have a way to communicate the success/failure of the dry run up to the UI/CLI? If not, maybe instead of not persisting the state, we just need to persist it to a different field so that we preserve both the real sync state and the dry run sync state?

I was just trying to reproduce this in my local with argo-cd version 2.14.2.

The full behaviour that I can see is -
SYNC STATUS - remains unchanged
LAST SYNC - Changes to whatever the result of dry run sync was

I kind of agree with @crenshaw-dev here. Looking at the code, if we don't update anything, we won't be able to return the result back correctly. So we need to handle the results of dry run sync in a different manner

crenshaw-dev · 2025-02-28T18:01:18Z

I think @jsolana's PR probably solves the metrics problem. But yeah, a more general solution would probably involve CRD, controller, API, CLI, and UI work.

harshitSkyscanner · 2025-02-28T18:39:12Z

@crenshaw-dev - Any idea on where to start?
Currently at this - #22064

crenshaw-dev · 2025-02-28T19:16:30Z

I think we need to start by enumerating the user experiences that would be impacted by not updating the sync status field, whether those experiences are via the UI, CLI, or API. Then we need to figure out how to rebuild those using a new dry run sync status field.

jsolana · 2025-03-04T09:39:27Z

I'm wondering if this is more of a UI feedback issue—where "Last Sync" isn't clearly communicated—rather than a fundamental change in how Sync and DryRun are handled internally.

Perhaps it would be useful if the UI included information on whether the "Last Sync" operation was a DryRun or not.

Related with #22059

I think @jsolana's PR probably solves the metrics problem

For us this is the "most urgent" issue cause impact developers experience with false positives.
@crenshaw-dev The PR is completed, in case you have time and could take a look. It would super nice 🙏 🙇

crenshaw-dev · 2025-03-04T20:30:25Z

@jsolana agreed, adding clarifying information about whether it was a dry run might be a sufficient fix with relatively light CR/code changes.

jsolana · 2025-03-19T09:11:27Z

Thanks to @agaudreault , I now have a much clearer idea of the path to follow (in theory 😅 ):

Persist DryRun in the Status of the operation (which corresponds to the State of the Last Operation).
Update the metrics to take the data from Status.DryRun (and not from Operation, which reflects the current operation).
Improve UI to reflect Last Sync was a dryrun

Make sense?

jsolana · 2025-03-28T12:00:11Z

Hi!

Do you think SyncOperationResult is the proper place to extend and persist the DryRun value? (code)

// SyncOperationResult represent result of sync operation
type SyncOperationResult struct {
	// Resources contains a list of sync result items for each individual resource in a sync operation
	Resources ResourceResults `json:"resources,omitempty" protobuf:"bytes,1,opt,name=resources"`
	// Revision holds the revision this sync operation was performed to
	Revision string `json:"revision" protobuf:"bytes,2,opt,name=revision"`
	// Source records the application source information of the sync, used for comparing auto-sync
	Source ApplicationSource `json:"source,omitempty" protobuf:"bytes,3,opt,name=source"`
	// Source records the application source information of the sync, used for comparing auto-sync
	Sources ApplicationSources `json:"sources,omitempty" protobuf:"bytes,4,opt,name=sources"`
	// Revisions holds the revision this sync operation was performed for respective indexed source in sources field
	Revisions []string `json:"revisions,omitempty" protobuf:"bytes,5,opt,name=revisions"`
	// ManagedNamespaceMetadata contains the current sync state of managed namespace metadata
	ManagedNamespaceMetadata *ManagedNamespaceMetadata `json:"managedNamespaceMetadata,omitempty" protobuf:"bytes,6,opt,name=managedNamespaceMetadata"`
	// If true, the sync will be performed as a dry-run  without actually applying any changes.
	DryRun bool `json:"dryRun,omitempty" protobuf:"bytes,3,opt,name=dryRun"`
}

Thanks!

crenshaw-dev · 2025-03-28T14:02:10Z

My recommendation would be to persist any dry run result in a completely different field (e.g. status.dryRunOperationResult). This would allow the UI/CLI to present the result of both the most recent actual operation and the most recent dry run at the same time.

agaudreault · 2025-03-28T16:02:25Z

Currently, the SyncResult is persisted in the OperationState. Knowing that there can only be 1 operation in progress at the same time, there should only be 1 OperationState. The problem is that the status.operationState represent the result of the ongoing operation, but also the lastOperation when completed.

The way I see to solve this is to have 1 new fields status.lastCompletedNonDryRunOperation (for sure another name).

When an operation is triggered. It updates the status.operationState.
When it completes, if it is NOT a dry run, it copies status.operationState to status.lastCompletedNonDryRunOperation.

Most code stays unaffected because the behavior of status.operationState stays the same. However, it is easy to make a method to return the last effective operation:

getLastNonDryRunOperationResult() {
  if app.status.operationState.operation.DryRun() {
    return app.status.lastCompletedNonDryRunOperation
  }
 // Complete or ongoing non-dry run operation
 return app.status.operationState.operation
}

We could also persist the last dry-run operation in the status, but I see little value to do that.

crenshaw-dev · 2025-03-28T16:17:35Z

I think the prohibition of concurrent dry-run and non-dry-run operations is a matter of implementation, not API. I could definitely imagine breaking the dry-run operations into their own work queue in the future.

The UI/CLI currently just pull the latest operation state. Having to toggle between lastCompletedNonDryRunOperation and operationState complicates the client code. Seems like it would be nicer to just fetch the state they want: dry run or non-dry run.

jsolana · 2025-03-31T15:22:22Z

Hi @agaudreault @crenshaw-dev , In case you can take a look to this PR to confirm I understand properly the proposal

Thanks

jsolana added the bug Something isn't working label Feb 18, 2025

jsolana mentioned this issue Feb 25, 2025

chore: add "dry_run" label to sync metrics #22010

Merged

14 tasks

This was referenced Feb 25, 2025

chore: dry run label for metrics to fix issue 21899 #22013

Closed

chore: add dry_run label to metrics #22014

Closed

harshitSkyscanner mentioned this issue Feb 27, 2025

fix: sync result getting updated incorrectly for dry run #22064

Open

14 tasks

jsolana mentioned this issue Mar 31, 2025

chore: LastCompletedNonDryRunOperation support #22530

Open

14 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running `argocd sync app --dry-run` commands affect metrics #21899

Running `argocd sync app --dry-run` commands affect metrics #21899

jsolana commented Feb 18, 2025

harshitSkyscanner commented Feb 25, 2025 •

edited

Loading

jsolana commented Feb 25, 2025

agaudreault commented Feb 25, 2025

harshitSkyscanner commented Feb 25, 2025 •

edited

Loading

harshitSkyscanner commented Feb 25, 2025

jsolana commented Feb 25, 2025 •

edited

Loading

harshitSkyscanner commented Feb 27, 2025 •

edited

Loading

crenshaw-dev commented Feb 28, 2025

harshitSkyscanner commented Feb 28, 2025

crenshaw-dev commented Feb 28, 2025 •

edited

Loading

harshitSkyscanner commented Feb 28, 2025

crenshaw-dev commented Feb 28, 2025

jsolana commented Mar 4, 2025 •

edited

Loading

crenshaw-dev commented Mar 4, 2025 •

edited

Loading

jsolana commented Mar 19, 2025

jsolana commented Mar 28, 2025

crenshaw-dev commented Mar 28, 2025

agaudreault commented Mar 28, 2025

crenshaw-dev commented Mar 28, 2025

jsolana commented Mar 31, 2025 •

edited

Loading

Running argocd sync app --dry-run commands affect metrics #21899

Running argocd sync app --dry-run commands affect metrics #21899

Comments

jsolana commented Feb 18, 2025

harshitSkyscanner commented Feb 25, 2025 • edited Loading

jsolana commented Feb 25, 2025

agaudreault commented Feb 25, 2025

harshitSkyscanner commented Feb 25, 2025 • edited Loading

harshitSkyscanner commented Feb 25, 2025

jsolana commented Feb 25, 2025 • edited Loading

harshitSkyscanner commented Feb 27, 2025 • edited Loading

crenshaw-dev commented Feb 28, 2025

harshitSkyscanner commented Feb 28, 2025

crenshaw-dev commented Feb 28, 2025 • edited Loading

harshitSkyscanner commented Feb 28, 2025

crenshaw-dev commented Feb 28, 2025

jsolana commented Mar 4, 2025 • edited Loading

crenshaw-dev commented Mar 4, 2025 • edited Loading

jsolana commented Mar 19, 2025

jsolana commented Mar 28, 2025

crenshaw-dev commented Mar 28, 2025

agaudreault commented Mar 28, 2025

crenshaw-dev commented Mar 28, 2025

jsolana commented Mar 31, 2025 • edited Loading

Running `argocd sync app --dry-run` commands affect metrics #21899

Running `argocd sync app --dry-run` commands affect metrics #21899

harshitSkyscanner commented Feb 25, 2025 •

edited

Loading

harshitSkyscanner commented Feb 25, 2025 •

edited

Loading

jsolana commented Feb 25, 2025 •

edited

Loading

harshitSkyscanner commented Feb 27, 2025 •

edited

Loading

crenshaw-dev commented Feb 28, 2025 •

edited

Loading

jsolana commented Mar 4, 2025 •

edited

Loading

crenshaw-dev commented Mar 4, 2025 •

edited

Loading

jsolana commented Mar 31, 2025 •

edited

Loading