Commit 857a7f8

Setup doc
Signed-off-by: Joseph <jvaikath@redhat.com>
1 parent c759147 commit 857a7f8

1 file changed

+40
-62
lines changed

design/backup_cancellation.md

Lines changed: 40 additions & 62 deletions
Original file line number | Diff line number | Diff line change
@@ -2,34 +2,49 @@
22
# Backup Cancellation Design
33

44
## Abstract
5-
This proposal introduces user-initiated backup cancellation functionality to Velero, allowing users to abort running backups through a new `cancel` field in the backup specification.
6-
The design addresses GitHub issues [#9189](https://github.com/vmware-tanzu/velero/issues/9189
7-
) and [#2098](https://github.com/vmware-tanzu/velero/issues/2098) by providing a mechanism to cleanly cancel async operations and prevent resource leaks when backups need to be terminated.
5+
- This proposal introduces user-initiated backup cancellation functionality to Velero, allowing users to abort running backups through a new `cancel` field in the backup specification
6+
> backup.Spec.Cancel
7+
8+
- The design addresses GitHub issues [#9189](https://github.com/vmware-tanzu/velero/issues/9189
9+
) and [#2098](https://github.com/vmware-tanzu/velero/issues/2098)
10+
- It is currently not possible to delete an in-progress backup: the deletion controller blocks it
11+
- Cancellation flow would allow this to happen
812

913
## Background
10-
Currently, Velero lacks the ability to cancel running backups, leading to several critical issues.
11-
When users accidentally submit broad backup jobs (e.g., forgot to narrow resource selectors), the system becomes blocked and scheduled jobs accumulate.
12-
Additionally, the backup deletion controller prevents running backups from being deleted.
14+
- Currently, Velero lacks the ability to cancel running backups, leading to several critical issues
15+
16+
- When users accidentally submit broad backup jobs (e.g., forgot to narrow resource selectors), the system becomes blocked and scheduled jobs accumulate
17+
18+
- Additionally, the backup deletion controller prevents running backups from being deleted
1319

1420

1521
## Goals
1622
- Enable users to cancel running backups through a `cancel` field in the backup specification
17-
- Cleanly cancel all associated async operations (BackupItemAction operations, DataUploads, PodVolumeBackups)
23+
24+
- Cleanly cancel all associated async operations (BackupItemAction operations, DataUploads)
25+
26+
- Delete backup data on
27+
- object storage,
28+
- CSI native snapshots,
29+
- backup tarball, etc.
30+
31+
while keeping backup logs and backup associated data for inspection
32+
1833
- Provide clear backup phase transitions (InProgress → Cancelling → Cancelled)
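
The phase transitions named in the last goal can be sketched as a small state function. This is a minimal illustration, not Velero code: the `BackupPhase` type, the constant values, and the helper `nextCancelPhase` are assumptions made for the example, and only a subset of backup phases is modelled.

```go
package main

import "fmt"

// BackupPhase is a stand-in for the backup phase type (assumed for this sketch).
type BackupPhase string

const (
	BackupPhaseInProgress BackupPhase = "InProgress"
	BackupPhaseCancelling BackupPhase = "Cancelling"
	BackupPhaseCancelled  BackupPhase = "Cancelled"
	BackupPhaseCompleted  BackupPhase = "Completed"
	BackupPhaseFailed     BackupPhase = "Failed"
)

// nextCancelPhase is a hypothetical helper. Given the current phase of a
// backup whose cancellation was requested, it returns the next phase and
// whether a transition applies. Completed and failed backups are left
// untouched, matching the non-goal of cancelling finished backups.
func nextCancelPhase(current BackupPhase) (BackupPhase, bool) {
	switch current {
	case BackupPhaseInProgress:
		return BackupPhaseCancelling, true
	case BackupPhaseCancelling:
		return BackupPhaseCancelled, true
	default:
		return current, false
	}
}

func main() {
	phases := []BackupPhase{BackupPhaseInProgress, BackupPhaseCancelling, BackupPhaseCompleted}
	for _, p := range phases {
		next, changed := nextCancelPhase(p)
		fmt.Printf("%s -> %s (changed=%t)\n", p, next, changed)
	}
}
```

Keeping the transition logic in one function makes it easy to check that terminal phases are never rewritten by a late cancel request.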
1934

2035
## Non Goals
2136
- Cancelling backups that have already completed or failed
22-
- Rolling back partially completed backup operations
37+
2338
- Implementing cancellation for restore operations (future work)
2439

2540

2641
## High-Level Design
27-
The solution introduces a new `cancel` boolean field to the backup specification that users can set to `true` to request cancellation.
28-
Existing controllers (backup_controller, backup_operations_controller, backup_finalizer_controller) will check for this field and transition the backup to a `Cancelling` phase before returning early from their reconcile loops.
42+
- The solution introduces a new `cancel` boolean field to the backup specification that users can set to `true` to request cancellation
43+
44+
- Existing controllers (`backup_controller`, `backup_operations_controller`, `backup_finalizer_controller`) will check for this field, attempt to cancel async operations, and then transition the backup to the `Cancelling` phase
45+
46+
- A new dedicated backup cancellation controller will watch for backups in the `Cancelling` phase and attempt to clean up their backup data
2947

30-
A new dedicated backup cancellation controller will watch for backups in the `Cancelling` phase and coordinate the actual cancellation work.
31-
This controller will call `Cancel()` methods on all in-progress BackupItemAction operations (which automatically handles DataUpload cancellation), directly cancel PodVolumeBackups by setting their cancel flags, and finally transition the backup to `Cancelled` phase.
32-
The design uses a 5-second ticker to prevent API overload and ensures clean separation between cancellation detection and execution.
3348

3449
## Detailed Design
3550

@@ -59,40 +74,21 @@ const (
5974
### Controller Changes
6075

6176
#### Existing Controllers
62-
Modify `backup_controller.go`, `backup_operations_controller.go`, and `backup_finalizer_controller.go` to check for cancellation:
63-
```go
64-
// Early in each Reconcile method
65-
if backup.Spec.Cancel != nil && *backup.Spec.Cancel {
66-
if backup.Status.Phase != BackupPhaseCancelling && backup.Status.Phase != BackupPhaseCancelled {
67-
backup.Status.Phase = BackupPhaseCancelling
68-
// Update backup and return
69-
return ctrl.Result{}, c.Client.Patch(ctx, backup, client.MergeFrom(original))
70-
}
71-
return ctrl.Result{}, nil // Skip processing for cancelling/cancelled backups
72-
}
73-
```
74-
In addition, the `backup_operations_controller.go` will have a periodic check around backup progress updates, rather than running every time progress is updated to reduce API load.
77+
`backup_controller`
78+
79+
`backup_operations_controller`
80+
81+
`backup_finalizer_controller`
82+
7583

7684
#### New Backup Cancellation Controller
77-
Create `backup_cancellation_controller.go`:
78-
```go
79-
type backupCancellationReconciler struct {
80-
client.Client
81-
logger logrus.FieldLogger
82-
itemOperationsMap *itemoperationmap.BackupItemOperationsMap
83-
newPluginManager func(logger logrus.FieldLogger) clientmgmt.Manager
84-
backupStoreGetter persistence.ObjectBackupStoreGetter
85-
}
85+
8686
```
8787
8888
The controller will:
8989
1. Watch for backups in `BackupPhaseCancelling`
90-
2. Get operations from `itemOperationsMap.GetOperationsForBackup()`
91-
3. Call `bia.Cancel(operationID, backup)` on all in-progress BackupItemAction operations
92-
4. Find and cancel PodVolumeBackups by setting `pvb.Spec.Cancel = true`
93-
5. Wait for all cancellations to complete
94-
6. Set backup phase to `BackupPhaseCancelled`
95-
7. Update backup metadata in object storage
90+
2. Attempt to delete backup data
91+
3. Set phase to `BackupPhaseCancelled`
9692
9793
### Cancellation Flow
9894
@@ -104,10 +100,9 @@ For operations with BackupItemAction v2 implementations (e.g., CSI PVC actions):
104100
4. Operation marked as `OperationPhaseCanceled`
105101
106102
#### PodVolumeBackup Operations
107-
For PodVolumeBackups (which lack BackupItemAction implementations):
108-
1. Controller directly finds PVBs by backup UID label
109-
2. Sets `pvb.Spec.Cancel = true` on in-progress PVBs
110-
3. Node-agent PodVolumeBackup controller handles actual cancellation
103+
BackupWithResolvers is atomic
104+
If cancellation happens before the call, nothing has started: there are no ItemBlocks or PodVolumeBackups to cancel
105+
If cancellation happens after the call, PostHooks ensure that PodVolumeBackups have completed, so there is nothing left to cancel here
111106
112107
113108
## Alternatives Considered
@@ -130,23 +125,6 @@ The new `cancel` field is optional and defaults to nil/false, ensuring backward
130125
Existing backups will continue to work without modification.
131126
The new backup phases (`Cancelling`, `Cancelled`) are additive and don't affect existing phase transitions.
132127
133-
## Implementation
134-
Implementation will be done incrementally in the following phases:
135-
136-
**Phase 1**: API changes and basic cancellation detection
137-
- Add `cancel` field to BackupSpec
138-
- Add new backup phases
139-
- Update existing controllers to detect cancellation and transition to `Cancelling` phase
140-
141-
**Phase 2**: Cancellation controller implementation
142-
- Implement backup cancellation controller
143-
- Add BackupItemAction operation cancellation
144-
- Add PodVolumeBackup direct cancellation
145-
146-
**Phase 3**: Testing and refinement
147-
- Comprehensive end-to-end testing
148-
- Testing if slowdowns occur due to the frequency of checking `backup.Cancel` spec field
149-
- Documentation and user guide updates
150128
151129
**Future Work**:
152130
