Closed
Description
At present, when a node-agent pod restarts, all DU/DD in Accepted phase are cancelled.
We may be able to enhance this as we have recorded the accepted node in DUCR/DDCR as of #8498. Details:
- If the node-agent pod restarts because of node-agent itself, only the DU/DD accepted by the restarted node-agent pod need to be cancelled.
- If the node-agent pod restarts because of a node restart, other DU/DD in Accepted phase may have created backupPods/restorePods on the restarted node. Once the node restarts, we need to investigate what happens to those pods:
  - If they fail and never recover, we need to cancel those DU/DD.
  - If they fail and then recover, we should ignore the failure and not cancel the DU/DD.
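
A rough sketch of the first case (node-agent itself restarts), to illustrate the narrower cancellation scope. The `Transfer` type, its fields, and `cancelOnAgentRestart` are hypothetical stand-ins for illustration only, not the actual Velero DUCR/DDCR types or controller code. The node-restart case is left out on purpose, since the pod behavior there still needs investigation.

```go
package main

import "fmt"

// Phase is a simplified stand-in for the DU/DD phase field (hypothetical).
type Phase string

const (
	PhaseAccepted   Phase = "Accepted"
	PhaseInProgress Phase = "InProgress"
)

// Transfer is a hypothetical, trimmed-down view of a DU or DD.
// AcceptedNode stands for the accepted node recorded in the CR.
type Transfer struct {
	Name         string
	Phase        Phase
	AcceptedNode string
}

// cancelOnAgentRestart returns the names of the transfers that are still in
// Accepted phase and were accepted by the node whose node-agent restarted.
// Transfers accepted by other nodes are left alone.
func cancelOnAgentRestart(all []Transfer, restartedNode string) []string {
	var toCancel []string
	for _, t := range all {
		if t.Phase == PhaseAccepted && t.AcceptedNode == restartedNode {
			toCancel = append(toCancel, t.Name)
		}
	}
	return toCancel
}

func main() {
	items := []Transfer{
		{Name: "du-1", Phase: PhaseAccepted, AcceptedNode: "node-a"},
		{Name: "du-2", Phase: PhaseAccepted, AcceptedNode: "node-b"},
		{Name: "dd-1", Phase: PhaseInProgress, AcceptedNode: "node-a"},
	}
	// Only du-1 is both Accepted and owned by the restarted node.
	fmt.Println(cancelOnAgentRestart(items, "node-a")) // prints [du-1]
}
```

With the current behavior, both `du-1` and `du-2` would be cancelled. With the proposed filter, `du-2` survives because it was accepted by a different node.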