After a restore, cStorVolume remains in Init, CVRs remain in Error state #127
@sgielen Once the restore completes, if there are incremental backups, those backups also need to be restored to retrieve all the data, and the targetip needs to be updated in the replica.
In your case, the command will be:

In your case, it will be:
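In general, the manual targetip update for a restored cStor volume looks roughly like the following; all bracketed names are placeholders, and this assumes OpenEBS is installed in the `openebs` namespace:

```sh
# Manual targetip update for a restored cStor volume (sketch).
# All names in <angle brackets> are placeholders; the "openebs"
# namespace is assumed.

# 1. Find the ClusterIP of the target service belonging to the restored PV:
kubectl get svc -n openebs <pv-name> -o jsonpath='{.spec.clusterIP}'

# 2. On each pool pod backing a replica of that volume, point the replica
#    at the target service IP:
kubectl exec -it <cstor-pool-pod> -n openebs -c cstor-pool -- \
  zfs set io.openebs:targetip=<target-service-cluster-ip> <pool-name>/<pv-name>
```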
As mentioned on Slack, I'm working on the fix for this issue.
I made good progress on this. Just testing restores now. I will create a PR soon for this project as well as cstor-operators, and I am also thinking about backporting the fix to the maya repo.
@zlymeda Let me know when you have a PR, and I will try to retest my restore with it then :)
For completeness & future reference, this was indeed the problem; after I set the targetip correctly according to the instructions in this thread, the CVRs became Healthy, the Pods started automatically, and the restored application was completely functional, exactly as I had originally expected. As soon as a PR exists for this issue, I will update my version of the velero-plugin and retry the restore steps without setting the target IP, to confirm it is no longer necessary to do this manually :-)
Single backup / restore is working. I need to make a small change to support restores from a scheduled backup with multiple backups created. @sgielen the fix will be in 3 repos: one here, one in maya, and one in cstor-operators. If you want to test it now, you can use the images I've created for testing:
@zlymeda, unfortunately I have an arm64-based cluster. But either way, I prefer to build my own images unless they come from upstream. Would you already have a patch to share somewhere that I could test? Thanks!
@sgielen I mentioned this issue in the PRs, please check.
What steps did you take and what happened:
I wanted to test a full restore of a namespace (a single application). I tested this on a namespace called "vikunja", which contains two Deployments, two Services, two Pods, one Ingress and one PVC, backed by a cStorVolume and three cStorVolumeReplicas that are all healthy. The restore succeeded; however, in the new namespace the application never came up, because its cStorVolume remains in `Init` state and its three cStorVolumeReplicas remain in `Error` state.

This is the Restore command I ran:
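Roughly (a reconstruction from the restore, backup, and namespace names that appear in this report; the exact flags are an assumption):

```sh
# Hypothetical reconstruction of the restore command; the names are taken
# from this report, but the exact flags used are an assumption.
velero restore create vikunja-test-restore \
  --from-backup volumes-full \
  --include-namespaces vikunja \
  --namespace-mappings vikunja:vikunja-restore
```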
Where `volumes-full` is a (non-Schedule) backup that contains a full copy of most of the volumes in my cluster.

The restore finished successfully in approximately one minute. Indeed, after this, the `vikunja-restore` namespace contains two Deployments, two Services, two Pods, one Ingress and one PVC. I modified the Ingress so that the endpoint URI is different from the original namespace, and visited the application at the new URI, expecting to find the same application, but in the state from the backup instead of the most recent state. However, the application does not load, because the PVC cannot be mounted as the iSCSI target is not ready.

I tried to let the situation resolve overnight, but to no avail.
What did you expect to happen:
I expected the volume and its replicas to eventually become Healthy with the same contents as the volume had during the restored backup.
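One way to check this (assuming the OpenEBS custom resources live in the `openebs` namespace):

```sh
# Check the restored volume and its replicas (assumes the OpenEBS custom
# resources live in the "openebs" namespace):
kubectl get cstorvolume,cstorvolumereplica -n openebs
```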
The output of the following commands will help us better understand what's going on:

- `velero restore logs vikunja-test-restore | grep -v 'Skipping namespace'`: https://pastebin.com/vh9ACfAn
- `velero restore describe vikunja-test-restore`:

Anything else you would like to add:
[Miscellaneous information that will assist in solving the issue.]
Environment:

- Velero version (use `velero version`): v1.5.1
- Velero features (use `velero client config get features`):
- Kubernetes version (use `kubectl version`): client 1.19.2, server 1.18.3
- OS (e.g. from `/etc/os-release`): k3os