Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron #3991
base: main
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing …
Force-pushed 00f6e20 to 19bd66f (compare)
akalenyu
left a comment
Apologies for not being present in the technical discussion; I really appreciate the effort on this! And of course I understand that I am outnumbered here.
It just feels wrong to me to have all sorts of (essentially) backwards APIs in CDI to work around this.
```go
// DataImportCronAccessModesByProvisionerKey defines required access modes for DataImportCron PVCs.
// Some provisioners require specific access modes for DataImportCron-created PVCs.
var DataImportCronAccessModesByProvisionerKey = map[string][]v1.PersistentVolumeAccessMode{
	"pd.csi.storage.gke.io":           {rwo},
	"pd.csi.storage.gke.io/hyperdisk": {rwo},
}

// DataImportCronSnapshotClassParametersByProvisionerKey defines required VolumeSnapshotClass parameters for DataImportCron.
// Some provisioners require specific parameters in the VolumeSnapshotClass for DataImportCron snapshots.
var DataImportCronSnapshotClassParametersByProvisionerKey = map[string]map[string]string{
	// https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/backup-pd-volume-snapshots#restore-snapshot
	"pd.csi.storage.gke.io": {
		"snapshot-type": "images",
	},
	"pd.csi.storage.gke.io/hyperdisk": {
		"snapshot-type": "images",
	},
}
```
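For illustration only, a minimal standalone sketch of how a consumer might resolve access modes from such a map, first trying a composite `provisioner/<hint>` key and then falling back to the bare provisioner. The helper name and the composite-key convention here are assumptions for the sketch, not CDI's actual API, and plain strings stand in for `v1.PersistentVolumeAccessMode`:

```go
package main

import "fmt"

// rwo stands in for v1.ReadWriteOnce; the real map uses k8s.io/api/core/v1 types.
const rwo = "ReadWriteOnce"

var accessModesByProvisionerKey = map[string][]string{
	"pd.csi.storage.gke.io":           {rwo},
	"pd.csi.storage.gke.io/hyperdisk": {rwo},
}

// accessModesForProvisioner (hypothetical helper) tries the composite
// "provisioner/hint" key first, then falls back to the bare provisioner key.
func accessModesForProvisioner(provisioner, hint string) []string {
	if hint != "" {
		if modes, ok := accessModesByProvisionerKey[provisioner+"/"+hint]; ok {
			return modes
		}
	}
	return accessModesByProvisionerKey[provisioner]
}

func main() {
	fmt.Println(accessModesForProvisioner("pd.csi.storage.gke.io", "hyperdisk"))
	fmt.Println(accessModesForProvisioner("some.other.csi.driver", ""))
}
```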
To me, the fact that this degree of handling is required in CDI screams that, for that provider, there is a need for a separate storage class for cron purposes.
That provider would then represent itself internally as
`"pd.csi.storage.gke.io/hyperdisk-crons": {{rwo, block}},`
somewhat similar to
`if sc.Parameters["migratable"] == "true" {`
And we would be using the "correct" volumesnapshotclass via storageProfile.Status.SnapshotClass
(the special storage class would hint us which one to choose via a snapclass parameter or something).
Thanks Alex, I completely agree with that approach; tbh it would eliminate the need for most of these changes.
Trying to avoid adding an extra SC by forcing two VSCs to work with a single SC is too problematic.
We need to see how to reflect the need to create that dedicated SC for golden images.
Regardless, working on your suggested changes asap :)
> we need to see how to reflect that need to create that dedicated SC for golden images.

Yeah, that's the tough bit. I am not against adding convenience APIs under HCO, for example, if that helps them
(an hco.spec field instead of going over each cron and adding a storage class name).
/cc @awels @arnongilboa @Acedus
If we absolutely MUST go this path, then I think we should start discussing an API extension for StorageProfiles.

+1. CDI should remain as oblivious as possible to the quirks of the storage providers it has to work with; this is no exception.
Force-pushed 19bd66f to d406174 (compare)
Force-pushed 5c9a8c0 to 432b4e8 (compare)
Force-pushed 432b4e8 to dff9273 (compare)
```go
// Before:
func (r *DataImportCronReconciler) newSourceDataVolume(cron *cdiv1.DataImportCron, dataVolumeName string) *cdiv1.DataVolume {
// After:
func (r *DataImportCronReconciler) newSourceDataVolume(cron *cdiv1.DataImportCron, dataVolumeName string, desiredStorageClass *storagev1.StorageClass, storageProfile *cdiv1.StorageProfile) *cdiv1.DataVolume {
```
Why do you need desiredStorageClass here?
```go
if storageProfile.Status.SnapshotClass != nil {
	return storageProfile.Status.SnapshotClass, nil
}
className, err := cc.GetSnapshotClassForSmartClone(pvc, &desiredStorageClass.Name, nil, r.log, r.client, r.recorder)
```
You have desiredStorageClass.Name in storageProfile, so maybe drop the desiredStorageClass param?
```go
"pd.csi.storage.gke.io": {
	"snapshot-type": "images",
},
```
Why do we want to cover it for non-Hyperdisk?
According to the Snapshot frequency limits doc, the snapshot restore limitation (6 per hour) applies to all types, not just Hyperdisk.
And the Limitations doc suggests that it's not possible to snapshot an RWX PVC if it's an image snapshot.
```go
// SnapshotClassParametersForDataImportCronByProvisionerKey defines required VolumeSnapshotClass parameters for DataImportCron snapshots.
var SnapshotClassParametersForDataImportCronByProvisionerKey = map[string]map[string]string{
```
Maybe generalize it like the other maps here with `map[string]func(sc *storagev1.StorageClass) bool`, so it can check more than just parameters?
It is possible; I want to make sure I understood what you mean.
You want to make the comparison and return a bool?
For that, I also need the VSC to compare, something like:
`func(sc *storagev1.StorageClass, vsc *snapshotv1.VolumeSnapshotClass) bool`?
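A minimal sketch of the predicate-map idea being discussed, using simplified local structs as stand-ins for `storagev1.StorageClass` and `snapshotv1.VolumeSnapshotClass` (the map name and signature here are illustrative, not the PR's final API):

```go
package main

import "fmt"

// storageClass is a simplified stand-in for storagev1.StorageClass.
type storageClass struct {
	Provisioner string
	Parameters  map[string]string
}

// volumeSnapshotClass is a simplified stand-in for snapshotv1.VolumeSnapshotClass.
type volumeSnapshotClass struct {
	Driver     string
	Parameters map[string]string
}

// snapshotClassMatchersByProvisionerKey generalizes the parameter map into
// predicates: each entry can inspect any aspect of the SC/VSC pair,
// not just VSC parameters.
var snapshotClassMatchersByProvisionerKey = map[string]func(sc *storageClass, vsc *volumeSnapshotClass) bool{
	"pd.csi.storage.gke.io": func(sc *storageClass, vsc *volumeSnapshotClass) bool {
		return vsc.Parameters["snapshot-type"] == "images"
	},
}

func main() {
	sc := &storageClass{Provisioner: "pd.csi.storage.gke.io"}
	vsc := &volumeSnapshotClass{
		Driver:     "pd.csi.storage.gke.io",
		Parameters: map[string]string{"snapshot-type": "images"},
	}
	if match, ok := snapshotClassMatchersByProvisionerKey[sc.Provisioner]; ok {
		fmt.Println(match(sc, vsc))
	}
}
```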
… annotations
Add support for provisioner-specific requirements when creating snapshots and PVCs for DataImportCron. Some provisioners have specific needs:
- GKE Persistent Disk requires the `snapshot-type: images` parameter in the VSC
- GKE Persistent Disk and Rook Ceph RBD require RWO access mode for DataImportCron PVCs

Change details:
- Add StorageProfile annotations for DataImportCron configuration:
  - `cdi.kubevirt.io/useReadWriteOnceForDataImportCron`: signals RWO access mode
  - `cdi.kubevirt.io/snapshotClassForDataImportCron`: specifies the VSC name
- Centralize provisioner-specific configuration in storagecapabilities:
  - `UseReadWriteOnceForDataImportCronByProvisionerKey`: maps provisioners requiring RWO
  - `SnapshotClassParametersForDataImportCronByProvisionerKey`: maps provisioners to VSC parameters
- StorageProfile controller automatically reconciles annotations based on provisioner
- DataImportCron controller applies RWO from StorageProfile when the DV doesn't specify access modes
- DataImportCron controller selects the VSC with priority: StorageProfile annotation > StorageProfile status > standard selection
- Unit tests for both controllers
- Update documentation with annotation details

Signed-off-by: Noam Assouline <[email protected]>
Force-pushed dff9273 to bfd3fff (compare)
unrelated failure cluster
@noamasu: The following test failed, say …

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
akalenyu
left a comment
Looks good!
```go
if storageProfile.Status.SnapshotClass != nil {
	return storageProfile.Status.SnapshotClass, nil
}
```
You could pass storageProfile.Status.SnapshotClass to GetSnapshotClassForSmartClone as we've been doing before, or am I missing something?
```go
// Apply RWO access mode as default for DataImportCron (from StorageProfile annotation)
// Only applies if the DV doesn't already have AccessModes configured
if storageProfile != nil && storageProfile.Annotations != nil && storageProfile.Annotations[cc.AnnUseReadWriteOnceForDataImportCron] == "true" {
```
Reading from a nil map is safe, so the `storageProfile.Annotations != nil` check is redundant.
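For the record, indexing a nil map in Go just yields the value type's zero value (writes are what panic), which is why the nil check can be dropped; a quick standalone illustration:

```go
package main

import "fmt"

func main() {
	var annotations map[string]string // nil map: reads are safe, writes panic

	// Indexing a nil map returns the zero value of the value type ("" here).
	fmt.Printf("%q\n", annotations["cdi.kubevirt.io/useReadWriteOnceForDataImportCron"]) // ""

	// So `m != nil && m[k] == "true"` can be shortened to `m[k] == "true"`.
	fmt.Println(annotations["anything"] == "true") // false
}
```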
```go
if err := r.client.Get(ctx, types.NamespacedName{Name: desiredStorageClass.Name}, storageProfile); err != nil {
	storageProfile = nil
}
```
Should probably just requeue on err; it doesn't make sense for there to not be a profile.
```go
// Before:
func (r *DataImportCronReconciler) createImportDataVolume(ctx context.Context, dataImportCron *cdiv1.DataImportCron) error {
// After:
func (r *DataImportCronReconciler) createImportDataVolume(ctx context.Context, dataImportCron *cdiv1.DataImportCron, desiredStorageClass *storagev1.StorageClass) error {
```
Picked a random line to ask this on. Do you care about existing installs or just new ones? I.e. do you want to seamlessly handle someone upgrading with said type of storage and getting all dvs/snaps converted?
It's fine if not, but if you do, that would require some special care.
```go
if storageProfile.Annotations != nil {
	if vscName := storageProfile.Annotations[cc.AnnSnapshotClassForDataImportCron]; vscName != "" {
		return &vscName, nil
	}
}
```
Probably want to unit test this, especially the annotation taking precedence over storageProfile.Status.SnapshotClass.
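The precedence the PR describes (StorageProfile annotation > StorageProfile status > standard selection) can be sketched in isolation like this; the struct, helper name, and stub fallback are assumptions standing in for `cdiv1.StorageProfile` and the controller's real selection path:

```go
package main

import "fmt"

// profile is a simplified stand-in for cdiv1.StorageProfile.
type profile struct {
	Annotations   map[string]string
	SnapshotClass *string // stands in for Status.SnapshotClass
}

const annSnapshotClassForDataImportCron = "cdi.kubevirt.io/snapshotClassForDataImportCron"

// pickSnapshotClass (hypothetical helper) applies the priority described in
// the PR: annotation > profile status > standard selection (a stub here).
// Reading p.Annotations is safe even when the map is nil.
func pickSnapshotClass(p *profile, standard func() *string) *string {
	if name := p.Annotations[annSnapshotClassForDataImportCron]; name != "" {
		return &name
	}
	if p.SnapshotClass != nil {
		return p.SnapshotClass
	}
	return standard()
}

func main() {
	statusVSC := "status-vsc"
	p := &profile{
		Annotations:   map[string]string{annSnapshotClassForDataImportCron: "ann-vsc"},
		SnapshotClass: &statusVSC,
	}
	// The annotation wins even though Status.SnapshotClass is set.
	fmt.Println(*pickSnapshotClass(p, func() *string { return nil }))
}
```

A unit test along these lines would pin down exactly the precedence akalenyu asks about.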
/test pull-containerized-data-importer-e2e-nfs
Add support for provisioner-specific requirements when creating snapshots and PVCs for DataImportCron. Some provisioners have specific needs.

What this PR does / why we need it:
Standard GCP snapshots (using pd.csi.storage.gke.io) are limited to 6 restores per hour per snapshot. Using a VolumeSnapshotClass with `snapshot-type: images` enables unlimited restores, but these snapshots cannot be created from ReadWriteMany (RWX) PVCs. More details can be found here.

This PR adds provisioner-aware DataImportCron configuration via StorageProfile annotations:

StorageProfile Controller: Automatically detects provisioner requirements and sets:
- `cdi.kubevirt.io/useReadWriteOnceForDataImportCron`: signals RWO access mode for DataImportCron PVCs when not explicitly configured
- `cdi.kubevirt.io/snapshotClassForDataImportCron`: specifies the VolumeSnapshotClass name (auto-discovers a matching VSC with the required parameters)

DataImportCron Controller:

This ensures snapshot creation succeeds without restore rate limits for GKE, while still allowing the final volume to be restored with the desired access mode. The approach is extensible: new provisioner requirements can be added by updating the storage capabilities configuration.
Which issue(s) this PR fixes:
Jira: https://issues.redhat.com/browse/CNV-73302
Release note: