@noamasu (Collaborator) commented Dec 23, 2025

Add support for provisioner-specific requirements when creating snapshots and PVCs for DataImportCron. Some provisioners have specific needs:

  • GKE Persistent Disk requires the snapshot-type: images parameter in the VSC for DataImportCron snapshots
  • GKE Persistent Disk requires RWO access mode for DataImportCron PVCs

What this PR does / why we need it:

Standard GCP snapshots (using pd.csi.storage.gke.io) are limited to 6 restores per hour per snapshot. Using a VolumeSnapshotClass with snapshot-type: images enables unlimited restores, but these snapshots cannot be created from ReadWriteMany (RWX) PVCs. More details can be found here
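For reference, a VolumeSnapshotClass satisfying this requirement might look like the following sketch, built with the external-snapshotter client types; the object name is illustrative, not something this PR creates:

```go
import (
	snapshotv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// An illustrative VolumeSnapshotClass enabling image-type snapshots
// (unlimited restores) on GKE Persistent Disk.
var gkeImageSnapshotClass = &snapshotv1.VolumeSnapshotClass{
	ObjectMeta:     metav1.ObjectMeta{Name: "gke-image-snapclass"}, // illustrative name
	Driver:         "pd.csi.storage.gke.io",
	DeletionPolicy: snapshotv1.VolumeSnapshotContentDelete,
	Parameters:     map[string]string{"snapshot-type": "images"},
}
```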

This PR adds provisioner-aware DataImportCron configuration via StorageProfile annotations:

  • StorageProfile Controller: Automatically detects provisioner requirements and sets:

    • cdi.kubevirt.io/useReadWriteOnceForDataImportCron: Signals RWO access mode for DataImportCron PVCs when not explicitly configured
    • cdi.kubevirt.io/snapshotClassForDataImportCron: Specifies the VolumeSnapshotClass name (auto-discovers matching VSC with required parameters)
  • DataImportCron Controller:

    • Applies RWO access mode to DataVolume PVCs when annotation is set and no access modes are configured (preserves existing configurations)
    • Uses the specified VolumeSnapshotClass from annotation when creating snapshots

This ensures snapshot creation succeeds without restore rate limits for GKE, while still allowing the final volume to be restored with the desired access mode. The approach is extensible - new provisioner requirements can be added by updating the storage capabilities configuration.
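A minimal sketch of the DataImportCron-side defaulting described above; the helper name is hypothetical, and the annotation key is the one this PR introduces:

```go
import (
	corev1 "k8s.io/api/core/v1"
	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
)

const annUseRWOForCron = "cdi.kubevirt.io/useReadWriteOnceForDataImportCron"

// applyCronAccessModeDefault (hypothetical helper) forces RWO on the source
// DataVolume only when the profile asks for it and the DV has no explicit
// access modes, preserving existing configurations.
func applyCronAccessModeDefault(dv *cdiv1.DataVolume, profile *cdiv1.StorageProfile) {
	if profile == nil || profile.Annotations[annUseRWOForCron] != "true" {
		return
	}
	if dv.Spec.Storage != nil && len(dv.Spec.Storage.AccessModes) == 0 {
		dv.Spec.Storage.AccessModes = []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce}
	}
}
```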

Which issue(s) this PR fixes:

Jira: https://issues.redhat.com/browse/CNV-73302

Release note:

Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron

@kubevirt-bot kubevirt-bot added the release-note and dco-signoff: yes labels Dec 23, 2025
@kubevirt-bot (Contributor):

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign aglitke for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the needs-rebase and size/L labels Dec 23, 2025
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 00f6e20 to 19bd66f on December 23, 2025 13:30
@noamasu (Collaborator, Author) commented Dec 23, 2025:

/cc @arnongilboa @akalenyu

@noamasu noamasu changed the title Add provisioner-aware VolumeSnapshotClass selection for DataImportCron WIP: Add provisioner-aware VolumeSnapshotClass selection for DataImportCron Dec 23, 2025
@kubevirt-bot kubevirt-bot added the do-not-merge/work-in-progress label Dec 23, 2025
@akalenyu (Collaborator) left a comment:

Apologies for not being present in the technical discussion, really appreciate the effort on this! And of course I understand that I am outnumbered here.

It just feels wrong to me to have all sorts of backwards (essentially) APIs in CDI to work around this.

Comment on lines 210 to 227
```go
// DataImportCronAccessModesByProvisionerKey defines required access modes for DataImportCron PVCs
// Some provisioners require specific access modes for DataImportCron-created PVCs
var DataImportCronAccessModesByProvisionerKey = map[string][]v1.PersistentVolumeAccessMode{
	"pd.csi.storage.gke.io":           {rwo},
	"pd.csi.storage.gke.io/hyperdisk": {rwo},
}

// DataImportCronSnapshotClassParametersByProvisionerKey defines required VolumeSnapshotClass parameters for DataImportCron.
// Some provisioners require specific parameters in the VolumeSnapshotClass for DataImportCron snapshots.
var DataImportCronSnapshotClassParametersByProvisionerKey = map[string]map[string]string{
	// https://docs.cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/backup-pd-volume-snapshots#restore-snapshot
	"pd.csi.storage.gke.io": {
		"snapshot-type": "images",
	},
	"pd.csi.storage.gke.io/hyperdisk": {
		"snapshot-type": "images",
	},
}
```
@akalenyu (Collaborator) commented Dec 23, 2025:

To me, the fact that this degree of handling is required in CDI screams that, for that provider, there is a need for a separate storage class for cron purposes.

That provider would then represent itself internally as

```go
"pd.csi.storage.gke.io/hyperdisk-crons": {{rwo, block}},
```

somewhat similar to

```go
if sc.Parameters["migratable"] == "true" {
```

but I think this can remain vague like "rwo" since that is really the only capability of their golden image storage class.

And we would be using the "correct" VolumeSnapshotClass via storageProfile.Status.SnapshotClass (the special storage class would hint us which one to choose via a snapclass parameter or something).
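A minimal sketch of that hinting idea; the parameter key is hypothetical, and nothing here is part of this PR:

```go
import storagev1 "k8s.io/api/storage/v1"

// snapshotClassHint reads a hypothetical parameter from a dedicated
// golden-image StorageClass that names the VolumeSnapshotClass to use
// for DataImportCron snapshots.
func snapshotClassHint(sc *storagev1.StorageClass) (string, bool) {
	name, ok := sc.Parameters["cdi.kubevirt.io/snapclass"] // hypothetical parameter key
	return name, ok
}
```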

@noamasu (Author) replied:

Thanks Alex, I completely agree with that approach; TBH it would eliminate the need for most of these changes. Trying to avoid adding an extra SC by forcing two VSCs to work with a single SC is too problematic.

We need to see how to reflect the need to create that dedicated SC for golden images.

Regardless, working on your suggested changes ASAP :)

Collaborator:

> We need to see how to reflect the need to create that dedicated SC for golden images.

Yeah, that's the tough bit. I am not against adding convenience APIs under HCO, for example, if that helps them (an hco.spec field instead of going over each cron and adding a storage class name).
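A purely hypothetical sketch of such a convenience field; neither the field name nor its placement is proposed anywhere in this PR:

```go
// HyperConvergedSpec excerpt. GoldenImagesStorageClass is a hypothetical
// field naming one dedicated StorageClass for all DataImportCrons, instead
// of setting a storage class on each cron individually.
type HyperConvergedSpec struct {
	GoldenImagesStorageClass *string `json:"goldenImagesStorageClass,omitempty"`
}
```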

@akalenyu (Collaborator) commented:

/cc @awels @arnongilboa @Acedus
I am totally ready to get challenged on this 🙏

@kubevirt-bot kubevirt-bot requested review from Acedus and awels December 23, 2025 13:56
@akalenyu (Collaborator) commented:

If we absolutely MUST go this path, then I think we should start discussing an API extension for StorageProfiles; either we introduce a new overload for dataImportCronSourceFormat or a new field entirely, something along the lines of dataImportCronSourceSpec.

@Acedus (Contributor) commented Dec 23, 2025:

> If we absolutely MUST go this path, then I think we should start discussing an API extension for StorageProfiles; either we introduce a new overload for dataImportCronSourceFormat or a new field entirely, something along the lines of dataImportCronSourceSpec.

+1. CDI should remain as oblivious as possible to the quirks of the storage providers it has to work with; this is no exception.

@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 19bd66f to d406174 on January 14, 2026 12:29
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch 2 times, most recently from 5c9a8c0 to 432b4e8 on January 14, 2026 14:34
@kubevirt-bot kubevirt-bot added the size/L label and removed the needs-rebase and size/XL labels Jan 14, 2026
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from 432b4e8 to dff9273 on January 14, 2026 14:36
@coveralls commented Jan 14, 2026:

Coverage Status

coverage: 49.452% (+0.02%) from 49.432% when pulling bfd3fff on noamasu:provisioner-aware-vsc-selection-for-dic into 9600ccb on kubevirt:main.

@noamasu noamasu changed the title WIP: Add provisioner-aware VolumeSnapshotClass selection for DataImportCron Add provisioner-aware VolumeSnapshotClass selection and RWO access mode for DataImportCron Jan 14, 2026
@kubevirt-bot kubevirt-bot removed the do-not-merge/work-in-progress label Jan 14, 2026
```diff
 }
 
-func (r *DataImportCronReconciler) newSourceDataVolume(cron *cdiv1.DataImportCron, dataVolumeName string) *cdiv1.DataVolume {
+func (r *DataImportCronReconciler) newSourceDataVolume(cron *cdiv1.DataImportCron, dataVolumeName string, desiredStorageClass *storagev1.StorageClass, storageProfile *cdiv1.StorageProfile) *cdiv1.DataVolume {
```
Collaborator:

Why do you need desiredStorageClass here?

```go
	if storageProfile.Status.SnapshotClass != nil {
		return storageProfile.Status.SnapshotClass, nil
	}
	className, err := cc.GetSnapshotClassForSmartClone(pvc, &desiredStorageClass.Name, nil, r.log, r.client, r.recorder)
```
Collaborator:

You have desiredStorageClass.Name in storageProfile, so maybe drop desiredStorageClass param?

Comment on lines 218 to 220
"pd.csi.storage.gke.io": {
"snapshot-type": "images",
},
Collaborator:

Why do we want to cover it for non-hyperdisk?

@noamasu (Author) replied:

According to the Snapshot frequency limits doc, the snapshot restore limitation (6 per hour) applies to all types, not just Hyperdisk, and the Limitations doc suggests that it's not possible to snapshot an RWX PVC if it's an image snapshot.

```go
}

// SnapshotClassParametersForDataImportCronByProvisionerKey defines required VolumeSnapshotClass parameters for DataImportCron snapshots
var SnapshotClassParametersForDataImportCronByProvisionerKey = map[string]map[string]string{
```
Collaborator:

Maybe generalize it like other maps here with map[string]func(sc *storagev1.StorageClass) bool, so it can check not only parameters?

@noamasu (Author) replied:

It is possible; I want to make sure I understood what you mean.
You want to make the comparison and return a bool?
For that, I also need the VSC to compare, something like:
func(sc *storagev1.StorageClass, vsc *snapshotv1.VolumeSnapshotClass) bool?
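A minimal sketch of the signature being discussed; the map name is hypothetical, not the PR's final API:

```go
import (
	snapshotv1 "github.com/kubernetes-csi/external-snapshotter/client/v6/apis/volumesnapshot/v1"
	storagev1 "k8s.io/api/storage/v1"
)

// A hypothetical generalization of the parameter map: a predicate per
// provisioner key that can inspect both the StorageClass and a candidate
// VolumeSnapshotClass, rather than only VSC parameters.
var snapshotClassMatchersForDataImportCron = map[string]func(sc *storagev1.StorageClass, vsc *snapshotv1.VolumeSnapshotClass) bool{
	"pd.csi.storage.gke.io": func(sc *storagev1.StorageClass, vsc *snapshotv1.VolumeSnapshotClass) bool {
		return vsc.Parameters["snapshot-type"] == "images"
	},
}
```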

… annotations

Add support for provisioner-specific requirements when creating snapshots
and PVCs for DataImportCron. Some provisioners have specific needs:

- GKE Persistent Disk requires snapshot-type: images parameter in VSC
- GKE Persistent Disk and Rook Ceph RBD require RWO access mode for DataImportCron PVCs

Change details:
- Add StorageProfile annotations for DataImportCron configuration:
    cdi.kubevirt.io/useReadWriteOnceForDataImportCron: Signals RWO access mode
    cdi.kubevirt.io/snapshotClassForDataImportCron: Specifies VSC name
- Centralize provisioner-specific configuration in storagecapabilities:
    UseReadWriteOnceForDataImportCronByProvisionerKey: Maps provisioners requiring RWO
    SnapshotClassParametersForDataImportCronByProvisionerKey: Maps provisioners to VSC parameters
- StorageProfile controller automatically reconciles annotations based on provisioner
- DataImportCron controller applies RWO from StorageProfile when DV doesn't specify access modes
- DataImportCron controller selects VSC with priority: StorageProfile annotation > StorageProfile status > standard selection
- Unit tests for both controllers
- Update documentation with annotation details

Signed-off-by: Noam Assouline <[email protected]>
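To make the commit message's selection priority concrete, here is a hedged sketch; the function name and fallback wiring are illustrative, not the PR's exact code:

```go
import cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"

// selectCronSnapshotClass (illustrative name) applies the priority described
// above: StorageProfile annotation > StorageProfile status > standard selection.
func selectCronSnapshotClass(profile *cdiv1.StorageProfile, standard func() (*string, error)) (*string, error) {
	// Indexing a possibly-nil annotations map is safe and yields "".
	if name := profile.Annotations["cdi.kubevirt.io/snapshotClassForDataImportCron"]; name != "" {
		return &name, nil // 1. annotation wins
	}
	if profile.Status.SnapshotClass != nil {
		return profile.Status.SnapshotClass, nil // 2. then profile status
	}
	return standard() // 3. then the standard selection path
}
```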
@noamasu noamasu force-pushed the provisioner-aware-vsc-selection-for-dic branch from dff9273 to bfd3fff on January 15, 2026 14:02
@akalenyu (Collaborator) commented:

unrelated failure cluster
/retest

@kubevirt-bot (Contributor) commented Jan 15, 2026:

@noamasu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
pull-cdi-linter | bfd3fff | link | false | /test pull-cdi-linter

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@akalenyu (Collaborator) left a comment:

Looks good!

Comment on lines +1073 to +1075
```go
	if storageProfile.Status.SnapshotClass != nil {
		return storageProfile.Status.SnapshotClass, nil
	}
```
Collaborator:

You could pass storageProfile.Status.SnapshotClass to GetSnapshotClassForSmartClone as we've been doing before, or am I missing something?


```go
	// Apply RWO access mode as default for DataImportCron (from StorageProfile annotation)
	// Only applies if the DV doesn't already have AccessModes configured
	if storageProfile != nil && storageProfile.Annotations != nil && storageProfile.Annotations[cc.AnnUseReadWriteOnceForDataImportCron] == "true" {
```
Collaborator:

Reading from a nil map is safe, so the storageProfile.Annotations != nil check is redundant.
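A tiny, runnable demonstration of the point: indexing a nil map in Go does not panic, it just yields the zero value, so the nil check above can be dropped.

```go
package main

import "fmt"

func main() {
	var annotations map[string]string // nil map, never initialized
	// Indexing a nil map returns the zero value ("") without panicking.
	fmt.Println(annotations["cdi.kubevirt.io/useReadWriteOnceForDataImportCron"] == "true") // false
}
```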

Comment on lines +896 to +898
```go
	if err := r.client.Get(ctx, types.NamespacedName{Name: desiredStorageClass.Name}, storageProfile); err != nil {
		storageProfile = nil
	}
```
Collaborator:

Should probably just requeue on error; it doesn't make sense for there to not be a profile.

```diff
 }
 
-func (r *DataImportCronReconciler) createImportDataVolume(ctx context.Context, dataImportCron *cdiv1.DataImportCron) error {
+func (r *DataImportCronReconciler) createImportDataVolume(ctx context.Context, dataImportCron *cdiv1.DataImportCron, desiredStorageClass *storagev1.StorageClass) error {
```
Collaborator:

Picked a random line to ask this on. Do you care about existing installs or just new ones? I.e., do you want to seamlessly handle someone upgrading with said type of storage and getting all DVs/snaps converted?

It's fine if not, but if you do, that would require some special care.

Comment on lines +1068 to +1071
```go
	if storageProfile.Annotations != nil {
		if vscName := storageProfile.Annotations[cc.AnnSnapshotClassForDataImportCron]; vscName != "" {
			return &vscName, nil
		}
```
Collaborator:

Probably want to unit test this, especially the annotation taking precedence over storageProfile.Status.SnapshotClass.
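A hedged sketch of what such a test could look like; the helper under test (getCronSnapshotClass) is hypothetical, and the annotation key follows the PR description:

```go
import (
	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	cdiv1 "kubevirt.io/containerized-data-importer-api/pkg/apis/core/v1beta1"
)

var _ = Describe("DataImportCron snapshot class selection", func() {
	It("prefers the StorageProfile annotation over status.SnapshotClass", func() {
		statusClass := "status-vsc"
		profile := &cdiv1.StorageProfile{
			ObjectMeta: metav1.ObjectMeta{
				Name: "sc",
				Annotations: map[string]string{
					"cdi.kubevirt.io/snapshotClassForDataImportCron": "annotated-vsc",
				},
			},
		}
		profile.Status.SnapshotClass = &statusClass

		name, err := getCronSnapshotClass(profile) // hypothetical helper under test
		Expect(err).ToNot(HaveOccurred())
		Expect(*name).To(Equal("annotated-vsc"))
	})
})
```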

@akalenyu (Collaborator) commented:

/test pull-containerized-data-importer-e2e-nfs
/test pull-containerized-data-importer-e2e-ceph
@noamasu note the linter failure is real.
