
Conversation

@Gerrit91 Gerrit91 commented Jan 7, 2026

Description

Looks like we forgot to set this variable in the way it was done before the refactoring. Closes #140.

Gerrit91 commented Jan 7, 2026

We should add a test case that ensures this cannot happen: write data, unmount the volume, remount it in another pod, and verify that the data is still present.
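
Roughly, something like the following with plain kubectl (just a sketch, not the actual integration test; the storage class name csi-driver-lvm-linear derived from the chart defaults and the busybox image are assumptions, and kubectl >= 1.23 is needed for --for=jsonpath):

# 1. Claim a volume and write a marker file from a first pod.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: lvm-persistence-test
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: csi-driver-lvm-linear
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: writer
spec:
  restartPolicy: Never
  containers:
  - name: writer
    image: busybox
    command: ["sh", "-c", "echo survived > /data/marker && sync"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: lvm-persistence-test
EOF
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/writer --timeout=180s

# 2. Delete the pod so the volume is unmounted from the node.
kubectl delete pod writer

# 3. Remount the same claim in a second pod; grep fails if the LV was reformatted.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: reader
spec:
  restartPolicy: Never
  containers:
  - name: reader
    image: busybox
    command: ["sh", "-c", "grep -q survived /data/marker"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: lvm-persistence-test
EOF
kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/reader --timeout=180s \
  && echo "data survived the remount"

# cleanup: kubectl delete pod reader && kubectl delete pvc lvm-persistence-test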

Gerrit91 commented Jan 7, 2026

We have these sorts of tests in the metal-stack integration suite, but the csi-driver-lvm v0.8.0 release did not go into metal-stack v0.22.2, so we did not see it right away. :(

horazont commented Jan 7, 2026

We did see it unfortunately (the very hard way).

The fix looks sensible to me and in line with what I would've proposed myself. In the spirit of not having more users burned by 0.8.x, would it make sense to ship this fix as-is and cater for the tests later?

Gerrit91 commented Jan 7, 2026

I am sorry to hear this. I was hoping that nobody would jump right onto the pre-release version with critical workloads. :(

Unfortunately, @ostempel is still on vacation. @vknabel Do you think we can merge this and make another pre-release as v0.8.0 (or alternatively release v0.8.1 and withdraw v0.8.0 entirely)? We would then add a warning that this release is still being evaluated and remove the pre-release flag after our integration tests have passed.

horazont commented Jan 7, 2026

We didn't notice it was a pre-release. We installed via Helm from the metal-stack repo without specifying the version.

This seems in line with what helm says:

$ helm show values metal-stack/csi-driver-lvm
lvm:
  # This one you should change
  devicePattern: /dev/nvme[0-9]n[0-9]

  # You will want to change this for read-only filesystems
  # For example, in Talos OS, set this to "/var/etc/lvm"
  hostWritePath: /etc/lvm

  # these are primarily for testing purposes
  vgName: csi-lvm
  driverName: lvm.csi.metal-stack.io
  storageClassStub: csi-driver-lvm
  logLevel: info

rbac:
  create: true
  pspEnabled: true

## enable, if you want to install storage classes with v0.3.x backward-compatible `csi-lvm-sc-*` names
compat03x: false

pluginImage:
  repository: ghcr.io/metal-stack/csi-driver-lvm
  tag: v0.8.0
  pullPolicy: IfNotPresent

sidecarImages:
  attacher: k8s.gcr.io/sig-storage/csi-attacher:v3.5.0
  livenessprobe: k8s.gcr.io/sig-storage/livenessprobe:v2.7.0
  provisioner: k8s.gcr.io/sig-storage/csi-provisioner:v3.2.1
  registrar: k8s.gcr.io/sig-storage/csi-node-driver-registrar:v2.5.1
  resizer: k8s.gcr.io/sig-storage/csi-resizer:v1.6.0

kubernetes:
  kubeletPath: /var/lib/kubelet

storageClasses:
  linear:
    enabled: true
    additionalAnnotations: []
    # this might be used to mark one of the StorageClasses as default:
    # storageclass.kubernetes.io/is-default-class: "true"
    reclaimPolicy: Delete
  striped:
    enabled: true
    additionalAnnotations: []
    reclaimPolicy: Delete
  mirror:
    enabled: true
    additionalAnnotations: []
    reclaimPolicy: Delete

nodeSelector:
  # The plugin daemonset will run on all nodes if it has a toleration,
  # so it is not necessary to set a nodeSelector for it

  # plugin:
    # node-role.kubernetes.io/master: ""
    # Key name may need to be updated to 'node-role.kubernetes.io/control-plane'
    # in the future

  # The provisioner has an affinity for nodes with a plugin pod,
  # but since that's a daemonset, we allow more fine-grained node selection

  provisioner:
    # node-role.kubernetes.io/master: ""
    # Key name may need to be updated to 'node-role.kubernetes.io/control-plane'
    # in the future

tolerations:
  plugin:
  # - key: node-role.kubernetes.io/master
  #   operator: Exists
  #   effect: NoSchedule
  # - key: node-role.kubernetes.io/control-plane
  #   operator: Exists
  #   effect: NoSchedule
  provisioner:
  # - key: node-role.kubernetes.io/master
  #   operator: Exists
  #   effect: NoSchedule
  # - key: node-role.kubernetes.io/control-plane
  #   operator: Exists
  #   effect: NoSchedule

EDIT: according to a colleague, this is precisely what we did (let's not discuss the choice of the kube-system namespace here ;-)):

$ helm repo add csi-driver-lvm https://helm.metal-stack.io 
$ helm upgrade -n kube-system --values lvm-values.yaml csi-driver-lvm csi-driver-lvm/csi-driver-lvm 

and that seems to have landed us with 0.8.0 …

EDIT2: and yeah, in this particular case we should have specified the known-good version we use everywhere instead of going with the latest, but … well, maybe this saves someone else :-).
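
For reference, pinning the chart would look roughly like this (hypothetical invocation; the 0.5.2 below just stands in for whatever chart version has been verified, and chart versions do not necessarily match the app version):

$ helm search repo csi-driver-lvm/csi-driver-lvm --versions   # list published chart versions
$ helm upgrade --install -n kube-system \
    --values lvm-values.yaml \
    --version 0.5.2 \
    csi-driver-lvm csi-driver-lvm/csi-driver-lvm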

Gerrit91 commented Jan 7, 2026

I see, thanks for explaining. It was certainly a mistake that the Helm chart was released already (as v0.8.0 was a pre-release). Most of our users reference integration-tested metal-stack releases, which then point to the tested versions, so this version has only seen our test environments so far. For this repository in particular, we have to take more care with the release cycle because it is used by many people who do not know or run metal-stack. My apologies for this. We will fix this tomorrow.

kube-system is, by the way, just fine. We also use this namespace in our K8s clusters, and so do the CSI drivers on GKE clusters...

horazont commented Jan 7, 2026

Thanks for the feedback, for taking care of this and for taking this seriously! It's much appreciated.

For now, we rolled back to 0.5.2.

@Gerrit91 Gerrit91 marked this pull request as ready for review January 8, 2026 09:01
@Gerrit91 Gerrit91 requested a review from a team as a code owner January 8, 2026 09:01
@Gerrit91 Gerrit91 requested review from ostempel and vknabel January 8, 2026 09:01

Gerrit91 commented Jan 8, 2026

I decided to implement a test first because we really have to ensure that this fix works; otherwise it won't help anyone. The test is now in place.

@vknabel vknabel left a comment

Great work!
I'd vote for a quick release of v0.8.1 and withdrawing the v0.8.0 release.
We should also add a prominent warning at the top of the v0.8.1 release that advises adopters of v0.8.0 to upgrade.

Development

Successfully merging this pull request may close these issues.

LV formated each time they are mounted (DATA LOSS)
