
Consistency issues due to the use of mount binds #74

Open

@raskyld

Description

Hello,

We ran into a really tricky edge case with our https://github.com/cert-manager/csi-driver-spiffe.
Basically, unlike most other Kubernetes CSI drivers, it uses a single tmpfs that contains both the volume metadata and the SPIFFE SVID certificates (x509 certificates issued by cert-manager). It then answers NodePublishVolume CSI RPC requests by creating a read-only bind mount (bind + ro) from this tmpfs to the target_path given by the container orchestrator (in our case, Kubernetes).
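
For context, that read-only bind amounts to roughly the following two mount(2) calls (a sketch, not the driver's actual code; on Linux a read-only bind requires a second remount step):

```go
package main

import "golang.org/x/sys/unix"

// bindReadOnly bind-mounts src at target, then remounts the bind read-only.
// A plain MS_BIND|MS_RDONLY mount silently ignores the read-only flag, so a
// second MS_REMOUNT pass is required.
func bindReadOnly(src, target string) error {
	if err := unix.Mount(src, target, "", unix.MS_BIND, ""); err != nil {
		return err
	}
	return unix.Mount("", target, "", unix.MS_BIND|unix.MS_REMOUNT|unix.MS_RDONLY, "")
}
```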

The issue is that, after a restart or failure, the bind no longer points to a valid tmpfs directory, so from the pods' point of view the certificates are never renewed. One could argue this is fine as long as the csi-driver-spiffe instance is never restarted, so you never have to care about it. But the issue is more complex than that!

Since this driver uses RequiresRepublish, kubelet issues A LOT of repeated calls to NodePublishVolume, and any failure leads to an unmount. That unmount is not propagated to the mount namespace of the running pods, so users have to restart the pod manually.
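
(For anyone unfamiliar: RequiresRepublish is a field on the CSIDriver object; something like the snippet below is what opts a driver into kubelet's periodic re-publish loop. This is illustrative only, and the driver name is an assumption on my part.)

```go
package main

import (
	storagev1 "k8s.io/api/storage/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	requiresRepublish := true
	_ = storagev1.CSIDriver{
		// Hypothetical driver name for illustration.
		ObjectMeta: metav1.ObjectMeta{Name: "spiffe.csi.cert-manager.io"},
		Spec: storagev1.CSIDriverSpec{
			// With RequiresRepublish set, kubelet periodically re-invokes
			// NodePublishVolume for every published volume; that is how the
			// driver gets a chance to rotate short-lived SVIDs.
			RequiresRepublish: &requiresRepublish,
		},
	}
}
```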

This is a real problem when used with SPIFFE SVIDs, since they are typically short-lived and rotated often. A failure in this critical rotation mechanism leads to major workload disruptions.

I am revamping parts of this library in a fork to:

  1. Split metadata and data: metadata ends up in a local tmpfs, and data gets its own tmpfs mounted directly at target_path, where we perform atomic updates (see the first sketch after this list). No more bind mounts, so it is simpler to manage.
  2. If the target_path is already correctly mounted, avoid unmounting or making any other destructive change, since the running pods would not see it.
  3. Improve the IsMounted detection to ensure the target is not just any mount point but one that looks like what this library expects (see the second sketch after this list).
  4. Add some metrics / stats collection.
  5. Change error reporting in the CSI RPCs, since errors have side effects; please see: Kubelet deletes CSI mount points if a "requiresrepublish" call to NodePublishVolume returns an error kubernetes/kubernetes#121271 (comment)
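
For point 1, by "atomic updates" I mean the usual write-then-rename pattern, sketched below for a single file (a minimal version; a full implementation would likely swap a whole directory at once, the way kubelet's AtomicWriter does, so a cert/key pair never gets mixed across rotations):

```go
package main

import (
	"os"
	"path/filepath"
)

// atomicWrite replaces dir/name without a window where readers can observe
// a partially written file: write to a temporary file on the same tmpfs,
// sync it, then rename over the old file. rename(2) is atomic within a
// single filesystem, which is why the temp file must live in the same dir.
func atomicWrite(dir, name string, data []byte) error {
	tmp, err := os.CreateTemp(dir, name+".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless no-op after a successful rename
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), filepath.Join(dir, name))
}
```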
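For point 3, the idea is to go beyond a plain "is this a mount point" check and verify against /proc/self/mountinfo that the mount at target_path has the filesystem type we expect. A rough sketch (the helper name is mine, and escaping of special characters in paths is ignored for brevity):

```go
package main

import (
	"bufio"
	"os"
	"strings"
)

// isExpectedMount reports whether target is a mount point backed by the
// expected filesystem type (e.g. "tmpfs"). In /proc/self/mountinfo, field 5
// is the mount point and the fstype follows the "-" separator; a fuller
// check could also compare the mount source after the fstype.
func isExpectedMount(target, wantFstype string) (bool, error) {
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		return false, err
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 5 || fields[4] != target {
			continue
		}
		for i := 5; i < len(fields); i++ {
			if fields[i] == "-" && i+1 < len(fields) {
				return fields[i+1] == wantFstype, nil
			}
		}
	}
	return false, sc.Err()
}
```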

Please tell me if any of these changes seems too risky, or if there is a good reason to use a bind mount instead of two separate mounts, etc.

Also, tell me if you are interested in getting that merged upstream :)

Thanks 🙏
