Description
I was working on debugging compatibility with the CephCSI driver and stumbled upon an issue in how plugin names and aliases are handled.
Installed the driver using this cmd:
```bash
docker plugin rm rbd.csi.ceph.com --force && \
docker plugin install <privaterepo>/ceph-csi-swarm/cephcsi-aio:canary \
  --grant-all-permissions \
  NODE_ID=test-docker-1 \
  CEPHCSI_VERBOSITY=10 \
  DEBUG_ENTRYPOINT=true \
  --alias rbd.csi.ceph.com
```
Created a CSI volume and a service using it fine. Basically, everything worked up until volume publishing, where I was met with an error message from Docker stating:
CSI node ID not found for given Swarm node ID.
Looking at the code, that error is only logged if this lookup fails:
```go
csiNodeID := p.swarmToCSI[nodeID]
if csiNodeID == "" { ... }
```
There are two maps in the plugin object, swarmToCSI and csiToSwarm.
These are correctly populated upon daemon startup, which I checked using debug statements. However, at the point in time where these maps are accessed during publishing, they appear to be empty.
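For reference, here is a stripped-down sketch of that bookkeeping, assuming the two maps simply pair Swarm node IDs with CSI node IDs. Apart from the swarmToCSI/csiToSwarm field names, everything here (the struct, recordNodeInfo, the IDs) is illustrative, not the actual swarmkit code:

```go
package main

import "fmt"

// plugin stands in for swarmkit's per-driver object holding the two maps.
type plugin struct {
	swarmToCSI map[string]string // Swarm node ID -> CSI node ID
	csiToSwarm map[string]string // CSI node ID -> Swarm node ID
}

// recordNodeInfo is what populating the maps at daemon startup amounts to.
func (p *plugin) recordNodeInfo(swarmNodeID, csiNodeID string) {
	p.swarmToCSI[swarmNodeID] = csiNodeID
	p.csiToSwarm[csiNodeID] = swarmNodeID
}

func main() {
	p := &plugin{swarmToCSI: map[string]string{}, csiToSwarm: map[string]string{}}
	p.recordNodeInfo("swarm-node-1", "test-docker-1")

	// The publish-time lookup from the snippet above: it succeeds on a
	// populated instance, but returns "" on one whose maps were never filled.
	if csiNodeID := p.swarmToCSI["swarm-node-1"]; csiNodeID == "" {
		fmt.Println("CSI node ID not found for given Swarm node ID")
	} else {
		fmt.Println("resolved CSI node ID:", csiNodeID)
	}
}
```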
I further tested this by logging the pointer of the underlying plugin object, which confirmed there were two "virtual" instances of the same plugin at runtime: the pointer logged at startup differed from the pointer logged at use.
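For anyone reproducing this, the check itself is just pointer formatting; the snippet below is a minimal, hypothetical illustration of it rather than the actual debug statements I added:

```go
package main

import "log"

// plugin stands in for the manager's per-driver object; only its identity matters here.
type plugin struct{ name string }

func main() {
	atStart := &plugin{name: "rbd.csi.ceph.com"}
	atUse := &plugin{name: "rbd.csi.ceph.com"} // a second instance carrying the same driver name

	// %p prints the address, so two different instances are immediately visible.
	log.Printf("at start: %p", atStart)
	log.Printf("at use:   %p", atUse)
	log.Printf("same instance: %v", atStart == atUse) // false
}
```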
Looking at where plugin object instances are created (manager/csi/manager.go) and the usages thereof, it seems like handleNode gets the plugin's name from info.PluginName:

swarmkit/manager/csi/manager.go, line 406 in 3a23580:

```go
p, err := vm.getPlugin(info.PluginName)
```

whereas the volume-handling side looks the plugin up by v.Spec.Driver.Name.
This effectively causes two driver objects to be created from one installed driver: one keyed by the name with the tag from the alias, and one keyed by the name without it. One instance is correctly populated with values at startup, but the other, unpopulated instance is what is actually used later in the code, causing the CSI driver to fail at NodePublishVolume.
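To make the failure mode concrete, here is a self-contained sketch of it, assuming getPlugin lazily creates and caches plugin objects keyed by the exact name string it is given. Which code path sees the tagged name is an assumption here, and none of this is the actual swarmkit implementation:

```go
package main

import "fmt"

// plugin stands in for the per-driver object holding the node ID maps.
type plugin struct {
	name       string
	swarmToCSI map[string]string
}

// manager caches plugin objects keyed by the exact name it is handed.
type manager struct {
	plugins map[string]*plugin
}

// getPlugin creates the object on first use; "rbd.csi.ceph.com" and
// "rbd.csi.ceph.com:latest" are different keys, so they yield different objects.
func (m *manager) getPlugin(name string) *plugin {
	p, ok := m.plugins[name]
	if !ok {
		p = &plugin{name: name, swarmToCSI: map[string]string{}}
		m.plugins[name] = p
	}
	return p
}

func main() {
	m := &manager{plugins: map[string]*plugin{}}

	// Node-info path: resolves the plugin by the installed name (here assumed to carry the default tag).
	fromNodeInfo := m.getPlugin("rbd.csi.ceph.com:latest")
	fromNodeInfo.swarmToCSI["swarm-node-1"] = "test-docker-1"

	// Volume path: resolves it by the name written in the volume spec (no tag).
	fromVolumeSpec := m.getPlugin("rbd.csi.ceph.com")

	fmt.Println("same instance:", fromNodeInfo == fromVolumeSpec)            // false
	fmt.Println("mappings seen at publish:", len(fromVolumeSpec.swarmToCSI)) // 0 -> "CSI node ID not found"
}
```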
To fix this I modified the getPlugin method to normalize/canonicalize plugin names correctly.
https://github.com/ppignet/swarmkit/tree/csi/manager_normalize_plugin_name
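As a rough sketch of the kind of normalization meant here (assuming the only mismatch is the implicit default tag; the actual change lives in the branch above and may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePluginName maps "name" and "name:latest" to the same key, so both
// the node-info path and the volume-spec path resolve to one plugin object.
func normalizePluginName(name string) string {
	return strings.TrimSuffix(name, ":latest")
}

func main() {
	fmt.Println(normalizePluginName("rbd.csi.ceph.com:latest")) // rbd.csi.ceph.com
	fmt.Println(normalizePluginName("rbd.csi.ceph.com"))        // rbd.csi.ceph.com
}
```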
I confirmed that this fixes the issue by verifying the pointer is now identical at startup and upon usage, and by being able to spin up many volumes and services without errors anymore! 😺
Let me know if this is a non-issue and just me doing something wrong. But I think this addition would make the name handling more fail-proof in any case.