-
Notifications
You must be signed in to change notification settings - Fork 361
Add In Memory Cache of Node/CSINode Info for Long Provisioning to Prevent Volume Leak #1413
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add In Memory Cache of Node/CSINode Info for Long Provisioning to Prevent Volume Leak #1413
Conversation
8e951c6
to
4fdda4f
Compare
/assign @msau42 |
@@ -1486,31 +1486,19 @@ func (ctrl *ProvisionController) provisionClaimOperation(ctx context.Context, cl | |||
return ProvisioningFinished, errStopProvision | |||
} | |||
|
|||
var selectedNode *v1.Node | |||
var selectedNodeName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jsafrane Should we just make a new major version instead of making it an option for other provisioners?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, so yes, it will require a new major version.
I don't see where you make it an option, this PR looks very backward incompatible. And I am not against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative, if we think non-csi provisioners still want the old behavior, is to add the new changes as a feature gate. I'm not aware of any non-csi provisioners that support topology though and they would still run into the topology bug if we keep the old way.
So I am also in favor of just making a new major.
@@ -417,7 +417,7 @@ func main() { | |||
if supportsMigrationFromInTreePluginName != "" { | |||
provisionerOptions = append(provisionerOptions, controller.AdditionalProvisionerNames([]string{supportsMigrationFromInTreePluginName})) | |||
} | |||
|
|||
pvcNodeStore := ctrl.NewInMemoryStore() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the driver doesn't support topology, then I think the cache is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we need to add Topology check for all the cache code.
// a TopologyInfo object by its name. | ||
// The name is pvcNamespace/pvcName | ||
type TopologyProvider interface { | ||
GetByName(name string) (*TopologyInfo, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it very clear what the key is, what if we make it v1.PersistentVolumeClaim*
instead of a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if PVC is deleted? Also currently we only pass in pvcName and pvcNamespace in the GenerateAccessibilityRequirements. Can changing to a pointer to PVC cause problems?
pkg/controller/cache.go
Outdated
// First, check if the key exists to provide a helpful error. | ||
_, found := s.data[name] | ||
if !found { | ||
return fmt.Errorf("cannot delete: object with name '%s' not found", name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to error if it doesn't exist? I don't think the caller will do anything other than ignore the error. If you do want to return an error, I would make an explicit error type for NotFoundError, so that the caller can distinguish it from other errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for returning no error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to just log a warning and not returning error?
pkg/controller/topology.go
Outdated
return nil, err | ||
} | ||
if len(cacheInfo.TopologyKeys) == 0 { | ||
return nil, fmt.Errorf("no topology key found in the CSINode in memory cache %s", nodeName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a special error message just for no keys/label from cache? We already have a warning message that we're going to lookup the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do not need the special error message but we should return an error. Maybe errors.New("")?
pkg/controller/topology.go
Outdated
if selectedNode != nil { | ||
selectedCSINode, err = getSelectedCSINode(csiNodeLister, selectedNode) | ||
var topologyKeys []string | ||
if len(selectedNodeName) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is very hard to follow with the cache lookups and updates being interspersed with condition checks. I think it would be easier to read if we wrapped the informer/cache lookup logic in one call, and kept the core GenerateAccessibility
function clean with just business logic. Something like:
func topologyLookup(selectedNodeName) {
csiNode, err1 := getSelectedCSINode(csiNodeLister, selectedNodeName)
node,err2 := getNode(nodeLister, selectedNodeName)
if csiNode == nil || node == nil {
topologyInfo = cache.getTopologyInfo(pvc)
} else {
topologyInfo = { getTopologyKeys(csiNode), getNodeLabels(node, topologyKeys)}
cache.addToplogyInfo(topologyInfo)
}
return topologyInfo
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also consider looking up cache first, then if it doesn't exist in cache, then lookup in informers. I think switching that logic will solve the immediate binding scenario I outlined in #1413 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer checking the informers first, the node might have gotten new labels as the CSI driver just got installed there.
} | ||
selectedNodeName = nodeName | ||
} else { | ||
logger.Info("No selected node annotation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will spam for drivers that don't support topology or use Immediate binding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so we just do not log anything
pkg/controller/topology.go
Outdated
return term, node.Labels, false | ||
} | ||
} else { | ||
// nodeLister is nil, read from the cache directly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible? I think nodeLister is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original logic in the lib is:
if ctrl.nodeLister != nil {
selectedNode, err = ctrl.nodeLister.Get(nodeName)
} else {
selectedNode, err = ctrl.client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{}) // TODO (verult) cache Nodes
}
So I was thinking nodeLister can be nil? I am not sure when will it occur though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the csi-provisioner it is mandatory as long as the plugin supports topology: https://github.com/kubernetes-csi/external-provisioner/blob/master/cmd/csi-provisioner/csi-provisioner.go#L339
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay then we can remove the nodeLister nil check
pkg/controller/topology.go
Outdated
klog.Warningf("no node labels for node %s found in memory cache", nodeName) | ||
return nil, nil, true | ||
} | ||
for _, key := range topologyKeys { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic looks repeated 3 times in this function. It would be better to share it across all 3 branches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed one for the nodeLister is nil code. However the rest two, the first one read nodeLabels from Cache, the other one read nodeLabels from the node. I think we can make them to a processTerm func.
For your e2e test case steps, can you also test deleting the PVC object along with the Pod and Node to trigger the leak? I guess the leak is triggered by 2 scenarios:
Can you verify you can repro the issue with both those scenarios, and that the fix works in both those scenarios? I'm trying to think if AllowedTopologies or Immediate binding mode would be impacted. AllowedTopologies shouldn't be impacted because we grab the topology directly from it. I think Immediate binding could be impacted in a scenario like:
|
Also in the testing steps, with the fix, verify that the PV object is created and then deleted when the disk gets deleted. |
pkg/controller/cache.go
Outdated
|
||
// TopologyProvider is an interface that defines the behavior for looking up | ||
// a TopologyInfo object by its name. | ||
// The name is pvcNamespace/pvcName |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO the key should be PVC UID.
Consider this scenario:
- User creates a PVC
- The provisioning takes a long time, i.e. many timeouts + retries.
- User deletes the PVC and creates a new one, with the same name, but different StorageClass.
- The original PVC is still being provisioned. Despite it has been deleted from the API server, the provisioner waits for a final error.
- The new PVC should be provisioned too.
-> Two PVCs have the same name and are being provisioned, but they have a different StorageClass and thus different topology cache.
PVC UID will be unique for these PVCs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And PVC UID guarantees uniqueness as well.
pkg/controller/cache.go
Outdated
// First, check if the key exists to provide a helpful error. | ||
_, found := s.data[name] | ||
if !found { | ||
return fmt.Errorf("cannot delete: object with name '%s' not found", name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for returning no error.
@@ -1486,31 +1486,19 @@ func (ctrl *ProvisionController) provisionClaimOperation(ctx context.Context, cl | |||
return ProvisioningFinished, errStopProvision | |||
} | |||
|
|||
var selectedNode *v1.Node | |||
var selectedNodeName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change, so yes, it will require a new major version.
I don't see where you make it an option, this PR looks very backward incompatible. And I am not against.
pkg/controller/controller.go
Outdated
mayReschedule, | ||
state, | ||
err) | ||
if IsFinalError(err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand it correctly, as long as the PVC exists and the provisioner lib calls Provision(), the external-provisioner re-loads the topology from the API server, adds it to the cache and deletes it from the cache on every final error.
It's actually a smart idea that simplifies the cache cleanup. It's just not very obvious from the cache name and its comments. I somehow expected it caches the topology forever and deletes it after PVC is deleted. A comment on the right place (the struct
definition?) would help.
pkg/controller/topology.go
Outdated
if selectedNode != nil { | ||
selectedCSINode, err = getSelectedCSINode(csiNodeLister, selectedNode) | ||
var topologyKeys []string | ||
if len(selectedNodeName) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer checking the informers first, the node might have gotten new labels as the CSI driver just got installed there.
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: sunnylovestiramisu The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
abab6ad
to
98536bb
Compare
@sunnylovestiramisu: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@@ -50,22 +51,24 @@ func (s *InMemoryStore) Add(name string, info *TopologyInfo) { | |||
|
|||
// Delete implements the TopologyProvider interface. | |||
// It uses the built-in delete() function to remove the item from the map. | |||
// The entry is deleted when provision succeeds or returns a final error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tue regardless of implementation. I would probably put it as a comment in the interface
@@ -692,7 +692,7 @@ func (p *csiProvisioner) prepareProvision(ctx context.Context, claim *v1.Persist | |||
requirements, err := GenerateAccessibilityRequirements( | |||
p.client, | |||
p.driverName, | |||
claim.Namespace, | |||
string(claim.UID), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to convert the key to a string? Can we use the direct type as the key?
@@ -105,6 +106,78 @@ func SupportsTopology(pluginCapabilities rpc.PluginCapabilitySet) bool { | |||
utilfeature.DefaultFeatureGate.Enabled(features.Topology) | |||
} | |||
|
|||
func topologyLookup( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not quite what I meant about the wrapper function.
I think it would be simpler to understand the code if things like obj lookup (whether from informer or cache) is abstracted from business logic (how to behave when there are no keys, or strictTopology mode is set, etc).
So I think a better structure for GenerateAccessiblityRequirements
would be something like:
func GenerateAccessibilityRequirements(...) (...) {
// This returns the information from informer or cache, the caller doesn't care
topologyInfo = topologyLookup(selectedNode)
// Nothing in the logic after this should care if the information came from the cache or informer
if len(topologyKeys) == 0) {
// business logic to handle it
}
if strictTopology {
// business logic to handle it
}
// everything else
}
What type of PR is this?
/kind bug
What this PR does / why we need it:
This is the option 4 discussed in Leaked Volume Fix in external-provisioner Google doc.
The short term fix does not solve the issue of csi-provisioner restarting during the long provisioning time, however it is still an improvement. Also the short term fix does not have conflict with other long term fixes if we want to revisit in the future.
Which issue(s) this PR fixes:
kubernetes-sigs/sig-storage-lib-external-provisioner#152
Special notes for your reviewer:
Making changes in vendor/sigs.k8s.io/sig-storage-lib-external-provisioner/v11 to get initial review, I will move the change to sig-storage-lib-external-provisioner and bump up the sig-storage-lib-external-provisioner major version once this review is in good shape.
Tested via the following steps.
kubetest --build --up
from a k/k repocsi-gce-pd-controller-8dffb78ff-2xzbq is on node e2e-test-songsunny-minion-group-kncr, for the following testing pod, specify a nodeSelector that is NOT on node e2e-test-songsunny-minion-group-kncr
5. Apply the following objects:
Does this PR introduce a user-facing change?: