Prioritize lower DeviceNumber in ipv4 assignment #3300


Open · sharnoff wants to merge 1 commit into master

Conversation

@sharnoff commented Jun 2, 2025

What type of PR is this?

Improvement

Which issue does this PR fix?

I didn't see a related GitHub issue; I've provided context here:

Currently (*DataStore).AssignPodIPv4Address() iterates through the eniPool map when searching for an IP address to assign to the pod. Because map iteration order is random, new assignments tend to be distributed evenly across all ENIs with sufficient available IPs.

When WARM_ENI_TARGET is set (and not {WARM,MINIMUM}_IP_TARGET), IPs are only returned to the subnet when the entire ENI can be removed, which can only happen when it's completely unused.

For our workloads, this even distribution of new assignments makes it very unlikely for any ENI to become unused once added. This in turn means that ipamd almost never returns IPs to the subnet, leaving us with many more IPs allocated than can ever be used in practice.
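To make the interaction concrete, here is a simplified, self-contained model of the current behavior (not the actual ipamd code): Go randomizes map iteration order, so the "first ENI with a free IP" is effectively chosen at random and assignments spread across all ENIs.

```go
package main

import "fmt"

type eni struct {
	id       string
	assigned int
	capacity int
}

// assignRandomly takes the first ENI with spare capacity; because Go
// randomizes map iteration order, the choice is effectively random.
func assignRandomly(pool map[string]*eni) *eni {
	for _, e := range pool {
		if e.assigned < e.capacity {
			e.assigned++
			return e
		}
	}
	return nil
}

func main() {
	pool := map[string]*eni{
		"eni-a": {id: "eni-a", capacity: 10},
		"eni-b": {id: "eni-b", capacity: 10},
	}
	for i := 0; i < 10; i++ {
		assignRandomly(pool)
	}
	// Assignments tend to land on both ENIs, so neither drains to zero
	// and WARM_ENI_TARGET never gets to remove a whole ENI.
	fmt.Printf("eni-a=%d eni-b=%d\n", pool["eni-a"].assigned, pool["eni-b"].assigned)
}
```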

What does this PR do / Why do we need it?

This PR changes the iteration order over ENIs in (*DataStore).AssignPodIPv4Address() from random to increasing device number.

This is worthwhile because, under WARM_ENI_TARGET, ipamd is otherwise unlikely to free IPs.

That said, there are potentially good reasons to maintain randomness here, either by default or opt-in. For example, random assignment to ENIs may better distribute load, helping with throughput.

Please let me know if you'd like to see this made configurable - I wanted to start simple for now.
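For reference, a minimal sketch of the new ordering, built around the slices.SortFunc call quoted in the Copilot review below; the surrounding datastore code here is paraphrased for illustration and is not the real implementation:

```go
package main

import (
	"fmt"
	"slices"
)

type ENI struct {
	ID           string
	DeviceNumber int
	available    int
}

// assignPodIPv4 mirrors the PR's approach: sort by DeviceNumber so new
// assignments pack onto the lowest-numbered ENIs first, letting the
// higher-numbered ones drain and eventually become removable.
func assignPodIPv4(enis []*ENI) *ENI {
	slices.SortFunc(enis, func(x, y *ENI) int { return x.DeviceNumber - y.DeviceNumber })
	for _, e := range enis {
		if e.available > 0 {
			e.available--
			return e
		}
	}
	return nil
}

func main() {
	enis := []*ENI{
		{ID: "eni-b", DeviceNumber: 2, available: 5},
		{ID: "eni-a", DeviceNumber: 0, available: 5},
	}
	fmt.Println(assignPodIPv4(enis).ID) // eni-a: lowest device number wins
}
```

Sorting makes the iteration deterministic, so the highest-numbered ENIs drain first and can eventually be returned to the subnet.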

Testing done on this change

Output from go test $(go list ./... | grep -v '/amazon-vpc-cni-k8s/test/integration/')
?   	github.com/aws/amazon-vpc-cni-k8s/cmd/aws-k8s-agent	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/cmd/aws-vpc-cni	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/cmd/aws-vpc-cni-init	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/cmd/cni-metrics-helper	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/cmd/cni-metrics-helper/metrics	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/cmd/egress-cni-plugin	0.005s
ok  	github.com/aws/amazon-vpc-cni-k8s/cmd/egress-cni-plugin/snat	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/cmd/grpc-health-probe	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/cmd/routed-eni-cni-plugin	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/cmd/routed-eni-cni-plugin/driver	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/cmd/routed-eni-cni-plugin/driver/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/apis/crd/v1alpha1	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/awsutils	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/awsutils/awssession	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/awsutils/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/cninswrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/cninswrapper/mock_ns	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/ec2metadatawrapper	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/ec2metadatawrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/ec2wrapper	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/ec2wrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/eniconfig	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/eniconfig/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/grpcwrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/grpcwrapper/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/hostipamwrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/hostipamwrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/ipamd/datastore	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/iptableswrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/iptableswrapper/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/ipwrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/ipwrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/k8sapi	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/netlinkwrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/netlinkwrapper/mock_netlink	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/netlinkwrapper/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/netlinkwrapper/mocks_link	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/networkutils	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/networkutils/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/nswrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/nswrapper/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/procsyswrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/procsyswrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/publisher	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/publisher/mock_publisher	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/rpcwrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/rpcwrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/sgpp	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/typeswrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/typeswrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/utils/cniutils	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/utils/eventrecorder	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/utils/logger	(cached)
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/utils/retry	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/utils/ttime	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/utils/ttime/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/version	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/vethwrapper	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/pkg/vethwrapper/mocks	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/pkg/vpc	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/rpc	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/rpc/mocks	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/scripts	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/controller	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/helm	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/agent	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/aws	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/aws/services	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/aws/utils	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/k8s	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/k8s/manifest	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/k8s/resources	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/resources/k8s/utils	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/test/framework/utils	[no test files]
ok  	github.com/aws/amazon-vpc-cni-k8s/utils	(cached)
?   	github.com/aws/amazon-vpc-cni-k8s/utils/cp	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/utils/imds	[no test files]
?   	github.com/aws/amazon-vpc-cni-k8s/utils/prometheusmetrics	[no test files]

Will this PR introduce any new dependencies?

No

Will this break upgrades or downgrades? Has updating a running cluster been tested?

I don't expect it will cause issues with upgrades/downgrades, although we haven't yet tried it on a running cluster.

Does this change require updates to the CNI daemonset config files to work?

No

Does this PR introduce any user-facing change?

This PR intends to have a user-visible effect (IPs are freed more readily), but it shouldn't change any functionality. I'll leave it up to your judgement.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sharnoff requested a review from a team as a code owner June 2, 2025 10:52
@jayanthvn (Contributor) commented:

Thanks for opening the PR. We will review and get back to you. I understand the concern: since we randomly pick ENIs, we might end up allocating more ENIs with WARM_ENI_TARGET. This was discussed before, but the intention of picking a random ENI was to distribute the bandwidth among the ENIs. This change would ideally launch all pods behind a single ENI, and for high-bandwidth applications we would end up throttling the per-ENI bandwidth.

@Copilot (Copilot AI) left a comment:

Pull Request Overview

This PR modifies the ENI IP assignment behavior by sorting ENIs in ascending order by their device number during IP allocation. This adjustment aims to prioritize lower device numbers, making it easier for ipamd to free IPs and improve resource utilization.

  • Added sorting of ENIs using slices.SortFunc.
  • Ensured that lower device numbers are prioritized for IP address allocation.
Comments suppressed due to low confidence (1)

pkg/ipamd/datastore/data_store.go:690

    slices.SortFunc(enis, func(x, y *ENI) int { return x.DeviceNumber - y.DeviceNumber })

  • Consider adding unit tests to validate that ENIs are correctly sorted in ascending order by device number, to ensure the intended allocation behavior.
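For what it's worth, a hypothetical shape for such a test; the ENI struct and test name here are stand-ins rather than the real datastore API:

```go
package datastore_test

import (
	"slices"
	"testing"
)

type ENI struct {
	ID           string
	DeviceNumber int
}

// TestSortByDeviceNumber checks the comparator quoted above against a
// stand-in ENI type: after sorting, ENIs appear in ascending device order.
func TestSortByDeviceNumber(t *testing.T) {
	enis := []*ENI{
		{ID: "eni-high", DeviceNumber: 3},
		{ID: "eni-low", DeviceNumber: 1},
		{ID: "eni-mid", DeviceNumber: 2},
	}
	slices.SortFunc(enis, func(x, y *ENI) int { return x.DeviceNumber - y.DeviceNumber })
	want := []string{"eni-low", "eni-mid", "eni-high"}
	for i, w := range want {
		if enis[i].ID != w {
			t.Fatalf("position %d: got %s, want %s", i, enis[i].ID, w)
		}
	}
}
```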

@sharnoff (Author) commented:

Hey @jayanthvn, any update? I'm happy to adjust the PR to make the change configurable, so that users who need the networking load spread across multiple ENIs are able to retain the current behavior.
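Purely as an illustration of what that opt-in might look like (the ENI_ORDERED_IP_ASSIGNMENT variable name is hypothetical and not part of this PR):

```go
package datastore

import (
	"os"
	"slices"
)

type ENI struct{ DeviceNumber int }

// sortENIsIfOrdered applies the deterministic ordering only when a
// hypothetical env var opts in; otherwise the existing randomized
// map-iteration behavior would be kept for bandwidth spreading.
func sortENIsIfOrdered(enis []*ENI) {
	if os.Getenv("ENI_ORDERED_IP_ASSIGNMENT") != "true" {
		return
	}
	slices.SortFunc(enis, func(x, y *ENI) int { return x.DeviceNumber - y.DeviceNumber })
}
```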
