-
Notifications
You must be signed in to change notification settings - Fork 782
Enhanced Subnet Discovery #3380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
96310d6
to
0a64e10
Compare
// Mark primary ENI as excluded if primary subnet is excluded | ||
if eni == primaryENI && c.isPrimarySubnetExcluded { | ||
log.Infof("Marking primary ENI %s as excluded from pod IP allocation", eni) | ||
err := c.dataStoreAccess.GetDataStore(eniMetadata.NetworkCard).SetENIExcludedForPodIPs(eni, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check if eniMetadata.NetworkCard
is the same network card as 0.
f1898c7
to
7945b5a
Compare
name: "IPv6 enabled with subnet discovery and mixed IPv4/IPv6 subnets", | ||
subnets: []ec2types.Subnet{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check edge case in code if customer is on IPv4 cluster and the code picks up an IPv6 subnet because that's all they had tagged in the console and vice versa.
903a226
to
0368ae0
Compare
pkg/awsutils/awsutils.go
Outdated
// When subnet discovery is disabled, check if primary subnet is excluded | ||
excluded, checkErr := cache.isPrimarySubnetExcluded() | ||
if checkErr != nil { | ||
// If we can't determine exclusion status, log warning and proceed | ||
log.Warnf("Failed to check if primary subnet is excluded: %v. Proceeding with ENI creation attempt.", checkErr) | ||
} else if excluded { | ||
// Primary subnet is explicitly excluded | ||
err = errors.New("primary subnet is tagged with kubernetes.io/role/cni=0 and subnet discovery is disabled - no valid subnets available for ENI creation") | ||
log.Error(err.Error()) | ||
return "", err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This checks if Cx has a tagged subnet, but have not enable subnet discovery. Do we want to do this check?
if c.enableIPv6 { | ||
assignedIPs = primaryENIInfo.AssignedIPv6Addresses() | ||
ipType = "IPv6" | ||
} else { | ||
assignedIPs = primaryENIInfo.AssignedIPv4Addresses() | ||
ipType = "IPv4" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if we can use hasPods()
function.
Currently hasPods()
only checks for IPv4 assigned addresses, will need to update that func to support IPv6.
// Skip ENIs that are excluded from pod IP allocation | ||
if eni.IsExcludedForPodIPs { | ||
ds.log.Debugf("Skip needs IP check for ENI %s as it is excluded from pod IP allocation", eni.ID) | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will skip the primary ENI in GetAllocatableENIs
function and prevent it from getting IPs assigned/allocated
0368ae0
to
ea0763f
Compare
pkg/ipamd/ipamd_test.go
Outdated
assert.False(t, wasUsed, "Primary ENI should be excluded and auto-cleanup prefixes") | ||
} | ||
|
||
func TestPrimaryENIAutoCleanupFailureHandling(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add case for IPv6 mode too
587b9f3
to
ea0763f
Compare
describeSGInput := &ec2.DescribeSecurityGroupsInput{ | ||
Filters: []ec2types.Filter{ | ||
{ | ||
Name: aws.String("vpc-id"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need support global SGs which could be in another VPCs.
pkg/awsutils/awsutils.go
Outdated
}, | ||
{ | ||
Name: aws.String("tag:" + subnetDiscoveryTagKey), | ||
Values: []string{"1"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's make them to const, such as selected = 1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/awsutils/awsutils.go
Outdated
|
||
// RefreshSGIDs retrieves security groups | ||
func (cache *EC2InstanceMetadataCache) RefreshSGIDs(mac string, dsAccess *datastore.DataStoreAccess) error { | ||
ctx := context.TODO() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should generalize the context is used across API calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added shared context for all functions across awsutils
and ipamd
.
pkg/awsutils/awsutils.go
Outdated
} | ||
|
||
// RefreshSGIDs retrieves security groups | ||
func (cache *EC2InstanceMetadataCache) RefreshSGIDs(mac string, dsAccess *datastore.DataStoreAccess) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What event triggers this func?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We call this here in IPAMD: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/pkg/ipamd/ipamd.go#L565-L579
SInce I also made a RefreshCustomSGID
, I refactored this function to share the logic between the two.
for _, ds := range dsAccess.DataStores { | ||
eniInfos := ds.GetENIInfos() | ||
for eniID := range eniInfos.ENIs { | ||
eniIDs = append(eniIDs, eniID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would eniIDs := append(eniIDs, ds.GetENIInfos()...)
just work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ds.GetENIInfos()
returns *datastore.ENIInfos
, and then you iterate through each ENI by going over the map
*datastore.ENIInfos.ENIs
.
So not straightforward to do it in one line unless we want to do something like this.
eniIDs := slices.Collect(maps.Keys(ds.GetENIInfos().ENIs))
} | ||
log.Infof("Creating ENI with security groups: %v in subnet: %s", input.Groups, aws.ToString(input.SubnetId)) | ||
log.Infof("Creating ENI with security groups: %v in subnet: %s", input.Groups, aws.ToString(subnet.SubnetId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks we added many more log lines, can you recheck them to maintain them to minimal level.
pkg/awsutils/awsutils.go
Outdated
|
||
// If no valid subnets found, return appropriate error | ||
if !validSubnetsFound { | ||
err = errors.New("no valid subnets available for ENI creation - all subnets are either not tagged or tagged with kubernetes.io/role/cni=0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check the similar pattern to use one line return instead of three lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
|
||
eni.IsExcludedForPodIPs = excluded | ||
if excluded { | ||
ds.log.Infof("ENI %s marked as excluded from pod IP allocation", eniID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid logging if possible especially when the code is under lock mechanism.
@@ -636,6 +679,12 @@ func (ds *DataStore) AssignPodIPv6Address(ipamKey IPAMKey, ipamMetadata IPAMMeta | |||
|
|||
// In IPv6 Prefix Delegation mode, eniPool will only have Primary ENI. | |||
for _, eni := range ds.eniPool { | |||
// Skip ENIs that are excluded from pod IP allocation | |||
if eni.IsExcludedForPodIPs { | |||
ds.log.Debugf("Skipping ENI %s as it is excluded from pod IP allocation", eni.ID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary logging
pkg/awsutils/awsutils.go
Outdated
awsAPIErrInc("IMDSMetaDataOutOfSync", err) | ||
cache.applySecurityGroupsToENIs(eniIDs, primarySGs, "Applying primary security groups to") | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If not returning errors, the func doesn't need error in return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
newENIs := StringSet{} | ||
newENIs.Set(eniIDs) | ||
filteredENIs := newENIs.Difference(&cache.unmanagedENIs) | ||
eniIDs = filteredENIs.SortedList() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw ENI IDs and SGs being sorted everywhere and not sure why they need to be sorted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the only two functions to convert a StringSet
to a []string
is either calling StringSet.Set()
or StringSet.SortedList()
.
We have used SortedList()
everywhere we are using StringSet
even where sorting is not required so I just decided to stick with that convention.
pkg/awsutils/awsutils.go
Outdated
log.Warnf("Failed to check if primary subnet is excluded: %v. Proceeding with ENI creation attempt.", checkErr) | ||
} else if excluded { | ||
// Primary subnet is explicitly excluded | ||
err = errors.New("primary subnet is tagged with kubernetes.io/role/cni=0 - no valid subnets available for ENI creation") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("")
does the same with one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -992,33 +1173,79 @@ func (cache *EC2InstanceMetadataCache) createENI(sg []*string, eniCfgSubnet stri | |||
return networkInterfaceID, nil | |||
} | |||
} else { | |||
if cache.useSubnetDiscovery && !cache.v6Enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should v6 check be removed at here? If new logic need check v6, should v6 check be maintained somewhere to keep existing flow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we would need to remove the v6 check here to make it work with secondary ENIs/subnets.
Right now it would just create the secondary ENI using the same settings as the primary ENI:
amazon-vpc-cni-k8s/pkg/awsutils/awsutils.go
Lines 1020 to 1026 in 312b1d2
} else { | |
log.Info("Using same security group config as the primary interface for the new ENI") | |
networkInterfaceID, err = cache.tryCreateNetworkInterface(input) | |
if err == nil { | |
return networkInterfaceID, nil | |
} | |
} |
pkg/awsutils/awsutils.go
Outdated
// IsPrimarySubnetExcluded implements the APIs interface to check if primary subnet is excluded | ||
func (cache *EC2InstanceMetadataCache) IsPrimarySubnetExcluded() (bool, error) { | ||
// Always check tags - explicit user intent should be respected regardless of subnet discovery setting | ||
return cache.isPrimarySubnetExcluded() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why we need two func which are doing same thing. If no extra logic, can't just use IsPrimarySubnetExcluded
with isPrimarySubnetExcluded
's code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Combined these two functions.
Was trying to encapsulate the logic and have one function called internal only, and the other one externally, but since there is no difference in logic makes sense to combine them to keep things simple.
fea4982
to
e0c2575
Compare
e0c2575
to
b649f35
Compare
7432ce6
to
a0f7951
Compare
What type of PR is this?
feature
Which issue does this PR fix?:
#3067
#2904
What does this PR do / Why do we need it?:
This PR adds a requested customer feature to allow customers to
kubernetes.io/role/cni=1
orkubernetes.io/role/cni=0
respectively.kubernetes.io/role/cni=1
tagged subnet if the alternate security group is also taggedkubernetes.io/role/cni=1
.kubernetes.io/cluster/<my-example-cluster>
tag. See Enhanced subnet discovery should use configurable tags #2904 more.Testing done on this change:
Will this PR introduce any new dependencies?:
Will this break upgrades or downgrades? Has updating a running cluster been tested?:
Does this change require updates to the CNI daemonset config files to work?:
ENABLE_SUBNET_DISCOVERY
enabled which is already done by default.Does this PR introduce any user-facing change?:
Users were already tagging their subnets as the part of the original subnet discovery feature. This enhanced feature will also allow users to tag alternate security groups.
Make sure that IPv6 documentation and managed IPv4 policy are updated.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Integration tests passing in IPv4 cluster.
Currently adapting our test framework to work with IPv6 cluster. Will paste results here.