Status: Open
Labels: pending (waiting for a Datadog member's response), team/triage
Problem Description
The Datadog Agent with network monitoring enabled prevents proper cleanup of network interfaces (veth pairs) when pods are deleted in AWS EKS clusters, leading to massive interface accumulation and intermittent IPv4 connectivity failures for pods.
Environment Details
- Platform: Amazon EKS
- Kubernetes Version: 1.33
- Node OS: Bottlerocket OS 1.42.0 (aws-k8s-1.33)
- Cluster IP address family: IPv6
- Instance Type: c7i.4xlarge
- AWS VPC CNI Version: v1.19.6-eksbuild.1
- Datadog Agent Version: 7.70.2
- Datadog Agent Configuration: Network monitoring enabled
Technical Symptoms Discovered
Expected Behavior:
- Node with 35 active pods should have ~35-70 veth interfaces (one pair per pod)
- When pods are deleted, AWS VPC CNI should clean up corresponding veth interfaces
Actual Behavior:
- Node with 35 active pods accumulated 2000+ veth interfaces
- Stale veth interfaces persist after pod deletion
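The expected-versus-actual comparison above comes down to simple arithmetic. Below is a minimal sketch of a hypothetical helper (`veth_excess` is not part of any tool). It assumes at most two host-side veth interfaces per pod, matching the ~35-70 range quoted for 35 pods, and reports how many interfaces exceed that bound.

```bash
#!/usr/bin/env bash
# Hypothetical helper: how many host-side veth interfaces exceed the
# expected upper bound (default: 2 per active pod, i.e. ~70 for 35 pods).
veth_excess() {
  local veths=$1 pods=$2 max_per_pod=${3:-2}
  local expected_max=$(( pods * max_per_pod ))
  if (( veths > expected_max )); then
    echo $(( veths - expected_max ))
  else
    echo 0
  fi
}

# Numbers from this report: 2000 veths on a node running 35 pods.
veth_excess 2000 35
```

With 2000+ interfaces on a 35-pod node, at least 1930 interfaces are unaccounted for, even under the generous two-per-pod assumption.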
Connectivity Impact:
- Intermittent IPv4 connectivity failures for new pods
- Pods cannot reach external services (DNS resolution works, but TCP connections fail)
- IPv6 connectivity remains unaffected
```
# Healthy node (without issue):
$ ip addr show | grep ": veth" | wc -l
2 # Expected: matches number of active pods
# Affected node (with Datadog network monitoring):
$ ip addr show | grep ": veth" | wc -l
2000+ # Problem: massive interface accumulation
```
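The `grep ": veth"` pipeline depends on interface names. The variant below assumes iproute2's one-line output (`ip -o link show`), which prints one line per interface. It can also be cross-checked against a type-based count (`ip -o link show type veth`) that does not depend on how the CNI names host-side interfaces. `count_veth_ifaces` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count interfaces in `ip -o link show` output (stdin)
# whose name starts with PREFIX (default "veth"). With -o, every interface
# is one line of the form "IDX: NAME@PEER: <FLAGS> ...".
count_veth_ifaces() {
  local prefix=${1:-veth}
  awk -F': ' -v p="$prefix" 'index($2, p) == 1 {n++} END {print n+0}'
}

# On a node (requires iproute2):
#   ip -o link show | count_veth_ifaces     # name-based, like the grep above
#   ip -o link show type veth | wc -l       # type-based, independent of naming
```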
- Without Datadog network monitoring: CNI cleanup works properly
- With Datadog network monitoring enabled: veth interfaces accumulate indefinitely
- Hypothesis: Network monitoring hooks prevent AWS VPC CNI from properly cleaning up network interfaces during pod deletion
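One way to probe this hypothesis is to check whether the stale host-side veths still point at live peer network namespaces. This check is a suggestion, not something the report establishes. In `ip -o link show type veth` output, each host-side veth carries a `link-netnsid N` attribute that identifies the namespace holding its peer. If the number of distinct IDs is far above the pod count, pod namespaces are likely outliving their pods, for example because some process still holds a reference to them. `lsns -t net` (util-linux) can then help show which processes are associated with each namespace. `count_peer_netns` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count distinct peer namespaces referenced by host-side
# veths. Reads `ip -o link show type veth` output on stdin and collects the
# value that follows each "link-netnsid" token.
count_peer_netns() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "link-netnsid") seen[$(i + 1)] = 1
  } END {
    n = 0
    for (id in seen) n++
    print n
  }'
}

# On a node (requires iproute2 and util-linux):
#   ip -o link show type veth | count_peer_netns
#   lsns -t net
```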
Steps to Reproduce
- Deploy AWS EKS cluster with Bottlerocket nodes
- Install Datadog Agent with network monitoring enabled
- Deploy and delete pods repeatedly over several days
- Monitor the interface count:
  ip addr show | grep ": veth" | wc -l
- Observe that the interface count keeps growing and does not decrease when pods are deleted
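The reproduction steps can be scripted. This is a hypothetical sketch: the function names, the `busybox` image, the pod names, and the timings are all placeholders, and pods are not pinned to a specific node. `churn_pods` creates, waits for, and deletes pods one at a time using standard `kubectl run`/`wait`/`delete` commands. `sample_counts` timestamps the output of any counting command, which makes growth over several days visible. Setting `KUBECTL=echo` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction helpers; names, image and timings are placeholders.

# Create, wait for, and delete COUNT pods one after another in NAMESPACE.
# KUBECTL can be overridden (e.g. KUBECTL=echo) to print commands instead.
churn_pods() {
  local count=$1 namespace=$2 kubectl=${KUBECTL:-kubectl} i name
  for (( i = 1; i <= count; i++ )); do
    name="veth-churn-$i"
    "$kubectl" run "$name" --namespace "$namespace" --image=busybox \
      --restart=Never -- sleep 3600
    "$kubectl" wait --namespace "$namespace" --for=condition=Ready \
      "pod/$name" --timeout=120s
    "$kubectl" delete pod "$name" --namespace "$namespace" --grace-period=1
  done
}

# Print "<UTC timestamp> <value>" SAMPLES times. Each sample runs the command
# or function named by COUNTER, then sleeps INTERVAL seconds before the next.
sample_counts() {
  local counter=$1 interval=$2 samples=$3 i
  for (( i = 0; i < samples; i++ )); do
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$("$counter")"
    if (( i + 1 < samples )); then
      sleep "$interval"
    fi
  done
}

# Example on a node: sample the veth count every 5 minutes for an hour.
#   veth_now() { ip -o link show type veth | wc -l; }
#   sample_counts veth_now 300 12
```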
Disabling Datadog network monitoring immediately resolved the issue: the unused network interfaces were deleted after the agent was terminated.
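For an A/B comparison like the one above, network monitoring can be toggled without uninstalling the agent. The sketch assumes the agent was installed with the official `datadog/datadog` Helm chart, which exposes a `datadog.networkMonitoring.enabled` value. The report does not say how the agent was deployed, and `set_npm`, the release name, and the namespace are hypothetical placeholders. `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: toggle network monitoring on a Helm-installed agent.
# Assumes the datadog/datadog chart; RELEASE and NAMESPACE are placeholders.
set_npm() {
  local enabled=$1 release=$2 namespace=$3
  case $enabled in
    true|false) ;;
    *) echo "usage: set_npm true|false RELEASE NAMESPACE" >&2; return 2 ;;
  esac
  # --reuse-values keeps every other value from the current release.
  local cmd=(helm upgrade "$release" datadog/datadog --namespace "$namespace"
             --reuse-values --set "datadog.networkMonitoring.enabled=$enabled")
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Running `set_npm true <release> <namespace>` afterwards would re-enable the feature, so the interface count can be observed with and without network monitoring on the same nodes.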
Please let me know what additional diagnostic information would be helpful for investigating this issue.
iuliancmarcu-creatopy