Skip to content

Conversation

@aymericDD
Copy link
Contributor

@aymericDD aymericDD commented Dec 18, 2025

What does this PR do?

  • Adds new functionality
  • Alters existing functionality
  • Fixes a bug
  • Improves documentation or testing

Please briefly describe your changes as well as the motivation behind them:

Changes:

  • Adds per-host DNS resolution strategy control through a new dnsResolver field on network disruption hosts
  • Implements four DNS resolver strategies: pod, node, pod-fallback-node (default), and node-fallback-pod
  • Refactors DNS client to support strategy-based resolution with ResolveWithStrategy method
  • Adds getResolversAndNames helper method for cleaner DNS configuration logic

Motivation:

  • Addresses issues with service mesh proxies (like Istio DNS proxy) that intercept DNS queries and return VIP addresses (240.x.x.x) that don't work with tc traffic control rules
  • Provides flexibility for users to choose between pod and node nameservers based on their cluster configuration
  • Maintains backward compatibility with default pod-fallback-node strategy

Resolves: #882

Code Quality Checklist

  • The documentation is up to date.
  • My code is sufficiently commented and passes continuous integration checks.
  • I have signed my commit (see Contributing Docs).

Testing

  • I leveraged continuous integration testing
    • by depending on existing unit tests or end-to-end tests.
    • by adding new unit tests or end-to-end tests.
  • I manually tested the following steps:
    • Test default behavior without specifying dnsResolver field (uses pod-fallback-node)
    • Test dnsResolver: pod strategy to verify pod-only DNS resolution
    • Test dnsResolver: node strategy to verify node-only DNS resolution
    • Test dnsResolver: pod-fallback-node with pod DNS failure to verify fallback
    • Test dnsResolver: node-fallback-pod with node DNS failure to verify fallback
    • Test with Istio-enabled namespace using dnsResolver: node to bypass DNS proxy
    • locally.
    • as a canary deployment to a cluster.

@datadog-datadog-prod-us1
Copy link

datadog-datadog-prod-us1 bot commented Dec 18, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 7b43367 | Docs | Datadog PR Page | Was this helpful? Give us feedback!

@aymericDD aymericDD force-pushed the aymeric.daurelle/CHAOSPLT-1359/feature branch from faf7d30 to 91df8fa Compare December 18, 2025 11:31
@aymericDD aymericDD changed the title feat(network-disruption): add DNS resolver control [CHAOSPLT-1359] Add DNS resolver control Dec 19, 2025
@aymericDD aymericDD requested a review from a team December 22, 2025 10:21
@aymericDD aymericDD marked this pull request as ready for review December 22, 2025 10:25
for _, host := range s.Hosts {
args = append(args, "--hosts", fmt.Sprintf("%s;%d;%s;%s;%s", host.Host, host.Port, host.Protocol, host.Flow, host.ConnState))
if host.DNSResolver == "" {
args = append(args, "--hosts", fmt.Sprintf("%s;%d;%s;%s;%s", host.Host, host.Port, host.Protocol, host.Flow, host.ConnState))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just having host.DNSResolver set to "" even if it's empty? Seems like we also add empty "" if nothing is defined in other fields. I don't think we need any ifs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way we could have is to have the default be set before? That way it's never empty? (although I can't remember if it's something we do in the chaos-controller or not, so ignore it if we do not do this usually)


When Istio DNS proxy is enabled, pod-level DNS lookups may return VIP addresses (Class E subnet: 240.0.0.0/4) that don't work for network disruptions. In this case, use `dnsResolver: node` to bypass the Istio proxy and get the actual service IPs:

```yaml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have those examples be defined in the examples/ dir and link them instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really see the added value because the examples are more here to test the disruption effect. Here the resolution of the host will be the same if it's resolved by the node or the pod resolver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you want, but we still need to update the complete example to add your new field

Adds per-host DNS resolution strategy control to allow
users to specify whether hostnames should be resolved
using pod or node nameservers.

This addresses issues with service mesh proxies (like
Istio DNS proxy) that intercept DNS queries and return
VIP addresses (240.x.x.x) that don't work with tc
traffic control rules.

Users can now specify dnsResolver field on each host
with strategies: "pod", "node", "pod-fallback-node"
(default), or "node-fallback-pod". The default behavior
maintains backward compatibility by trying pod DNS first
with automatic fallback to node DNS.

Resolves: #882

Jira: CHAOSPLT-1359
@aymericDD aymericDD force-pushed the aymeric.daurelle/CHAOSPLT-1359/feature branch from 91df8fa to 7b43367 Compare December 22, 2025 20:27
@aymericDD aymericDD requested a review from a team December 22, 2025 20:43

### Case 5: Controlling DNS Resolution with `dnsResolver`

When specifying hostnames in the `hosts` or `allowedHosts` fields, you can control which DNS resolver is used to resolve the hostname to an IP address. This is particularly useful in environments with service mesh proxies (like Istio) that intercept DNS queries and return virtual IPs (VIPs) that may not work for traffic disruption.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make it a little more clear?

Suggested change
When specifying hostnames in the `hosts` or `allowedHosts` fields, you can control which DNS resolver is used to resolve the hostname to an IP address. This is particularly useful in environments with service mesh proxies (like Istio) that intercept DNS queries and return virtual IPs (VIPs) that may not work for traffic disruption.
When specifying hostnames in the `hosts` or `allowedHosts` fields, you can control which DNS resolver is used to resolve the hostname to an IP address. This is particularly useful in environments with service mesh proxies (like Istio) that intercept DNS queries and return virtual IPs (VIPs) that may not work for traffic disruption. In these cases, you should rely on the node-level resolver to ensure the hostnames resolve to the actual destination IPs rather than the service mesh VIP.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants