fix(discovery): handle k8s IPv6 EndpointSlices #965

Status: Open (5 commits, base: main)

andrewazores
Member

@andrewazores andrewazores commented Jul 2, 2025

Welcome to Cryostat! 👋

Before contributing, make sure you have:

  • Read the contributing guidelines
  • Linked a relevant issue which this PR resolves
  • Linked any other relevant issues, PRs, or documentation, if any
  • Resolved all conflicts, if any
  • Rebased your branch PR on top of the latest upstream main branch
  • Attached at least one of the following labels to the PR: [chore, ci, docs, feat, fix, test]
  • Signed all commits using a GPG signature

To recreate commits with a GPG signature, run: git fetch upstream && git rebase --force --gpg-sign upstream/main


Fixes: #964

Description of the change:

  1. Adds a boolean config property (defaults to true) that applies a DNS name transformation to IPv4 EndpointSlice addresses. Rather than using the bare IPv4 address 1.2.3.4 as the hostname, the transformation changes it to the k8s DNS hostname 1-2-3-4.$namespace.pod. This seems to be useful for at least some network stacks, e.g. in kind.
  2. Adds handling for IPv6 addresses: a URL whose hostname is an IPv6 address must enclose the address in square brackets. This doesn't actually fix IPv6 connectivity issues at the moment; connection failures still occur further down in JMC code or even (apparently) in JDK RMI code. For that reason, this also:
  3. Adds a boolean config property to disable IPv6 handling in k8s discovery. EndpointSlices that advertise IPv6 addresses will simply be skipped if this is enabled. This applies to dual-stack clusters, where individual Pods/Services might have IPv4 addresses, IPv6 addresses, or both. This way, the IPv6 addresses can be ignored and IPv4 used instead.
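
For illustration only, the three behaviours listed above can be sketched roughly as follows. This is not the code from this PR: the class name EndpointSliceHosts, the method names, the two boolean parameters (standing in for the new config properties), and the port 9091 are all hypothetical, and detecting IPv6 by looking for a colon is a simplifying assumption.

```java
import java.util.Optional;

// Hypothetical sketch; names and signatures are NOT taken from the Cryostat codebase.
final class EndpointSliceHosts {

    // Simplifying assumption: any address containing ':' is treated as an IPv6 literal.
    static boolean isIpv6Literal(String address) {
        return address.indexOf(':') >= 0;
    }

    /**
     * Maps an EndpointSlice address to the host part of a connection URL.
     * Returns empty when the address is IPv6 and IPv6 handling is turned off,
     * meaning the endpoint should be skipped.
     */
    static Optional<String> toHost(
            String address, String namespace, boolean ipv4DnsTransform, boolean ipv6Enabled) {
        if (isIpv6Literal(address)) {
            if (!ipv6Enabled) {
                return Optional.empty(); // skip IPv6 endpoints entirely
            }
            return Optional.of("[" + address + "]"); // IPv6 hosts in URLs need brackets
        }
        if (ipv4DnsTransform) {
            // 1.2.3.4 in namespace "default" becomes 1-2-3-4.default.pod
            return Optional.of(address.replace('.', '-') + "." + namespace + ".pod");
        }
        return Optional.of(address); // plain IPv4 address used directly
    }

    /** Builds a standard JMX-over-RMI service URL string around the computed host. */
    static Optional<String> toJmxUrl(
            String address, String namespace, int port,
            boolean ipv4DnsTransform, boolean ipv6Enabled) {
        return toHost(address, namespace, ipv4DnsTransform, ipv6Enabled)
                .map(host -> String.format("service:jmx:rmi:///jndi/rmi://%s:%d/jmxrmi", host, port));
    }

    public static void main(String[] args) {
        System.out.println(toJmxUrl("10.244.0.8", "default", 9091, true, true));
        System.out.println(toJmxUrl("fd00:10:244::8", "default", 9091, true, true));
        System.out.println(toJmxUrl("fd00:10:244::8", "default", 9091, true, false));
    }
}
```

In this sketch an empty Optional signals "skip this endpoint", which keeps the dual-stack case simple: the IPv4 address of the same Pod still yields a usable URL while its IPv6 address is ignored.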

Motivation for the change:

This change is helpful because users may want to...

How to manually test:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  ipFamily: ipv6
  1. Run CRYOSTAT_IMAGE=quay.io... bash smoketest.bash...
  2. ...

@andrewazores
Member Author

/build_test

github-actions bot commented Jul 2, 2025

Workflow started at 7/2/2025, 3:37:50 PM. View Actions Run.

github-actions bot commented Jul 2, 2025

No GraphQL schema changes detected.

github-actions bot commented Jul 2, 2025

No OpenAPI schema changes detected.

github-actions bot commented Jul 2, 2025

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/16034198900

@andrewazores andrewazores marked this pull request as ready for review July 2, 2025 21:00
@andrewazores andrewazores requested review from tthvo and a team July 2, 2025 21:01
@andrewazores
Member Author

[image]

Member

@tthvo tthvo left a comment

LGTM! I verified with single-stack IPv6 and single-stack IPv4 and target pods are registered as expected!

@tthvo
Member

tthvo commented Jul 2, 2025

Single-stack IPv4 k8s cluster (with kinD):

[image]

@andrewazores
Member Author

Just out of curiosity, are you able to test on dual stack?

@tthvo
Member

tthvo commented Jul 3, 2025

Just out of curiosity, are you able to test on dual stack?

Unfortunately, I don't have a dual-stack cluster with me to test... I am trying with kinD again today (single-stack IPv4, single-stack IPv6 and dual-stack) but issues persist still. Below is the regular IPv4 kind.

org.openjdk.jmc.rjmx.common.ConnectionException caused by java.io.IOException: Failed to retrieve RMIServer stub: javax.naming.ServiceUnavailableException [Root exception is java.rmi.ConnectException: Connection refused to host: 10-244-0-8.default.pod; nested exception is: 
	java.net.ConnectException: Connection refused]
	at org.openjdk.jmc.rjmx.common.internal.RJMXConnection.connect(RJMXConnection.java:364)
	at io.cryostat.core.net.JFRJMXConnection.attemptConnect(JFRJMXConnection.java:409)
	at io.cryostat.core.net.JFRJMXConnection.connect(JFRJMXConnection.java:364)
	at io.cryostat.core.net.JFRJMXConnection.getJvmIdentifier(JFRJMXConnection.java:189)
	at io.cryostat.targets.TargetConnectionManager.lambda$executeConnectedTaskUni$1(TargetConnectionManager.java:212)
	at io.smallrye.mutiny.unchecked.UncheckedFunction.lambda$toFunction$0(UncheckedFunction.java:45)
	at io.smallrye.context.impl.wrappers.SlowContextualFunction.apply(SlowContextualFunction.java:21)
	at io.smallrye.mutiny.operators.uni.UniOnItemTransform$UniOnItemTransformProcessor.onItem(UniOnItemTransform.java:36)
	at io.smallrye.mutiny.operators.uni.builders.UniCreateFromCompletionStage$CompletionStageUniSubscription.forwardResult(UniCreateFromCompletionStage.java:63)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)

@tthvo
Member

tthvo commented Jul 7, 2025

Unfortunately, I don't have a dual-stack cluster with me to test... I am trying with kinD again today (single-stack IPv4, single-stack IPv6 and dual-stack) but issues persist still. Below is the regular IPv4 kind.

Seems like flaky networking in kinD locally. The issue seems to be fixed when testing on cloud providers...

@andrewazores
Member Author

/build_test

github-actions bot commented Jul 8, 2025

Workflow started at 7/8/2025, 10:58:59 AM. View Actions Run.

github-actions bot commented Jul 8, 2025

No GraphQL schema changes detected.

github-actions bot commented Jul 8, 2025

No OpenAPI schema changes detected.

github-actions bot commented Jul 8, 2025

CI build and push: All tests pass ✅
https://github.com/cryostatio/cryostat/actions/runs/16146850574

@tthvo
Member

tthvo commented Jul 22, 2025

Just out of curiosity, are you able to test on dual stack?

Update: worked fine on a dual-stack k8s cluster (via kubeadm). See more details at #71 (comment)

Successfully merging this pull request may close these issues.

[Bug] Cryostat failed to connect to JVM targets over IPv6
2 participants