Skip to content

Conversation

JackDoan
Copy link
Collaborator

@JackDoan JackDoan commented Sep 8, 2025

No description provided.

f.l.WithError(err).WithField("udpAddr", addr).
WithField("handshake", m{"stage": 1, "style": "ix_psk0"}).WithField("cert", remoteCert).
Info("Unable to handshake with host due to missing certificate version")
return
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change makes this flow actually work

them(v1+v2) ----handshake(v1)---> me (v2 only)

a potential drawback is if them is actually an old client with only v1 certs, we may make it upset?

with this current flow, we'll also create a connectionstate with mismatched cert versions. I think this is actually fine? But maybe we should try to trigger a rehandshake when the connectionmanager sees that, so everyone can know everyone's VPN IPs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes. I added that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm maybe we should only bother rehandshaking if there's a meaningful difference though? (like different number of VPN-IPs)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a potential drawback is if them is actually an old client with only v1 certs, we may make it upset?

I think this is all of the problem, how does the remote behave when encountered with an unknown certificate version. I think anything pre 1.9.3ish is going to have a bad time.

hmm maybe we should only bother rehandshaking if there's a meaningful difference though? (like different number of VPN-IPs)

The only thing we can safely make a determination about is our own local state. Is this tunnel using a certificate at the current configured version level? Is the certificate the latest one available? I would be very wary of trying to reason about the remote state in a re-handshake flow.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only thing we can safely make a determination about is our own local state. Is this tunnel using a certificate at the current configured version level? Is the certificate the latest one available? I would be very wary of trying to reason about the remote state in a re-handshake flow.

Exactly, I agree. What I was referring to here was:

  1. I define two parties, one named Host, with v1+v2 certs, and Peer, with only a v2 cert
  2. Host handshakes to Peer. Host uses a V1 cert, as this is the default, most-compatible behavior
  3. Peer doesn't have a V1 cert. Peer replies to Host with a V2 cert.
  4. Host accepts Peer's V2 cert
  5. Host now has a decision to make:
    a. if Host's certv1.Networks() == certv2.Networks(), (imo), there's no reason to do anything. Just use this tunnel.
    b. else, (imo) it's worth having Host trigger a rehandshake with its V2 cert, so Peer will be informed of all of Host's VPN IPs. This is important on mixed-dual-stack networks for things like relays to behave reliably (and dual stack configurations that lack a dual-stack lighthouse!)
    c. currently this PR always chooses path (b)

All the other rehandshake-flow bits should remain in place as they always have. I think my phrasing was poor: I meant "bother rehandshaking when scenario 5.b applies"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is all of the problem, how does the remote behave when encountered with an unknown certificate version. I think anything pre 1.9.3ish is going to have a bad time.

We should specifically test this to be sure, but shouldn't it just drop the packet? We'll error unmarshalling the not-proto in stage 1 and delete the handshake-hostinfo.

This class of problem should also only crop up when an un-updated, v1-only host, attempts to contact a v2-only host, I think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think it's "initiator only understands v1 and responder only has v2".

The other side of this is "initiator only has v1 but understands v2 and is trying to handshake with an ipv6 overlay address" which isn't allowed at the moment through https://github.com/slackhq/nebula/pull/1468/files#diff-d27b1b01b189fd93eae405de59c38e5bd23ff1c9868a0dabac92703842666b78R31

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did confirm the packet gets dropped with an ugly log line, no bad behavior occurs otherwise

@JackDoan JackDoan marked this pull request as ready for review September 10, 2025 20:14
@JackDoan JackDoan force-pushed the cert-v2-reloads branch 2 times, most recently from 345ceef to eb5954f Compare September 29, 2025 19:03
@JackDoan JackDoan requested a review from nbrownus October 2, 2025 17:33
WithField("fingerprint", remoteCert.Fingerprint).
Info("Remote certificate is no longer valid, tearing down the tunnel")
return true
} else {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, I'd leave out the else branch and return false at the end of the function to try and avoid future shenanigans

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kinda see this the other way. If all the return statements are inside a conditional, anything outside the conditions is unreachable. It avoids future shenanigans in the form of "someone adds another condition that doesn't have a return".

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong opinion loosely held though, I can swap it if you wanna

@JackDoan JackDoan requested review from nbrownus and wadey October 6, 2025 20:50
if hh.initiatingVersionOverride != cert.VersionPre1 {
v = hh.initiatingVersionOverride
} else if v < cert.Version2 {
// If we're connecting to a v6 address we must use a v2 cert
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the must here is really a should try super hard but maybe its ok

@wadey wadey added this to the v1.10.0 milestone Oct 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants