VPNKit DNS server returns NXDOMAIN for SRV records #509

Open
AkihiroSuda opened this issue Aug 31, 2020 · 3 comments · May be fixed by #645

@AkihiroSuda
Member

VPNKit DNS server returns NXDOMAIN for SRV records

$ rootlesskit --net=vpnkit dig -t srv _imaps._tcp.gmail.com
WARN[0000] specifying --disable-host-loopback is highly recommended to prohibit connecting to 127.0.0.1:* on the host namespace (requires slirp4netns or VPNKit) 
WARN[0000] Mounting /etc/resolv.conf without copying-up /etc. Note that /etc/resolv.conf in the namespace will be unmounted when it is recreated on the host. Unless /etc/resolv.conf is statically configured, copying-up /etc is highly recommended. Please refer to RootlessKit documentation for further information. 

; <<>> DiG 9.16.1-Ubuntu <<>> -t srv _imaps._tcp.gmail.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 870
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;_imaps._tcp.gmail.com.         IN      SRV

;; Query time: 0 msec
;; SERVER: 192.168.65.1#53(192.168.65.1)
;; WHEN: Mon Aug 31 20:02:57 JST 2020
;; MSG SIZE  rcvd: 39

OTOH slirp4netns DNS works as expected:

$ ./rootlesskit --net=slirp4netns dig -t srv _imaps._tcp.gmail.com
WARN[0000] specifying --disable-host-loopback is highly recommended to prohibit connecting to 127.0.0.1:* on the host namespace (requires slirp4netns or VPNKit) 
WARN[0000] Mounting /etc/resolv.conf without copying-up /etc. Note that /etc/resolv.conf in the namespace will be unmounted when it is recreated on the host. Unless /etc/resolv.conf is statically configured, copying-up /etc is highly recommended. Please refer to RootlessKit documentation for further information. 

; <<>> DiG 9.16.1-Ubuntu <<>> -t srv _imaps._tcp.gmail.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34903
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;_imaps._tcp.gmail.com.         IN      SRV

;; ANSWER SECTION:
_imaps._tcp.gmail.com.  5       IN      SRV     5 0 993 imap.gmail.com.

;; Query time: 15 msec
;; SERVER: 10.0.2.3#53(10.0.2.3)
;; WHEN: Mon Aug 31 20:04:05 JST 2020
;; MSG SIZE  rcvd: 84
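
To compare the two runs, the status and answer count can be pulled out of dig's header lines. This is a minimal Python sketch that assumes the header format shown in the two outputs above; `summarize_dig` is a hypothetical helper, not part of dig, RootlessKit, or VPNKit:

```python
import re
from typing import Tuple

# Patterns for the two dig header lines shown in the outputs above.
STATUS_RE = re.compile(r"status: (\w+)")
ANSWER_RE = re.compile(r"ANSWER: (\d+)")


def summarize_dig(output: str) -> Tuple[str, int]:
    """Return (status, answer_count) parsed from dig's text output."""
    status = STATUS_RE.search(output)
    answers = ANSWER_RE.search(output)
    if status is None or answers is None:
        raise ValueError("no dig header found in output")
    return status.group(1), int(answers.group(1))


# Header lines copied from the VPNKit and slirp4netns runs above.
vpnkit_header = (
    ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 870\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0\n"
)
slirp_header = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34903\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1\n"
)

print(summarize_dig(vpnkit_header))  # ('NXDOMAIN', 0)
print(summarize_dig(slirp_header))   # ('NOERROR', 1)
```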

VPNKit version: v0.4.0
RootlessKit version: v0.10.0

Originally reported by @hawicz in moby/libnetwork#2574

@joanbm

joanbm commented Jul 2, 2023

I am running into the same problem, but with AAAA records instead of SRV records. In particular, it seems to break DNS resolution of IPv4-only hosts in Alpine Linux containers started inside a docker:dind-rootless container: musl appears to treat an NXDOMAIN response to the AAAA query as a failure of the entire lookup, even though the A query returns a valid list of IP addresses.

I was going to open a new issue and found this one at the last minute, so here's my writeup/analysis for it:

Steps to reproduce

The easiest way I have found to reproduce the issue is as follows:

  1. Install rootful Docker using the official distribution and instructions, for example over Ubuntu Server 22.04

  2. Run the docker:dind-rootless image:

    $ sudo docker run -d --name dind --privileged --env DOCKER_TLS_CERTDIR="" docker:24.0.2-dind-rootless --tls=false
  3. Launch an Alpine Linux container inside it, and try to resolve an IPv4-only domain:

    $ sudo docker exec -it dind env DOCKER_HOST=tcp://localhost:2375 docker run --rm -it alpine:3.18 wget http://ipv4.tlund.se -O /dev/null
  • Actual result: The command outputs wget: bad address 'ipv4.tlund.se', which indicates a DNS resolution failure. This also reproduces with other tools, such as curl.
  • Expected result: The command successfully resolves the address and downloads the website.

Further tests

Some further tests tell us more about the nature of the problem, and why I believe it's related to VPNKit:

  • Alpine Linux is necessary: The problem does not happen if you run the command inside a non-Alpine userspace such as Debian:

    $ sudo docker exec -it dind env DOCKER_HOST=tcp://localhost:2375 docker run --rm -it debian:bullseye-slim sh -c 'apt-get update && apt-get install -y wget && wget http://ipv4.tlund.se -O /dev/null'
    [...]
    Connecting to ipv4.tlund.se (ipv4.tlund.se)|193.15.228.195|:80... connected.
    [...]
  • Docker-in-Docker not necessary: A full Docker-in-Docker scenario is not needed to run into the issue. Attempting to resolve the hostname with just rootlesskit, as follows, also fails:

    $ sudo docker exec -it dind rootlesskit --net=vpnkit wget https://ipv4.tlund.se -O /dev/null
    [...]
    wget: bad address 'ipv4.tlund.se'
    [...]

    Similarly, the cause of the problem is not that the test is running inside a docker:dind-rootless container. I have set up an Alpine Linux VM and installed rootlesskit and vpnkit on it (from this package), and the problem still reproduces.

  • VPNKit necessary:

    Removing the --net=vpnkit switch from the previous command makes it work:

    $ sudo docker exec -it dind rootlesskit wget https://ipv4.tlund.se -O /dev/null
    Connecting to ipv4.tlund.se (193.15.228.195:443)
    [...]

    Similarly, using slirp4netns makes it work:

    $ sudo docker exec -it -u 0 dind apk add slirp4netns
    [...]
    $ sudo docker exec -it dind rootlesskit --net=slirp4netns wget https://ipv4.tlund.se -O /dev/null
    Connecting to ipv4.tlund.se (193.15.228.195:443)
    [...]
  • Fixed by using --dns=/etc/resolv.conf:

    Adding the --dns=/etc/resolv.conf parameter to VPNKit to force "Upstream" instead of "Host" resolution fixes the problem:

    $ cat >vpnkit_forward.sh <<EOF
    #!/bin/sh
    exec vpnkit --dns=/etc/resolv.conf "\$@"
    EOF
    $ sudo docker cp vpnkit_forward.sh dind:/vpnkit_forward.sh
    $ sudo docker exec -it -u 0 dind chmod +x /vpnkit_forward.sh
    $ sudo docker exec -it dind rootlesskit --net=vpnkit --vpnkit-binary=/vpnkit_forward.sh wget https://ipv4.tlund.se -O /dev/null
    Connecting to ipv4.tlund.se (193.15.228.195:443)
    [...]
  • Related to IPv4-only hosts:

    The problem appears to be related to the fact that the host we are trying to resolve (in the example: ipv4.tlund.se) has only A (IPv4) records and no AAAA records. Resolving a host with both kinds of records does work:

    $ sudo docker exec -it dind rootlesskit --net=vpnkit wget https://dual.tlund.se -O /dev/null
    Connecting to dual.tlund.se (193.15.228.195:443)
    [...]

Potential cause: NXDOMAIN for AAAA records

I believe that the problem is that when you run an AAAA query for a domain without any AAAA records inside rootlesskit+vpnkit, you get an invalid NXDOMAIN response:

$ sudo docker exec -it -u 0 dind apk add bind-tools
$ sudo docker exec -it dind rootlesskit --net=vpnkit dig ipv4.tlund.se AAAA | grep status
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 60429

While if you run it without vpnkit or with slirp4netns, you get a NOERROR response:

$ sudo docker exec -it dind dig ipv4.tlund.se AAAA | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 54463
$ sudo docker exec -it dind rootlesskit --net=slirp4netns dig ipv4.tlund.se AAAA | grep status
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41316

It appears that the musl DNS resolver fails the entire resolution as soon as it sees that NXDOMAIN response to the AAAA query, even though the A query succeeded.
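
The behaviour described here can be modelled in a few lines. This is a simplified Python sketch of how the A and AAAA replies seem to be combined, not musl's actual source; `resolve_like_musl` and its input shape are hypothetical and exist only for illustration:

```python
from typing import Dict, List, Tuple

# DNS response codes (RFC 1035).
NOERROR = 0
NXDOMAIN = 3


def resolve_like_musl(replies: Dict[str, Tuple[int, List[str]]]) -> List[str]:
    """Combine per-type replies into one address list.

    `replies` maps a query type ("A", "AAAA") to (rcode, addresses).
    """
    # An NXDOMAIN on any of the parallel queries ends the lookup
    # with no results, regardless of what the other query returned.
    for rcode, _ in replies.values():
        if rcode == NXDOMAIN:
            return []
        if rcode != NOERROR:
            raise LookupError(f"resolver failure, rcode={rcode}")
    # Otherwise the addresses from all replies are merged.
    merged: List[str] = []
    for _, addresses in replies.values():
        merged.extend(addresses)
    return merged


# With VPNKit: the A query succeeds, but the AAAA query gets NXDOMAIN.
with_vpnkit = {"A": (NOERROR, ["193.15.228.195"]), "AAAA": (NXDOMAIN, [])}
# Without VPNKit: the AAAA query gets an empty NOERROR reply.
without_vpnkit = {"A": (NOERROR, ["193.15.228.195"]), "AAAA": (NOERROR, [])}

print(resolve_like_musl(with_vpnkit))     # []
print(resolve_like_musl(without_vpnkit))  # ['193.15.228.195']
```

Under this model, the empty result in the first case corresponds to the wget: bad address error, while the second case resolves normally.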

I have not yet had time to figure out why we're getting an NXDOMAIN response once we add VPNKit (or what the specs say about those weird cases), but at first glance it seems like it should return NOERROR instead.

@joanbm

joanbm commented Jul 2, 2023

At least for AAAA queries, the NXDOMAIN appears to come from these two lines, which seem to turn a response with no answers into an NXDOMAIN:

| [] ->
  Lwt.return (Ok (Some (marshal nxdomain)))

@dan0dbfe

Hello from the future!

I've arrived at the same conclusion after running into the same issue in moby/moby#47628.

I don't know enough OCaml to help fix it, though. The code should distinguish between an empty NOERROR and an NXDOMAIN returned from upstream.
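
The distinction can be sketched as follows. This is a Python model for illustration only, not VPNKit's OCaml code: both function names are hypothetical, and the first function only mirrors the behaviour described in the comments above.

```python
from typing import List

# DNS response codes (RFC 1035).
NOERROR = 0
NXDOMAIN = 3


def rcode_as_described(answers: List[str]) -> int:
    """Behaviour described in this thread: no answers becomes NXDOMAIN."""
    return NOERROR if answers else NXDOMAIN


def rcode_preserving_upstream(upstream_rcode: int, answers: List[str]) -> int:
    """Suggested behaviour: keep upstream's rcode when there are no answers.

    An empty NOERROR ("the name exists, but has no records of this type")
    then stays NOERROR, and only a real NXDOMAIN is passed on as NXDOMAIN.
    """
    if answers:
        return NOERROR
    return upstream_rcode


# AAAA query for an IPv4-only host: upstream says NOERROR with no answers.
print(rcode_as_described([]))                  # 3 (NXDOMAIN)
print(rcode_preserving_upstream(NOERROR, []))  # 0 (NOERROR)
```

This assumes the upstream response code is available at the point where the reply is built, which may not hold for every resolution mode.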
