DNS response time in delayshell not deterministic #105
Comments
Hmm, I'm having trouble replicating this. I'm also more troubled by the "status: REFUSED" on your third query than the fact that it took 200 milliseconds. I wonder why dnsmasq is refusing the query. (We are running dnsmasq with Can you supply a tarball of the replay directory that produces this problem? |
@keithw, OK, sorry: here is more background and new information, as I understand the problem better now. I created a mahimahi HTTP/1 capture from an HTTP/1 web server running in a VM on my host. Before replaying, I remove h2pushsrv from /etc/hosts and restart networking so that mahimahi's dnsmasq serves it. Now in replay mode with 50 ms delay using Chrome, when I enable query logging in dnsmasq (
Since I am connected to eduroam.rwth-aachen.de via Wi-Fi, Chrome also issues a DNS request for Now I noticed that the DNS lookup part of the replay in Chrome is sometimes off by 1 RTT. Note that there is no capture file for h2pushsrv.eduroam.rwth-aachen.de, since no request/response was actually served over this domain name. When I dig h2pushsrv.eduroam.rwth-aachen.de I get REFUSED, but with a 100 ms RTT.
This is correct. Now when i do
Suddenly, the query time is 201 msec for the last request, which is 2x RTT. Here is the recording: I guess the entire problem can be avoided by requesting |
I've looked at the tcpdump output, and it looks to me like I haven't dived into the Are you seeing this on a more normal |
Yes. In fact, the way I spotted this was using Chrome dev tools. Because I am doing page load time measurements, I noticed the DNS timings were off in some runs in Chrome. See screenshots here (first screenshot: start of a fresh Chrome in mm-webreplay with delay 50; then I quit Chrome inside the replayshell and started it fresh again. The second time, the DNS lookup is 300 ms) Note that to test this with other browsers you may have to disable UA checking in replayserver |
Okay, can you get a tcpdump capture of the DNS requests? My suspicion is that Chrome (or the stub resolver) is making two requests, just like dig was doing. My random guess at the moment is that it's possible that the root cause here is that dnsmasq is returning REFUSED instead of NXDOMAIN for nonexistent names (causing the resolvers to have some reason to want to retry the request). But first let's try to figure out if the reason for the extra time is that it's indeed making a second request, just as with dig. |
Thank you. I think the problem is that I am still connected to our university network, where I am part of a domain. See screenshots here: https://imgur.com/a/8VVPo First case: Second case: Note that there is a lot of garbage due to the initial start of Chrome. |
Okay, and just to confirm, does adding |
@keithw For the following experiment I am connected to my Wi-Fi at home, so instead of h2pushsrv.comsys.rwth-aachen.de it now queries h2pushsrv.local while I am connected to Wi-Fi; it stops doing that when I am disconnected. See the following set of screenshots: https://imgur.com/a/hiKDZ The issue seems to simply be that whenever the machine I run mahimahi on is connected to an outside network, additional DNS queries are issued for $query_hostname.$domainname, which cannot be answered by dnsmasq, and those extra requests incur the extra RTTs. A more minimalist example would be:
|
The problem seems to be that Ubuntu adds a Maybe we could generate a resolv.conf specifically for the replay namespace only, but this requires a named network namespace, I think. Instead of This might also eliminate the need to start multiple dnsmasq instances if there are multiple DNS servers configured in the user's system. |
That's an interesting idea, but to the best of my knowledge, per-network-namespace I think this may have to be a "performing as designed" as far as mahimahi is concerned. Your userspace programs are doing multiple DNS lookups, apparently because that's what's in your (userspace) resolver configuration. What you're seeing is the effects of that behavior. |
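To make the "multiple DNS lookups" point concrete, here is a rough sketch of how a stub resolver's search list can turn one lookup into two sequential queries. It is a simplified model of the `search`/`ndots` behavior described in resolv.conf(5), not what Chrome or any particular resolver literally does; the function names and the assumption that each query is sent sequentially and costs exactly one RTT are illustrative only.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the names a search-list resolver would try, in order.

    Simplified model: names with fewer than `ndots` dots are tried with
    each search domain appended first, then as-is.
    """
    if name.endswith("."):
        return [name]  # already fully qualified, search list is skipped
    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


def lookup_time_ms(name, search_domains, answerable, rtt_ms, ndots=1):
    """Assume sequential queries, one RTT each, stopping at the first answer."""
    queries = 0
    for candidate in candidate_names(name, search_domains, ndots):
        queries += 1
        if candidate in answerable:
            break
    return queries * rtt_ms


# The replay's dnsmasq can only answer the recorded name.
answerable = {"h2pushsrv"}
print(lookup_time_ms("h2pushsrv", ["local"], answerable, rtt_ms=100))  # 200
print(lookup_time_ms("h2pushsrv", [], answerable, rtt_ms=100))         # 100
```

Under these assumptions, a search domain that dnsmasq cannot answer adds one full RTT to the lookup, which matches the 1 RTT offset reported above.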
I cannot confirm that. I fiddled around with this yesterday and got a working mahimahi prototype where a named network namespace is created with a corresponding /etc/netns/$namespacename/resolv.conf. (Adapted from the iproute2 source for
I am not sure about the design philosophy of mahimahi; however, it is confusing that the timing of a local replay of a website depends on the outside DNS configuration (i.e. whether I am part of a domain or not). See this log of a replay I did from the host
Obviously, sometimes the time spent resolving DNS is 1 RTT higher, which leads to different page load times inside the same mm-webreplay session. Since the .local depends on the current local network configuration (.local is just my home network; when I am at work it is another domain name), eliminating the .local by introducing an mm-webreplay-specific resolv.conf seems like a clean way to solve this problem. |
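As a rough illustration of what a replay-specific resolv.conf could look like, the sketch below drops the `search` and `domain` directives from an existing resolv.conf text so that no domain suffix gets appended inside the replay namespace. This is not the prototype mentioned above; the function name is hypothetical, and where the result would be written (for example /etc/netns/$namespacename/resolv.conf) is an assumption taken from the comment, not from mahimahi's code.

```python
def strip_search_domains(resolv_conf_text):
    """Return resolv.conf content without `search` or `domain` lines.

    All other lines (nameserver, options, comments, blanks) are kept
    unchanged and in their original order.
    """
    kept = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("search", "domain"):
            continue  # skip directives that add a search suffix
        kept.append(line)
    return "\n".join(kept) + "\n"


# Example input is made up for illustration.
host_conf = "nameserver 10.0.0.1\nsearch local\noptions ndots:1\n"
print(strip_search_domains(host_conf), end="")
```

With the search list gone, a lookup of h2pushsrv would be tried only as-is, so the extra query for h2pushsrv.local (and its extra RTT) would not be issued.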
(commented on #106) |
Noticed a discrepancy in DNS timings: they were sometimes off by one RTT. Not entirely sure why this is. I checked Chrome's DNS queries, and even in incognito mode Chrome sometimes sends a deterministic bunch of DNS queries to chromium-i18n.appspot.com and google.com, either because chromedriver issues the request when the URL is entered (prefetching Google search results) or because of some Google mechanism to protect users from malware, etc. Anyhow, timings should never differ across runs. I think it is some request pipelining issue in replayshell. Passing --all-servers to dnsmasq seems to resolve this, although I have no idea why, which is kind of dangerous. Observe ticket: ravinet/mahimahi#105
When replaying pages I noticed that DNS timings are sometimes nondeterministically off by 1 RTT. I assume that this is some kind of pipelining stall somewhere.
Example:
Notice that the 3rd query time is twice the RTT.
I found that passing "--all-servers" to dnsmasq in dns_server.cc avoids the issue (at least it did not happen in a test run with 90 tries). However, I do not really understand why.