Date: Wed, 12 Oct 2022 19:56:29 -0600
From: Bob Proulx <bob@proulx.com>
To: freebsd-questions@freebsd.org
Subject: Re: resolv.conf question
Message-ID: <20221012185254621820516@bob.proulx.com>
In-Reply-To: <alpine.BSF.2.00.2210111630040.66282@bucksport.safeport.com>
References: <alpine.BSF.2.00.2210111300120.66282@bucksport.safeport.com>
 <CAFbbPug83%2BxyjZoR%2BOZ1HqnzDCptmqLFbZ7vThgP9=O6QjF-KA@mail.gmail.com>
 <alpine.BSF.2.00.2210111630040.66282@bucksport.safeport.com>

Doug Denault wrote:
> > Doug Denault wrote:
> > So I tried to RTFM, /usr/src/contrib/ldns/resolver.c in this case. It is
> > almost certain that the system was up but bind did not respond. The source
> > is a bit above my pay grade but it did seem possible that if that was the
> > case, the second server was never tried. This is what actually happened.
> >
> > There were no other issues as each of the jails started fine with a manual
> > boot. Does anyone know if the timeout and/or retry setting offer a way
> > around this.
>
> For performance reasons, especially if the first listed server is always
> used, I want that in our data center. Aside from speed, no hacking is
> possible. My purpose here is to figure how resolv.conf works. If more than
> one entry is effectively useless, I would be tempted to use 8.8.8.8. Also
> the jail mother had not been booted in several months and only now because I
> f-ed up changing the root password.

I still have a physical copy of DNS and BIND by Paul Albitz & Cricket
Liu published by O'Reilly 1992.  I have no idea if the way this was
described there still matches the way it is resolved now.  But I think
it likely it is still at least similar.

It is described that the timeouts will depend upon the number of
nameserver directives in the resolv.conf file.  I reproduce the table
here.

          |  Name Servers Configured
    ------+----------------------------
    Retry |    1    |    2    |    3
    ------+---------+---------+--------
      0   |    5s   | (2x)5s  | (3x)5s
      1   |   10s   | (2x)5s  | (3x)3s
      2   |   20s   | (2x)10s | (3x)6s
      3   |   40s   | (2x)20s | (3x)13s
    ------+---------+---------+--------
    Total |   75s   |   80s   |   81s

If there is no nameserver configured then the default is to query the
nameserver on the local system.  None is the same as one configured
local host nameserver.

If there is one nameserver configured then it will query that
nameserver with a timeout of 5 seconds.  This is the timeout before
sending another query.  A retry.

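As a cross-check on the table, here is a minimal Python sketch of the
rule it appears to follow, which the paragraphs below spell out in
words: a base timeout of 5s that doubles each round, divided by the
number of nameservers (rounded down) on every round after the first.
The names retry_schedule and worst_case_total are made up for this
illustration.  This is not the libc resolver code, only arithmetic that
reproduces the 1992 numbers, and it says nothing about what current
systems do.

```python
def retry_schedule(nservers, base=5, rounds=4):
    """Per-server timeout in seconds for each round of queries.

    Assumption: this follows the 1992 book's table, not any current
    resolver source.
    """
    schedule = []
    for attempt in range(rounds):
        timeout = base * 2 ** attempt      # 5, 10, 20, 40
        if attempt > 0:
            timeout //= nservers           # split across servers, rounded down
        schedule.append(timeout)
    return schedule


def worst_case_total(nservers):
    """Total seconds spent when every server fails in every round."""
    return sum(t * nservers for t in retry_schedule(nservers))


for n in (1, 2, 3):
    print(n, retry_schedule(n), worst_case_total(n))
# 1 [5, 10, 20, 40] 75
# 2 [5, 5, 10, 20] 80
# 3 [5, 3, 6, 13] 81
```

The totals match the bottom row of the table, which suggests the
rounding-down reading is the right one for the three nameserver column.
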
If the resolver encounters an error that indicates the nameserver is
really down or unreachable, or the query times out, it will double the
timeout and query the nameserver again.

If there is more than one nameserver configured then the libc resolver
queries the first one in the list with a timeout of 5 seconds.  If
that query times out or receives an error then it falls back to the
next nameserver in the list with the same 5 second timeout.  If the
resolver reaches the end of the list and all of them (up to three)
timed out or received an error then it will update the timeouts and
cycle through the list again.

The next retry through the list will have timeouts set according to a
calculation of 10 seconds divided by the number of nameservers
configured, rounded down.  One nameserver is 10 seconds.  Two
nameservers is 5 seconds.  Three nameservers is 3 seconds.

If that round of queries through each of the nameservers again
receives errors or timeouts then the timeout values are doubled and
the queries retry again.

There are four possible rounds of queries.  The first initial round
with the 5s timeouts.  The second round with the calculated timeouts.
The 3rd and 4th rounds with the calculated timeouts roughly doubled
each round.  (Going by the table, three nameservers go 3s, 6s, 13s,
which looks like 10s, 20s, 40s each divided by three and rounded down
rather than a strict doubling.)

That accounts for why the total time a DNS lookup takes using the libc
resolver will vary among 75s, 80s, 81s depending upon the number of
nameserver directives configured, in the case that all of them either
return errors or are unreachable.

Again let me repeat that this was as described in 1992 and I have no
idea if the current implementation is still the same.  But at least it
lays the foundation for the way things used to work.

To get some recent data I tried it on my NetBSD 9.0 system here.  (I
know I am behind and need to upgrade it to the current 9.3.)  I tried
the four combinations with unreachable (non-existent) nameservers.

No nameservers configured.  No local host nameserver running.

    netbsd# time host example.com
    ;; connection timed out; no servers could be reached
           12.17s real     0.02s user     0.02s system

One unreachable nameserver configured.

    netbsd# time host example.com
    ;; connection timed out; no servers could be reached
           10.05s real     0.02s user     0.00s system

Two unreachable nameservers configured.

    netbsd# time host example.com
    ;; connection timed out; no servers could be reached
           12.07s real     0.01s user     0.02s system

Three unreachable nameservers configured.

    netbsd# time host example.com
    ;; connection timed out; no servers could be reached
           14.10s real     0.03s user     0.01s system

Then I configured two nameservers where the first one was unreachable
but the second one was local, available, and online.

    netbsd# time host example.com
    example.com has address 93.184.216.34
    example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
    example.com mail is handled by 0 .
            3.41s real     0.02s user     0.01s system

Then again with three nameservers but with the first two being
unreachable and again the third one, the last one, being available.

    netbsd# time host example.com
    example.com has address 93.184.216.34
    example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
    example.com mail is handled by 0 .
            6.09s real     0.01s user     0.02s system

Therefore it looks like the algorithm implemented now is similar to,
but somewhat different from, what was historically described.

================================================================

Let's see the same experiment again with FreeBSD 12.3.

No nameservers configured.  No local host nameserver running.

    [root@freebsd ~]# time host example.com
    ;; connection timed out; no servers could be reached

    real    0m20.219s
    user    0m0.002s
    sys     0m0.003s

One unreachable nameserver configured.

    [root@freebsd ~]# time host example.com
    ;; connection timed out; no servers could be reached

    real    0m10.111s
    user    0m0.000s
    sys     0m0.006s

Two unreachable nameservers configured.

    [root@freebsd ~]# time host example.com
    ;; connection timed out; no servers could be reached

    real    0m20.226s
    user    0m0.005s
    sys     0m0.000s

Three unreachable nameservers configured.

    [root@freebsd ~]# time host example.com
    ;; connection timed out; no servers could be reached

    real    0m30.409s
    user    0m0.000s
    sys     0m0.007s

Then I configured two nameservers where the first one was unreachable
but the second one was local, available, and online.

    [root@freebsd ~]# time host example.com
    example.com has address 93.184.216.34
    example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
    example.com mail is handled by 0 .

    real    0m10.091s
    user    0m0.000s
    sys     0m0.007s

Then again with three nameservers but with the first two being
unreachable and again the third one, the last one, being available.

    [root@freebsd ~]# time host example.com
    example.com has address 93.184.216.34
    example.com has IPv6 address 2606:2800:220:1:248:1893:25c8:1946
    example.com mail is handled by 0 .

    real    0m20.309s
    user    0m0.002s
    sys     0m0.004s

================================================================

Let's see the same experiment again with Debian Unstable with glibc
version 2.35.

No nameservers configured.  No local host nameserver running.

    root@glibc:~# time host example.com
    ;; communications error to ::1#53: connection refused
    ;; communications error to ::1#53: connection refused
    ;; communications error to 127.0.0.1#53: connection refused
    ;; no servers could be reached

    real    0m0.031s
    user    0m0.015s
    sys     0m0.005s

Interesting that it complains about both IPv6 failure and IPv4 failure
whereas traditionally it is silent.  ("::1" being IPv6 localhost, and
127.0.0.1 being IPv4 localhost.)

One unreachable IPv4 local host nameserver configured.

    root@glibc:~# time host example.com
    ;; communications error to 127.0.0.1#53: connection refused
    ;; communications error to 127.0.0.1#53: connection refused
    ;; no servers could be reached

    real    0m0.034s
    user    0m0.019s
    sys     0m0.000s

One unreachable IPv4 nameserver configured.
This doesn't show timestamps but each line was output at 5s intervals.

    root@glibc:~# time host example.com
    ;; communications error to 192.168.1.151#53: timed out
    ;; communications error to 192.168.1.151#53: timed out
    ;; no servers could be reached

    real    0m10.045s
    user    0m0.016s
    sys     0m0.008s

Two unreachable nameservers configured.  This doesn't show timestamps
but each line was output at 5s intervals.

    root@glibc:~# time host example.com
    ;; communications error to 192.168.1.151#53: timed out
    ;; communications error to 192.168.1.151#53: timed out
    ;; communications error to 192.168.1.152#53: timed out
    ;; no servers could be reached

    real    0m15.049s
    user    0m0.014s
    sys     0m0.009s

Three unreachable nameservers configured.

    root@glibc:~# time host example.com
    ;; communications error to 192.168.1.151#53: timed out
    ;; communications error to 192.168.1.151#53: timed out
    ;; communications error to 192.168.1.152#53: timed out
    ;; communications error to 192.168.1.153#53: timed out
    ;; no servers could be reached

    real    0m20.052s
    user    0m0.012s
    sys     0m0.008s

================================================================

I am not sure if this in any way answers your questions.  But
hopefully it provides some interesting information about the behavior
of the resolver in these various different systems.

Personally I almost always configure a local caching nameserver on the
local host for my server systems.  For me that is almost always the
right answer for Internet connected servers.

However for DHCP mobile clients I mostly don't and use the DHCP
provided nameservers.  That's the best answer to allow spoofing for
captive portal open WiFi Access Points such as at name-brand coffee
shops and airports.

One more "however" here as not validating DNSSEC also allows spoofing.
Therefore I turn my mobile laptop's local DNSSEC validating nameserver
on and off manually.  I need it on for security.  I need it off for
clicking through the EULA on a captive portal.  Captive portals are
rather a mess.

  https://en.wikipedia.org/wiki/Captive_portal

Bob