Date: Fri, 19 Sep 2008 12:48:48 +0100 (BST)
From: Robert Watson <rwatson@FreeBSD.org>
To: "Oleg V. Nauman" <oleg@opentransfer.com>
Cc: freebsd-stable@FreeBSD.org
Subject: Re: RELENG_7: something is very wrong with UDP?
Message-ID: <alpine.BSF.1.10.0809191241050.3922@fledge.watson.org>
In-Reply-To: <20080919143636.p661cjfopw44osco@webmail.opentransfer.com>
References: <20080918180543.pt7s2zmaio48ww8g@webmail.opentransfer.com>
 <alpine.BSF.1.10.0809182005570.16464@fledge.watson.org>
 <20080919143636.p661cjfopw44osco@webmail.opentransfer.com>

On Fri, 19 Sep 2008, Oleg V. Nauman wrote:

>> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
>>     Confirm that you can still reproduce the problem.
>
> Due to various reasons my laptop running local caching DNS server ( named )
> without any forwarders assigned. My /etc/resolv.conf contains nameserver
> 127.0.0.1

This is simplifying in some senses, but complicating in others. In
particular, the question it raises is whether the problem is in the DNS
resolver or the nameserver. Seeing a tcpdump of lo0 for DNS traffic would be
quite interesting, since we could look at timestamps and try to place the
blame a bit more precisely.

>> Could you
>> also use procstat -k on the dig process to generate a kernel stack trace
>> for it?

Let's add to this list: when the problem happens, could you also procstat -k
the name server process(es)?

> And procstat -kk output for logger process waiting:
>
>   PID    TID COMM             TDNAME           KSTACK
>  1421 100095 logger           -                mi_switch+0x2c8
> sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
> _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
> syscall+0x2b3 Xint0x80_syscall+0x20

Interesting -- logger is blocked on reading from a pipe, likely standard
input. So it sounds like something else is failing to complete in a timely
manner -- perhaps due to DNS.

>> This is approximately the date of my last UDP MFC. Could you try backing
>> out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and see if that
>> helps? (specifically, restore the use of sosend_generic instead of
>> sosend_dgram)

If you can show that it's definitely a problem with the change to
sosend_dgram for UDPv6 socket send, then it might suggest it's the same
problem that it is related to the UDPv46 code there. In which case I will
propose we back out that portion of the change in the 7-stable branch until
it's known to be resolved -- I don't want other people tripping over this.

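On the resolver-versus-nameserver question above, one way to get timestamps
without going through the libc resolver is to send a raw DNS query over UDP
straight to the local named and time the reply. The following is only a
minimal sketch under stated assumptions: the helper names build_query and
time_query are hypothetical, the name server is assumed to listen on
127.0.0.1 port 53 as described in the quoted text, and the query name is just
an example. It is not a replacement for the tcpdump capture or the procstat
traces.

```python
import socket
import struct
import time


def build_query(name, qid=0x1234):
    """Build a minimal DNS query for an A record (hypothetical helper)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server, name, port=53, timeout=5.0):
    """Return seconds until any UDP reply arrives, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        start = time.monotonic()
        s.sendto(build_query(name), (server, port))
        try:
            s.recvfrom(4096)  # the reply is not parsed, only timed
        except socket.timeout:
            return None
        return time.monotonic() - start
```

Calling time_query("127.0.0.1", "www.freebsd.org") a few times while the
problem is occurring gives a rough split: if these direct queries come back
quickly while dig or other resolver clients still stall, the resolver side
looks more suspicious; if they also stall or time out, the name server or the
UDP path over lo0 is the more likely place to look.
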
>> Could you try compiling your kernel with WITNESS to see if we get any
>> extended debugging information?
>
> Have added WITNESS ( and STACK required by procstat ) options but it is not
> producing any output ( so no LORs or something like this )

OK. Could you try adding INVARIANT_SUPPORT and INVARIANTS if they aren't
there? Be aware: this may convert the wedging you are experiencing into a
kernel panic.

>>> Is anybody experiencing the same issues with fresh RELENG_7? Unsure it is
>>> my local issues though
>>
>> I'm not experiencing them, but these sorts of things can be quite subtle
>> and workload-dependent.
>
> Well experiencing this issue during the system boot even..

OK. So there must be something a bit different about your setup -- perhaps
there's something specific about the way things are interacting over the
loopback address for the name server. Is this the stock system BIND9 or
something else? Are you able to temporarily switch to an external name
server and see if that changes things?

Robert N M Watson
Computer Laboratory
University of Cambridge