Date: Sat, 20 Sep 2008 12:58:03 +0300
From: "Oleg V. Nauman"
To: Robert Watson
Cc: freebsd-stable@FreeBSD.org
Subject: Re: RELENG_7: something is very wrong with UDP?
Message-ID: <20080920125803.d81jiet544cgc8g4@webmail.opentransfer.com>
References: <20080918180543.pt7s2zmaio48ww8g@webmail.opentransfer.com> <20080919143636.p661cjfopw44osco@webmail.opentransfer.com>

Quoting Robert Watson:

>
> On Fri, 19 Sep 2008, Oleg V. Nauman wrote:
>
>>> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
>>> Confirm that you can still reproduce the problem.
>>
>> Due to various reasons my laptop running local caching DNS server (
>> named ) without any forwarders assigned. My /etc/resolv.conf
>> contains nameserver 127.0.0.1
>
> This is simplifying in some senses, but complicating in others. In
> particular, the question it raises is whether the problem is in the DNS
> resolver or the nameserver. Seeing a tcpdump of lo0 for DNS traffic
> would be quite interesting, since we could look at timestamps and try
> to place the blame a bit more precisely.
>
>>> Could you
>>> also use procstat -k on the dig process to generate a kernel stack trace
>>> for it?
>
> Let's add to this list: when the problem happens, could you also
> procstat -k the name server process(es)?
>
>> And procstat -kk output for logger process waiting:
>>
>>   PID    TID COMM             TDNAME           KSTACK
>>  1421 100095 logger           -                mi_switch+0x2c8
>> sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
>> _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58
>> read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20
>
> Interesting -- logger is blocked on reading from a pipe, likely
> standard input. So it sounds like something else is failing to
> complete in a timely manner -- perhaps due to DNS.

Nothing strange here: that was the kernel stack for logger waiting on
background fsck output (bgfsck never started, though).

>
>>> This is approximately the date of my last UDP MFC. Could you try
>>> backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7
>>> and see if that helps? (specifically, restore the use of
>>> sosend_generic instead of sosend_dgram)
>
> If you can show that it's definitely a problem with the change to
> sosend_dgram for UDPv6 socket send, then it might suggest it's the same
> problem that it is related to the UDPv46 code there. In which case I
> will propose we back out that portion of the change in the 7-stable
> branch until it's known to be resolved -- I don't want other people
> tripping over this.

Sorry for the false alarm regarding UDP issues. I noticed that my
clock stops incrementing (which also explains the zeroes in the
traceroute output). That gave me an idea of what is related to this
issue, so I backed out revision 1.243.2.4 of src/sys/dev/acpica/acpi.c,
and that fixes my issues. It looks like that revision stops the
timecounters from incrementing on my laptop.

Ironically, I was the initiator of this ACPI behavior change (I
reported "ACPI HPET stops working on my RELENG_7" to
stable@freebsd.org on July 19), so jhb@ implemented a patch, and it
was working for me at the time.
Something changed during the following two months, so this patch now
causes issues on my hardware instead of fixing them. I will play a bit
with kern.timecounter.choice on Monday and report back to jhb@ then.

>
>>> Could you try compiling your kernel with WITNESS to see if we get
>>> any extended debugging information?
>>
>> Have added WITNESS ( and STACK required by procstat ) options but
>> it is not producing any output ( so no LORs or something like this )
>
> OK. Could you try adding INVARIANT_SUPPORT and INVARIANTS if they
> aren't there? Be aware: this may convert the wedging you are
> experiencing into a kernel panic.

No output is produced with INVARIANT_SUPPORT and INVARIANTS included
in the kernel, and no kernel panic either :) Thank you for the
excellent work.

>
>>>> Is anybody experiencing the same issues with fresh RELENG_7?
>>>> Unsure it is my local issues though
>>>
>>> I'm not experiencing them, but these sorts of things can be quite
>>> subtle and workload-dependent.
>>
>> Well experiencing this issue during the system boot even..
>
> OK. So there must be something a bit different about your setup --
> perhaps there's something specific about the way things are interacting
> over the loopback address for the name server. Is this the stock
> system BIND9 or something else? Are you able to temporarily switch to

I am running the stock system BIND.

> an external name server and see if that changes things?
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
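
For anyone following the procstat -kk trace quoted earlier in this message, here is a minimal Python sketch of how such a stack can be read mechanically. It splits the KSTACK column into (function, offset) pairs and reports the first frame that is not part of the generic sleep path. The helper names parse_kstack and blocking_frame are hypothetical, and the list of prefixes treated as "sleep path" is an assumption chosen for this illustration, not something procstat itself defines.

```python
import re

# Frames that only show the thread going to sleep. This skip list is an
# assumption made for illustration; procstat does not define it.
SLEEP_PREFIXES = ("mi_switch", "sleepq_", "_sleep")

# One frame looks like "pipe_read+0x389": symbol name, "+0x", hex offset.
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x([0-9a-fA-F]+)")


def parse_kstack(text):
    """Return [(function, offset), ...] in printed order (innermost first)."""
    return [(name, int(off, 16)) for name, off in FRAME_RE.findall(text)]


def blocking_frame(frames):
    """Return the first function that is not part of the generic sleep path."""
    for name, _offset in frames:
        if not name.startswith(SLEEP_PREFIXES):
            return name
    return None


# KSTACK column copied from the logger trace quoted in this message.
KSTACK = (
    "mi_switch+0x2c8 sleepq_switch+0xd9 sleepq_catch_signals+0x239 "
    "sleepq_wait_sig+0x14 _sleep+0x35f pipe_read+0x389 dofileread+0x96 "
    "kern_readv+0x58 read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20"
)

frames = parse_kstack(KSTACK)
print(blocking_frame(frames))  # prints "pipe_read"
```

For the logger trace this reports pipe_read, which agrees with the reading above that logger is simply blocked reading from a pipe.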