From owner-freebsd-stable@FreeBSD.ORG  Sat Sep 20 09:58:05 2008
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@FreeBSD.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id BF21B106566B;
	Sat, 20 Sep 2008 09:58:04 +0000 (UTC)
	(envelope-from oleg@opentransfer.com)
Received: from smh01.opentransfer.com (smh01.opentransfer.com [71.18.216.112])
	by mx1.freebsd.org (Postfix) with ESMTP id 7768B8FC12;
	Sat, 20 Sep 2008 09:58:04 +0000 (UTC)
	(envelope-from oleg@opentransfer.com)
Received: by smh01.opentransfer.com (Postfix, from userid 8)
	id 093031020BE4; Sat, 20 Sep 2008 05:55:19 -0400 (EDT)
X-Spam-Checker-Version: SpamAssassin 3.2.4 (2008-01-01) on
	smh01.opentransfer.com
X-Spam-Level: 
X-Spam-Status: No, score=0.6 required=5.0 tests=MIME_QP_LONG_LINE,RDNS_NONE
	autolearn=disabled version=3.2.4
Received: from webmail6.opentransfer.com (unknown [69.49.230.6])
	by smh01.opentransfer.com (Postfix) with ESMTP id D7679102082C;
	Sat, 20 Sep 2008 05:55:18 -0400 (EDT)
Received: from webmail6.opentransfer.com (webmail6.opentransfer.com
	[127.0.0.1])
	by webmail6.opentransfer.com (8.13.8/8.13.8) with ESMTP id
	m8K9w38I015277; Sat, 20 Sep 2008 04:58:03 -0500
Received: (from nobody@localhost)
	by webmail6.opentransfer.com (8.13.8/8.13.8/Submit) id m8K9w3D3015276; 
	Sat, 20 Sep 2008 12:58:03 +0300
X-Authentication-Warning: webmail6.opentransfer.com: nobody set sender to
	oleg@opentransfer.com using -f
Received: from cabin.theweb.org.ua (cabin.theweb.org.ua [91.195.184.50]) by
	webmail.opentransfer.com (Horde MIME library) with HTTP; for
	<oleg@opentransfer.com>; Sat, 20 Sep 2008 12:58:03 +0300
Message-ID: <20080920125803.d81jiet544cgc8g4@webmail.opentransfer.com>
Date: Sat, 20 Sep 2008 12:58:03 +0300
From: "Oleg V. Nauman" <oleg@opentransfer.com>
To: Robert Watson <rwatson@FreeBSD.org>
References: <20080918180543.pt7s2zmaio48ww8g@webmail.opentransfer.com>
	<alpine.BSF.1.10.0809182005570.16464@fledge.watson.org>
	<20080919143636.p661cjfopw44osco@webmail.opentransfer.com>
	<alpine.BSF.1.10.0809191241050.3922@fledge.watson.org>
In-Reply-To: <alpine.BSF.1.10.0809191241050.3922@fledge.watson.org>
MIME-Version: 1.0
Content-Type: text/plain;
	charset=KOI8-R;
	DelSp="Yes";
	format="flowed"
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable
User-Agent: Internet Messaging Program (IMP) H3 (4.1.4)
X-Originating-IP: 91.195.184.50
Cc: freebsd-stable@FreeBSD.org
Subject: Re: RELENG_7: something is very wrong with UDP?
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sat, 20 Sep 2008 09:58:05 -0000

Quoting Robert Watson <rwatson@FreeBSD.org>:

>
> On Fri, 19 Sep 2008, Oleg V. Nauman wrote:
>
>>> (1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
>>>   Confirm that you can still reproduce the problem.
>>
>> For various reasons my laptop runs a local caching DNS server (named)
>> without any forwarders configured. My /etc/resolv.conf contains
>> "nameserver 127.0.0.1".
>
> This is simplifying in some senses, but complicating in others.  In
> particular, the question it raises is whether the problem is in the DNS
> resolver or the nameserver.  Seeing a tcpdump of lo0 for DNS traffic
> would be quite interesting, since we could look at timestamps and try
> to place the blame a bit more precisely.
>
>>> Could you
>>>   also use procstat -k on the dig process to generate a kernel stack trace
>>>   for it?
>
> Let's add to this list: when the problem happens, could you also
> procstat -k the name server process(es)?
>
>> And here is the procstat -kk output for the waiting logger process:
>>
>> PID    TID COMM             TDNAME           KSTACK
>> 1421 100095 logger           -                mi_switch+0x2c8
>> sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
>> _sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58
>> read+0x4f syscall+0x2b3 Xint0x80_syscall+0x20
>
> Interesting -- logger is blocked on reading from a pipe, likely
> standard input.  So it sounds like something else is failing to
> complete in a timely manner -- perhaps due to DNS.
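A rough way to separate the resolver from named, as asked above, is to time one query sent straight to 127.0.0.1:53, skipping the resolver library entirely. The sketch below is an illustration only: it assumes a hand-built A-record query in RFC 1035 wire format, and the helper names `build_dns_query` and `time_query` are hypothetical, not part of any FreeBSD tool.

```python
import socket
import struct
import time


def build_dns_query(name: str, qid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record in RFC 1035 wire format."""
    # Header: ID, flags (only RD, recursion desired), QDCOUNT=1, other counts 0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad DNS label: {label!r}")
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"
    # QTYPE=A (1), QCLASS=IN (1)
    return header + qname + struct.pack("!HH", 1, 1)


def time_query(server: str, name: str, timeout: float = 5.0) -> float:
    """Send one query over UDP to server:53 and return the round-trip time."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        sock.sendto(query, (server, 53))
        reply, _ = sock.recvfrom(4096)
        elapsed = time.monotonic() - start
    if reply[:2] != query[:2]:  # a genuine reply echoes our query ID
        raise RuntimeError("reply ID does not match the query ID")
    return elapsed


if __name__ == "__main__":
    try:
        rtt = time_query("127.0.0.1", "www.freebsd.org")
        print(f"named answered in {rtt * 1000:.1f} ms")
    except socket.timeout:
        print("no answer within the timeout")
```

If this answers promptly while normal lookups stall, that suggests the problem is on the resolver side; a timeout here points more towards named or the loopback path. Running `tcpdump -i lo0 port 53` at the same time gives the timestamps Robert asked for.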

  That is not surprising: it was the kernel stack of logger waiting for
background fsck output (although background fsck never actually started).
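For stacks like the logger one above, the interesting frame is the first one below the generic sleep machinery (here pipe_read). A minimal sketch that splits a procstat -kk row and skips those frames, assuming one thread per line as procstat prints it (the mail client wrapped it above). The set of sleep-frame names is an assumption based on this trace, and the helper names are hypothetical.

```python
import re

# One frame in a procstat -kk KSTACK column looks like "function+0xoffset".
FRAME_RE = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)\+0x[0-9a-fA-F]+")

# Assumed set of generic scheduler/sleep-queue frames to skip over;
# extend as needed for other kernels.
SLEEP_FRAMES = {
    "mi_switch", "sleepq_switch", "sleepq_catch_signals", "sleepq_wait",
    "sleepq_wait_sig", "sleepq_timedwait", "sleepq_timedwait_sig", "_sleep",
}


def parse_kstack_row(row: str) -> dict:
    """Split one (unwrapped) procstat -kk row into columns and frame names."""
    fields = row.split()
    pid, tid, comm, tdname = fields[:4]
    frames = FRAME_RE.findall(" ".join(fields[4:]))
    return {"pid": int(pid), "tid": int(tid), "comm": comm,
            "tdname": tdname, "frames": frames}


def blocking_frame(frames: list):
    """Return the first frame below the sleep machinery, i.e. what is waiting."""
    for name in frames:
        if name not in SLEEP_FRAMES:
            return name
    return None


if __name__ == "__main__":
    import sys
    for line in sys.stdin:
        if not line.strip() or line.lstrip().startswith("PID"):
            continue  # skip blank lines and the header row
        row = parse_kstack_row(line)
        print(row["pid"], row["comm"], "waits in", blocking_frame(row["frames"]))
```

Usage would be along the lines of `procstat -kk 1421 | python3 kstack.py`, where kstack.py is whatever name the sketch is saved under.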

>
>>> This is approximately the date of my last UDP MFC.  Could you try
>>> backing out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7
>>> and see if that helps? (specifically, restore the use of
>>> sosend_generic instead of sosend_dgram)
>
> If you can show that it's definitely a problem with the change to
> sosend_dgram for UDPv6 socket send, then it might suggest it's the same
> problem that it is related to the UDPv46 code there.  In which case I
> will propose we back out that portion of the change in the 7-stable
> branch until it's known to be resolved -- I don't want other people
> tripping over this.
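For a first, coarse hint whether UDP over IPv6 behaves differently from IPv4 (the sosend_dgram change touched the UDPv6 send path), one can time a single datagram between two sockets on ::1 and on 127.0.0.1. This is only a sketch: it exercises the send path in general and cannot by itself tell sosend_dgram from sosend_generic, and the function name is hypothetical.

```python
import socket
import time


def udp_loopback_rtt(family: int = socket.AF_INET6, addr: str = "::1",
                     timeout: float = 2.0) -> float:
    """Time one UDP datagram from a sender socket to a receiver on loopback."""
    with socket.socket(family, socket.SOCK_DGRAM) as rx, \
            socket.socket(family, socket.SOCK_DGRAM) as tx:
        rx.bind((addr, 0))          # port 0: let the kernel pick a free port
        rx.settimeout(timeout)
        dest = rx.getsockname()     # 2-tuple for IPv4, 4-tuple for IPv6
        payload = b"udp-loopback-probe"
        start = time.monotonic()
        tx.sendto(payload, dest)    # goes through the kernel's UDP send path
        data, _ = rx.recvfrom(2048)
        elapsed = time.monotonic() - start
    if data != payload:
        raise RuntimeError("unexpected datagram received")
    return elapsed


if __name__ == "__main__":
    for family, addr in ((socket.AF_INET, "127.0.0.1"), (socket.AF_INET6, "::1")):
        try:
            print(f"{addr}: {udp_loopback_rtt(family, addr) * 1e6:.0f} us")
        except OSError as exc:  # includes socket.timeout
            print(f"{addr}: failed ({exc})")
```

A consistent failure or long delay on ::1 but not on 127.0.0.1 would be worth reporting; matching results on both say little either way.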

  Sorry for the false alarm regarding UDP. I noticed that my clock had
stopped incrementing (which also explains the zeroes in the traceroute
output). That gave me an idea of what this issue was related to, so I
backed out revision 1.243.2.4 of src/sys/dev/acpica/acpi.c, and that
fixed my problems. It looks like that change stops the timecounters
from incrementing on my laptop.
Ironically, I was the one who initiated this ACPI behavior change: on
July 19 I reported "ACPI HPET stops working on my RELENG_7" to
stable@freebsd.org, jhb@ implemented a patch, and it worked for me at
the time. Something changed over the following two months, so on my
hardware the patch now causes problems instead of fixing them. I will
experiment a bit with kern.timecounter.choice on Monday and then report
back to jhb@.
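Before experimenting, it helps to see which timecounters the kernel offers and which one is active. A sketch that reads and ranks them through sysctl(8), so it only works on FreeBSD; it assumes the `NAME(quality)` format that kern.timecounter.choice prints, and the helper names are hypothetical.

```python
import re
import subprocess

# Entries in kern.timecounter.choice look like "NAME(quality)".
CHOICE_RE = re.compile(r"(\S+?)\((-?\d+)\)")


def parse_timecounter_choice(text: str) -> dict:
    """Map each timecounter name to its quality value."""
    return {name: int(quality) for name, quality in CHOICE_RE.findall(text)}


def ranked(choices: dict) -> list:
    """Timecounter names ordered from highest to lowest quality."""
    return sorted(choices, key=choices.get, reverse=True)


def sysctl(name: str) -> str:
    """Read a sysctl value via sysctl(8); only works on FreeBSD."""
    result = subprocess.run(["sysctl", "-n", name],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        choices = parse_timecounter_choice(sysctl("kern.timecounter.choice"))
        print("in use:", sysctl("kern.timecounter.hardware"))
        print("available, highest quality first:", ", ".join(ranked(choices)))
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not query sysctl:", exc)
```

kern.timecounter.choice only lists the candidates; switching is done by writing kern.timecounter.hardware, for example `sysctl kern.timecounter.hardware=i8254`.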

>
>>> Could you try compiling your kernel with WITNESS to see if we get
>>> any extended debugging information?
>>
>> I have added the WITNESS option (and STACK, which procstat requires),
>> but it does not produce any output (so no LORs or anything like that).
>
> OK.  Could you try adding INVARIANT_SUPPORT and INVARIANTS if they
> aren't there?  Be aware: this may convert the wedging you are
> experiencing into a kernel panic.

  No output with INVARIANT_SUPPORT and INVARIANTS compiled into the
kernel, and no kernel panic either :) Thank you for your excellent
work.
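When toggling debugging options like these between builds, a small check of the kernel config file can catch a kernel built without one of them. A simplified sketch: it assumes `options` and `nooptions` statements (comma-separated lists are split as well), ignores quoted values containing commas, and does not follow `include` lines; the helper names are hypothetical.

```python
# Debugging options discussed in this thread.
DEBUG_OPTIONS = ("WITNESS", "INVARIANTS", "INVARIANT_SUPPORT", "STACK")


def enabled_options(config_text: str) -> set:
    """Collect option names set by 'options' lines, minus any 'nooptions'."""
    enabled = set()
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] not in ("options", "nooptions"):
            continue
        for item in parts[1].split(","):
            name = item.split("=", 1)[0].strip()  # "NAME=value" -> "NAME"
            if not name:
                continue
            if parts[0] == "options":
                enabled.add(name)
            else:
                enabled.discard(name)
    return enabled


def missing_debug_options(config_text: str, wanted=DEBUG_OPTIONS) -> list:
    """List the wanted options that the config does not enable."""
    have = enabled_options(config_text)
    return [opt for opt in wanted if opt not in have]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as fh:
        print("missing:", missing_debug_options(fh.read()) or "none")
```

Because `include` is not followed, options enabled only in an included file will be reported as missing.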

>
>>>> Is anybody experiencing the same issues with fresh RELENG_7?
>>>> I am not sure whether it is a local issue on my side, though.
>>>
>>> I'm not experiencing them, but these sorts of things can be quite
>>> subtle and workload-dependent.
>>
>> Well, I am experiencing this issue even during system boot.
>
> OK.  So there must be something a bit different about your setup --
> perhaps there's something specific about the way things are interacting
> over the loopback address for the name server.  Is this the stock
> system BIND9 or something else?  Are you able to temporarily switch to
> an external name server and see if that changes things?

  I am running the stock system BIND.

>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge