From owner-freebsd-stable@FreeBSD.ORG Thu Mar 25 19:31:00 2010
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 8DC281065670
	for ; Thu, 25 Mar 2010 19:31:00 +0000 (UTC)
	(envelope-from bra@fsn.hu)
Received: from people.fsn.hu (people.fsn.hu [195.228.252.137])
	by mx1.freebsd.org (Postfix) with ESMTP id 3DEF88FC1D
	for ; Thu, 25 Mar 2010 19:30:59 +0000 (UTC)
Received: by people.fsn.hu (Postfix, from userid 1001)
	id E59DB23B1E5; Thu, 25 Mar 2010 20:30:57 +0100 (CET)
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MF-ACE0E1EA [pR: 23.8687]
X-CRM114-CacheID: sfid-20100325_20305_63671EE7
X-CRM114-Status: Good ( pR: 23.8687 )
Message-ID: <4BABB9F0.6010506@fsn.hu>
Date: Thu, 25 Mar 2010 20:30:56 +0100
From: Attila Nagy
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.23) Gecko/20090817 Thunderbird/2.0.0.23 Mnenhy/0.7.6.0
MIME-Version: 1.0
To: pyunyh@gmail.com
References: <4BAB718C.3090001@fsn.hu> <20100325183628.GD1278@michelle.cdnetworks.com>
In-Reply-To: <20100325183628.GD1278@michelle.cdnetworks.com>
X-Stationery: 0.4.10
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.2.3 (people.fsn.hu); Thu, 25 Mar 2010 20:30:56 +0100 (CET)
Cc: Mailing List FreeBSD Stable
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Thu, 25 Mar 2010 19:31:00 -0000

Pyun YongHyeon wrote:
> On Thu, Mar 25, 2010 at 03:22:04PM +0100, Attila Nagy wrote:
>
>> Hi,
>>
>> I have some recursive nameservers, running unbound and 7.2-STABLE #0:
>> Wed Sep 2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce
>> interfaces).
>> These work OK.
>>
>> During the process of migrating to 8.x, I've upgraded one of these
>> machines to 8.0-STABLE #25: Tue Mar 9 18:15:34 CET 2010 (the date
>> indicates the approximate time when the source was checked out from
>> cvsup.hu.freebsd.org; I don't know the exact revision).
>>
>> The first problem was that the machine occasionally lost network access
>> for some minutes. I could log in on the console and see the processes
>> involved in network IO sitting in the "keglim" state, but couldn't do
>> any network IO. This lasted for some minutes, then everything came back
>> to normal.
>> I fixed this issue by raising kern.ipc.nmbclusters to 51200 (double its
>> default size), after which I no longer see these blackouts.
>>
>> But now the machine freezes. It can run for about a day, and then it
>> just freezes. I can't even break into the debugger by sending it an NMI.
>> top says:
>> last pid: 92428; load averages: 0.49, 0.40, 0.38  up 0+21:13:18  07:41:43
>> 43 processes: 2 running, 38 sleeping, 1 zombie, 2 lock
>> CPU: 1.3% user, 0.0% nice, 1.3% system, 26.0% interrupt, 71.3% idle
>> Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
>> Swap:
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
>> 45011 bind        4  49    0  1734M  1722M RUN    2  37:42 22.17% unbound
>>   712 bind        3  44    0 70892K 19904K uwait  0  71:07  3.86% python2.6
>>
>> The common factor in these freezes seems to be the high interrupt count.
>> Normally, under load, the CPU times look like this:
>> CPU: 3.5% user, 0.0% nice, 1.8% system, 0.4% interrupt, 94.4% idle
>>
>> I observed one "freeze" where top remained running and everything was
>> 0%, except interrupt, which was exactly 25% (the machine has four
>> cores), and another where I could save the following console output:
>> CPU: 0.0% user, 0.0% nice, 0.2% system, 50.0% interrupt, 49.8% idle
>>
>
> When you see a high number of interrupts, could you check whether they
> come from bce(4)?
> I guess you can use systat(1) to check how many
> interrupts are generated by bce(4).
>
I've tried that multiple times, but haven't yet caught a moment when the machine was still alive (so the script could run) and the interrupt count was elevated.
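For what it's worth, the mbuf-cluster exhaustion behind those "keglim" stalls can be spotted in `netstat -m` output before raising the limit. A minimal sketch of the arithmetic, using a made-up sample line in place of real `netstat -m` output (the numbers are invented for illustration):

```shell
# Hypothetical `netstat -m` line; real output on the box will differ.
sample='25600/1024/26624/25600 mbuf clusters in use (current/cache/total/max)'

# Split on '/' and spaces: field 1 is current usage, field 4 the cap
# (kern.ipc.nmbclusters).  Usage near 100% of the cap is what produces
# the "keglim" stalls described above.
usage=$(echo "$sample" | awk -F'[/ ]' '{ printf "%.0f", 100 * $1 / $4 }')
echo "mbuf clusters: ${usage}% of the configured maximum in use"
```

On the live machine one would run `netstat -m` directly and, if usage sits near the cap, raise the kern.ipc.nmbclusters loader tunable (e.g. in /boot/loader.conf), as was done here.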
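One way to quantify the question above, had the machine stayed responsive, is to difference two snapshots of the per-device counters that `vmstat -i` reports (systat(1)'s vmstat screen shows the same counters live). A sketch, with invented bce0 lines and an assumed irq number standing in for real `vmstat -i` output:

```shell
# Two hypothetical `vmstat -i` lines for bce0, taken 10 seconds apart;
# the irq number and totals are made up for illustration.
before='irq256: bce0                  123400        110'
after='irq256: bce0                  173400        160'
interval=10

# Field 3 is the cumulative interrupt count; differencing it over the
# sampling interval gives the rate.  A large jump here during one of the
# 25%/50% interrupt-time episodes would point the finger at bce(4).
c0=$(echo "$before" | awk '{ print $3 }')
c1=$(echo "$after"  | awk '{ print $3 }')
rate=$(( (c1 - c0) / interval ))
echo "bce0: ${rate} interrupts/sec"
```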