From owner-freebsd-stable@FreeBSD.ORG Thu Mar 25 18:36:57 2010
From: Pyun YongHyeon
Date: Thu, 25 Mar 2010 11:36:28 -0700
To: Attila Nagy
Message-ID: <20100325183628.GD1278@michelle.cdnetworks.com>
References:
<4BAB718C.3090001@fsn.hu>
In-Reply-To: <4BAB718C.3090001@fsn.hu>
Cc: Mailing List FreeBSD Stable
Subject: Re: 8-STABLE freezes on UDP traffic (DNS), 7.x doesn't
Reply-To: pyunyh@gmail.com

On Thu, Mar 25, 2010 at 03:22:04PM +0100, Attila Nagy wrote:
> Hi,
>
> I have some recursive nameservers, running unbound and 7.2-STABLE #0:
> Wed Sep 2 13:37:17 CEST 2009 on a bunch of HP BL460c machines (bce
> interfaces).
> These work OK.
>
> During the process of migrating to 8.x, I've upgraded one of these
> machines to 8.0-STABLE #25: Tue Mar 9 18:15:34 CET 2010 (the dates
> indicate an approximate time, when the source was checked out from
> cvsup.hu.freebsd.org, I don't know the exact revision).
>
> The first problem was that the machine occasionally lost network access
> for some minutes. I could log in on the console, and I could see the
> processes, involved in network IO in "keglim" state, but couldn't do any
> network IO. This lasted for some minutes, then everything came back to
> normal.
> I could fix this issue by raising kern.ipc.nmbclusters to 51200
> (doubling from its default size), when I can't see these blackouts.
>
> But now the machine freezes. It can run for about a day, and then it
> just freezes. I can't even break in to the debugger with sending NMI to it.
> top says:
> last pid: 92428; load averages: 0.49, 0.40, 0.38 up 0+21:13:18
> 07:41:43
> 43 processes: 2 running, 38 sleeping, 1 zombie, 2 lock
> CPU: 1.3% user, 0.0% nice, 1.3% system, 26.0% interrupt, 71.3% idle
> Mem: 1682M Active, 99M Inact, 227M Wired, 5444K Cache, 44M Buf, 5899M Free
> Swap:
>
> PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> 45011 bind 4 49 0 1734M 1722M RUN 2 37:42 22.17% unbound
> 712 bind 3 44 0 70892K 19904K uwait 0 71:07 3.86%
> python2.6
>
> The common in these freezes seems to be the high interrupt count.
> Normally, during load the CPU times look like this:
> CPU: 3.5% user, 0.0% nice, 1.8% system, 0.4% interrupt, 94.4% idle
>
> I could observe a "freeze", where top remained running and everything
> was 0%, except interrupt, which was 25% exactly (the machine has four
> cores), and another, where I could save the following console output:
> CPU: 0.0% user, 0.0% nice, 0.2% system, 50.0% interrupt, 49.8% idle

When you see the high interrupt load, could you check whether it
comes from bce(4)? I guess you can use systat(1) to see how many
interrupts bce(4) is generating.

> .......(partial, broken line)....32M 2423M *udp 1 50:16 10.89% unbound
> 714 bind 3 44 0 70892K 26852K uwait 3 8:41 4.69%
> python2.6
> 61004 root 1 62 0 37428K 10876K *udp 1 0:00 1.56% python
> 706 root 1 44 0 2696K 624K piperd 1 0:07 0.00%
> readproctit
>
> Both unbound and python accepts DNS requests, and it seems when 25%
> interrupt happens, only unbound is in *udp state, where it is 50%, both
> programs are in that state.
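
For the bce(4) interrupt check asked about above, a rough sketch of
one way to pull the relevant counters out of "vmstat -i" output is
shown here. It assumes the usual three-column layout (interrupt
source, total, rate); the helper name device_interrupts and the
sample text are made up for illustration and are not taken from the
affected machine.

```python
def device_interrupts(vmstat_output, device="bce"):
    """Return {source: (total, rate)} for interrupt sources naming `device`.

    Assumes each data line ends with two integer columns (total, rate),
    as `vmstat -i` prints them; header and non-matching lines are skipped.
    """
    rows = {}
    for line in vmstat_output.splitlines():
        parts = line.rsplit(None, 2)  # source name may itself contain spaces
        if len(parts) != 3:
            continue
        name, total, rate = parts
        if not (total.isdigit() and rate.isdigit()):
            continue  # skips the "interrupt total rate" header line
        if device in name:
            rows[name.strip()] = (int(total), int(rate))
    return rows


# Hypothetical sample, invented purely to exercise the parser.
SAMPLE = """\
interrupt                          total       rate
irq1: atkbd0                          10          0
irq256: bce0                     5000000       9000
irq257: bce1                      200000        300
cpu0: timer                     15000000       2000
Total                           20200010      11300
"""

if __name__ == "__main__":
    for source, (total, rate) in device_interrupts(SAMPLE).items():
        print(source, total, rate)
```

On the real box the text would come from running "vmstat -i" during
one of the high-interrupt periods; "systat -vmstat 1" should give the
same information as a live display.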