From: Andriy Gapon
Date: Thu, 19 Aug 2010 20:42:27 +0300
To: Doug Barton
Cc: freebsd-current@FreeBSD.org
Subject: Re: Runaway intr, not flash related
Message-ID: <4C6D6D03.4000101@icyb.net.ua>
References: <4C6D4CB4.20601@icyb.net.ua> <4C6D6A3C.9020507@FreeBSD.org>
In-Reply-To: <4C6D6A3C.9020507@FreeBSD.org>

on 19/08/2010 20:30 Doug Barton said the following:
> On 08/19/2010 08:24, Andriy Gapon wrote:
>> I am sorry, but I don't see anything dramatically wrong here. So
>> "swi4: clock" uses 5.76% of WCPU, is that such a big deal to be
>> called "runaway intr"?
>
> That's the symptom.

OK, I see.
Perhaps you will find this message (and its ancestor thread) interesting:
http://lists.freebsd.org/pipermail/freebsd-hackers/2008-February/023447.html
I believe that your issue is different, but perhaps that stuff will inspire you
to use ktr(4) and schedgraph to properly debug this issue.

I strongly believe that you have some sort of a scheduling issue, and ktr seems
to be the way to investigate it.

Perhaps you can first try the following dtrace script. It should give a better
view of what statclock sees, but I am not sure if that information will be
sufficient.

/********************************************************/
/* Count the kernel stacks that statclock interrupts on CPU 0. */
fbt::statclock:entry
/curthread->td_oncpu == 0/
{
	@stacks0[stack()] = count();
	counts0++;
}

/* Same for CPU 1. */
fbt::statclock:entry
/curthread->td_oncpu == 1/
{
	@stacks1[stack()] = count();
	counts1++;
}

/* All CPUs, keyed by process, thread and stack. */
fbt::statclock:entry
{
	@stacks[pid, tid, stack()] = count();
	counts++;
}

END
{
	printf("\n");
	printf("***** CPU 0:\n");
	/* Scale the counts to a percentage of the CPU 0 samples. */
	normalize(@stacks0, counts0 / 100);
	trunc(@stacks0, 5);
	printa("%k%@u\n\n", @stacks0);

	printf("\n\n");
	printf("***** CPU 1:\n");
	normalize(@stacks1, counts1 / 100);
	trunc(@stacks1, 5);
	printa("%k%@u\n\n", @stacks1);

	printf("\n\n");
	printf("***** Top Processes:\n");
	/* counts covers both CPUs, so the scaled values sum to about 200. */
	normalize(@stacks, counts / 200);
	trunc(@stacks, 20);
	printa(@stacks);
}
/********************************************************/

You would run this script when the problem hits; a few seconds should be
sufficient. You may want to play with the values in the trunc() calls. You may
also want to filter the gathered statistics (using conditions in /.../) by
pid/tid if you spot anything interesting or unusual.

-- 
Andriy Gapon