Date: Thu, 19 Aug 2010 20:42:27 +0300
From: Andriy Gapon <avg@icyb.net.ua>
To: Doug Barton <dougb@FreeBSD.org>
Cc: freebsd-current@FreeBSD.org
Subject: Re: Runaway intr, not flash related
Message-ID: <4C6D6D03.4000101@icyb.net.ua>
In-Reply-To: <4C6D6A3C.9020507@FreeBSD.org>
References: <alpine.BSF.2.00.1008121349230.1721@qbhto.arg> <4C6D4CB4.20601@icyb.net.ua> <4C6D6A3C.9020507@FreeBSD.org>

on 19/08/2010 20:30 Doug Barton said the following:
> On 08/19/2010 08:24, Andriy Gapon wrote:
>> I am sorry, but I don't see anything dramatically wrong here. So
>> "swi4: clock" uses 5.76% of WCPU, is that such a big deal to be
>> called "runaway intr"?
>
> That's the symptom.

OK, I see.
Perhaps you will find this message (and its ancestor thread) interesting:
http://lists.freebsd.org/pipermail/freebsd-hackers/2008-February/023447.html
I believe that your issue is different, but perhaps that stuff will inspire
you to use ktr(4) and schedgraph to properly debug this issue.
I strongly believe that you have some sort of scheduling issue, and ktr
seems to be the way to investigate it.

Perhaps you can first try the following dtrace script. It should give a
better view of what statclock sees, but I am not sure if that information
will be sufficient.

/********************************************************/
/* Count statclock() hits per kernel stack on CPU 0. */
fbt::statclock:entry
/curthread->td_oncpu == 0/
{
	@stacks0[stack()] = count();
	counts0++;
}

/* Same for CPU 1. */
fbt::statclock:entry
/curthread->td_oncpu == 1/
{
	@stacks1[stack()] = count();
	counts1++;
}

/* Count hits per process, thread and stack on all CPUs. */
fbt::statclock:entry
{
	@stacks[pid, tid, stack()] = count();
	counts++;
}

END
{
	printf("\n");

	/* Scale per-CPU counts to percentages, keep the top 5 stacks. */
	printf("***** CPU 0:\n");
	normalize(@stacks0, counts0 / 100);
	trunc(@stacks0, 5);
	printa("%k%@u\n\n", @stacks0);

	printf("\n\n");
	printf("***** CPU 1:\n");
	normalize(@stacks1, counts1 / 100);
	trunc(@stacks1, 5);
	printa("%k%@u\n\n", @stacks1);

	/* Two CPUs, so the total here adds up to 200%; keep the top 20. */
	printf("\n\n");
	printf("***** Top Processes:\n");
	normalize(@stacks, counts / 200);
	trunc(@stacks, 20);
	printa(@stacks);
}
/********************************************************/

Run this script when the problem hits; a few seconds should be sufficient.
You may want to play with the values in the trunc() calls. You may also want
to filter the gathered statistics by pid/tid (using conditions in /.../) if
you spot anything interesting or unusual.

-- 
Andriy Gapon