From: John Baldwin
To: Adrian Chadd
Cc: "freebsd-arch@freebsd.org", Rui Paulo
Subject: Re: x86: finding interrupts that aren't being accounted for?
Date: Mon, 06 Apr 2015 17:28:02 -0400
Message-ID: <71486315.GsjOnd645i@ralph.baldwin.cx>
List-Id: Discussion related to FreeBSD architecture

On Monday, April 06, 2015 02:16:23 PM Adrian Chadd wrote:
> On 6 April 2015 at 14:15, Rui Paulo wrote:
> >
> >> On Apr 6, 2015, at 13:38, Adrian Chadd wrote:
> >>
> >> On 6 April 2015 at 12:18, John Baldwin wrote:
> >>> On Monday, April 06, 2015 12:21:29 AM Adrian Chadd wrote:
> >>>> Hi,
> >>>>
> >>>> I have an ..
> >>>> odd problem on a Lenovo X230.
> >>>>
> >>>> I just threw in a very old wifi card (Intel 3945) into the expresscard
> >>>> (pcie) slot. Now, we don't have any pcie-hp support in -HEAD just yet,
> >>>> but I wasn't expecting the system to crawl to a halt.
> >>>>
> >>>> When I unplug it, everything returns to normal.
> >>>>
> >>>> Other cards don't do this.
> >>>>
> >>>> So, I figured it may be interrupt spam - but vmstat -ia shows no
> >>>> interrupts going crazy.
> >>>>
> >>>> pmcstat -S CPU_CLK_UNHALTED_CORE -T -w 5 doesn't register anything
> >>>> either - only a handful of background samples.
> >>>>
> >>>> However, /counter/ mode pmc tells a different story - pmcstat -s
> >>>> CPU_CLK_UNHALTED_CORE -w 1 shows all four cores going at 110% when the
> >>>> card is inserted, with brief periods of idle. Once I remove the card,
> >>>> the counters go back down to zero.
> >>>>
> >>>> My working theory is: something is chewing CPU and it's likely
> >>>> interrupts, but if it is, it's something far, far earlier than the x86
> >>>> interrupt C code, which counts interrupts and spurious events.
> >>>>
> >>>> So - has anyone diagnosed this stuff on FreeBSD/x86 before? I was kind
> >>>> of hoping we'd at least get accurate statistics about spurious
> >>>> interrupts, and if we don't, I'd like to understand why.
> >>>>
> >>>> Thanks!
> >>>
> >>> SMM? Perhaps SMM doesn't hide itself from PMC counters (but it can hide itself
> >>> from samples).
> >>>
> >>> If it is SMM there's not really anything you can do about it. Try getting a
> >>> KTR_SCHED trace and looking at it in schedgraph. When I've seen SMM issues in
> >>> the past it shows up as a hole in the graph where nothing happens in the system.
> >>>
> >>> In your case you could perhaps be getting PCI errors that are triggering the
> >>> SMM handler. Perhaps compare pciconf -le before and after to see if there are
> >>> any changes.
> >>
> >> Hm, ok. Can we extract PCIe errors yet?
> >
> > Yes, check pciconf.
>
> I'll try, but the system is pretty unusable whilst the card is plugged in...

PCI errors latch. You can run 'pciconf -le' after you yank the card back out.
I would just do this:

'pciconf -le > before'
'pciconf -le > after'

Compare before and after using something like 'kompare'.

-- 
John Baldwin
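
For a machine without a graphical diff tool, the before/after comparison can be roughly sketched in a few lines of Python. This is only an illustration and not part of the original suggestion: it assumes the two dumps were already saved as plain text with 'pciconf -le > before' and 'pciconf -le > after', and the function name diff_dumps is hypothetical.

```python
import difflib
import sys


def diff_dumps(before_text, after_text):
    """Return only the lines that were removed ('-') or added ('+')
    between two text dumps, e.g. two saved 'pciconf -le' outputs."""
    lines = list(difflib.unified_diff(
        before_text.splitlines(),
        after_text.splitlines(),
        fromfile="before",
        tofile="after",
        lineterm="",
        n=0,  # no context lines, so only changed lines are emitted
    ))
    # The first two entries are the '--- before' / '+++ after' file
    # headers; '@@' lines are hunk markers. Drop both kinds.
    return [line for line in lines[2:] if not line.startswith("@@")]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (assumed file names): python diff_dumps.py before after
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        for line in diff_dumps(f_before.read(), f_after.read()):
            print(line)
```

Any line printed with a leading '-' or '+' is a line that changed between the two dumps, which is where a latched error would be expected to show up.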