From: Tech Lab Manager
Date: Mon, 11 Feb 2008 12:20:28 -0800
To: freebsd-acpi@freebsd.org
Subject: Re: SMP, ACPI and interrupt storm
In-Reply-To: <200802040900.54630.jhb@freebsd.org>
Message-Id: <7D7C052A-86E1-489F-B2F9-541C8522EBF4@liveoaksf.org>

On Feb 4, 2008, at 6:00 AM, John Baldwin wrote:

> On Thursday 31 January 2008 02:35:52 pm Tech Lab Manager wrote:
>> On Jan 31, 2008, at 10:48 AM, Nate Lawson wrote:
>>
>>> Tech Lab Manager wrote:
>>>> Sorry for the cross-post from freebsd-smb.
>>>> Building 6.3-RELEASE and 7.0-RC1 on dual Xeon (4 CPU) boxes:
>>>>     options SMP
>>>>     device apic
>>>> SMP kernel builds fine, all 4 CPUs launch on reboot.
>>>> But I get a TON of interrupts from acpi0 -- about 67,000 per second
>>>> according to vmstat -i.
>>>> With system at idle and almost no services running, here is
>>>> output of top -S:
>>>> last pid: 877; load averages: 1.18, 0.48, 0.19
>>>> 75 processes: 6 running, 54 sleeping, 15 waiting
>>>> CPU states: 0.0% user, 0.0% nice, 0.2% system, 22.4% interrupt, 77.4% idle
>>>> Mem: 31M Active, 12M Inact, 28M Wired, 16K Cache, 15M Buf, 3822M Free
>>>> Swap: 4096M Total, 4096M Free
>>>>   PID USERNAME THR PRI NICE SIZE RES STATE C TIME   WCPU COMMAND
>>>>    10 root       1 171   52   0K  8K RUN   3 1:11 99.18% idle: cpu3
>>>>    13 root       1 171   52   0K  8K CPU0  0 1:10 98.88% idle: cpu0
>>>>    12 root       1 171   52   0K  8K CPU1  1 1:09 98.78% idle: cpu1
>>>>    21 root       1 -52 -171   0K  8K CPU2  2 0:54 87.24% irq9: acpi0
>>>>    11 root       1 171   52   0K  8K RUN   2 0:17 11.19% idle: cpu2
>>>> Notice high load and interrupt % of CPU.
>>>> If turn off ACPI (e.g. set hint.apic.0.disabled=1 in
>>>> /boot/loader.conf),
>>>> the interrupt storm ceases, but then I'm only running on one CPU.
>>>
>>> That doesn't turn off acpi, that turns off the APIC (interrupt
>>> controller). Try:
>>>     hint.acpi.0.disabled=1
>>
>> Sorry, my mistake in writing ACPI above -- I *was* trying to turn off
>> apic, based on a note in the FreeBSD handbook.
>>
>> Disabling ACPI as you suggest above has the same effect as turning
>> off APIC: the interrupt storm is disabled but only one CPU is
>> launched.
>>
>>>
>>>> The BIOS ACPI settings are all Enabled. Hyperthreading is Enabled.
>>>> These machines have been running RedHat Enterprise 5.0 with full
>>>> multiprocessor support.
>>>
>>> This looks like a failure to sleep in C1 (hlt). Someone else
>>> reported this probably earlier, but all debugging showed the
>>> inexplicable -- the HLT instruction was being executed but just did
>>> not work (returned immediately).
>>>
>>> There will be a new 7.0 build that fixes one interrupt storm
>>> related to level-triggered GPEs.
>>> If you can cvsup your 7.0 branch (RELENG_7_0) and retry, that
>>> might be helpful to see if it also fixes your problem.
>>
>> okay, I'm on RC1, will switch to RELENG and report back.
>>
>> I'm not sure if this is a red herring, but acpidump -t reports:
>>
>>     Type=INT Override
>>     BUS=0
>>     IRQ=0
>>     INTR=2
>>     Flags={Polarity=conforming, Trigger=conforming}
>>
>> which looks wrong on several counts (IRQ, INTR should be 9,
>> Trigger=level). dmesg even says:
>> "MADT: Forcing active-low polarity and level trigger for SCI"
>
> No, this is an entry for something other than the SCI. You can have
> multiple interrupt override entries and this entry is typical on all
> x86 systems with APICs (the 8259As are tied into pin 0 as a daisy
> chain and IRQ0 is tied into intpin 2 since IRQ2 isn't usable with
> 8259As). Do you have an entry at all for IRQ 9? If not, then the
> hw.acpi.sci tunables currently won't do anything (I can fix it so
> that they do, however).

Here's an update on this issue. I csup'ed my source tree (RELENG_7_0,
now at RC2) last Friday and rebuilt world. Two things look slightly
different now:

1) On reboot, I still see an interrupt storm at acpi0 (irq9) at around
   75k/sec; however, over time the interrupt rate actually drops, to
   around 15k/sec after a few days (perhaps it settles further, time
   will tell).

2) Load average [at idle] is down quite a bit, from a previous average
   of ~1.0 to one that seems to vacillate between a low of 0.10 and a
   high of 0.35.
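
A caveat on the rates quoted above, assuming they were read from
vmstat -i: as far as I know, its rate column is the cumulative count
divided by uptime, so it is an average since boot and not the current
rate. A minimal Python sketch for deriving the current rate from two
samples taken a known number of seconds apart follows. The function
names (parse_vmstat_i, rate_between) and the sample counts are
hypothetical, made up for illustration, and are not measurements from
this machine.

```python
import re

# Hypothetical samples in the usual `vmstat -i` layout: name, total, rate.
# The numbers are invented for illustration only.
BEFORE = """interrupt                          total       rate
irq9: acpi0                      1000000      75000
Total                            1000500      75010
"""
AFTER = """interrupt                          total       rate
irq9: acpi0                      1150000      74000
Total                            1150600      74020
"""

def parse_vmstat_i(text):
    """Map each interrupt name to its cumulative total."""
    totals = {}
    for line in text.splitlines():
        # Data rows end in two integers (total, rate); the header does not.
        m = re.match(r"^(\S.*?)\s+(\d+)\s+(\d+)\s*$", line)
        if m:
            totals[m.group(1)] = int(m.group(2))
    return totals

def rate_between(before, after, seconds, name="irq9: acpi0"):
    """Interrupts per second for `name` between two samples."""
    delta = parse_vmstat_i(after)[name] - parse_vmstat_i(before)[name]
    return delta / seconds

if __name__ == "__main__":
    # Samples assumed to be taken 10 seconds apart.
    print(rate_between(BEFORE, AFTER, 10))
```

In practice the two samples would be the saved output of two vmstat -i
runs, with the interval between them passed as seconds.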

$ top -S
last pid: 1038; load averages: 0.22, 0.18, 0.15
67 processes: 5 running, 46 sleeping, 16 waiting
CPU states: 0.0% user, 0.0% nice, 0.1% system, 21.0% interrupt, 78.9% idle
Mem: 6468K Active, 5232K Inact, 23M Wired, 1540K Cache, 8688K Buf, 3849M Free
Swap: 4096M Total, 4096M Free
  PID USERNAME THR PRI NICE SIZE RES STATE C  TIME   WCPU COMMAND
   11 root       1 171 ki31   0K  8K CPU3  3 74:15 99.02% idle: cpu3
   12 root       1 171 ki31   0K  8K CPU2  2 74:14 99.02% idle: cpu2
   13 root       1 171 ki31   0K  8K RUN   1 74:10 99.02% idle: cpu1
   24 root       1 -52    -   0K  8K WAIT  0 58:08 83.15% irq9: acpi0
   14 root       1 171 ki31   0K  8K RUN   0 16:05 14.84% idle: cpu0

Note: for kicks I tried rebuilding the kernel with options
MPTABLE_FORCE_HTT and IPI_PREEMPTION, though without any apparent
effect. No device polling, and using SCHED_4BSD for what it's worth.

I don't know what a typical load for a multi-cpu box looks like; we've
only run single-cpu systems here, and even when working our server
loads are typically pretty close to 0.0. Basically we inherited a bunch
of dual Xeon machines and I'd like to make them work -- of course I can
just run them on one cpu but that seems kind of silly. (Unfortunately
I'm just a school administrator and not much of a hardware guy, so I'm
a little out of my depth here...;| )

Thanks for any further assistance anyone can provide.

-- 
John Berliner
Live Oak School