From: John Baldwin
To: Gerrit Nagelhout
Cc: kris@FreeBSD.org, freebsd-current@FreeBSD.org, Julian Elischer
Date: Thu, 24 Jun 2004 14:38:29 -0400
Subject: Re: STI, HLT in acpi_cpu_idle_c1
Message-Id: <200406241438.29489.jhb@FreeBSD.org>

On Thursday 24 June 2004 10:36 am, Gerrit Nagelhout wrote:
> Here's some information about another slightly different
> lockup. CPU0 is blocked in smp_targeted_tlb_shootdown (vector 0xf5).
> CPU2 & 3 are in acpi_cpu_c1. CPU1 (again) is in acpi_cpu_c1,
> but it has an interrupt pending.
> In this case, the pending
> interrupt is bit 27. 224 + 27 = 251 = IPI_HARDCLOCK.
> How can I figure out how CPU1 got stuck in this state? As
> far as I can tell, there is either a h/w problem, or CPU1
> has gone to sleep after starting to handle an interrupt.
> Thanks,

Do all of the deadlocks stop if you turn off halting when idle by doing
'sysctl machdep.cpu_idle_hlt=0'?

> Gerrit
>
> P0>dumpAllLocalApic
> CPU 0
> ID: 0x6000000
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xf5 last sent INVLPG
> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> IRR0: 0x0
> IRR1: 0x0
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x0
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x18000000

This actually has 2 pending interrupts that it needs to service: 252
(statclock) and 251 (hardclock).

> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x0
> TMR3: 0x0
> TMR4: 0x0
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0
> CPU 1
> ID: 0x7000000
> TPR: 0x0
> PPR: 0xf0
> icr_lo:0xf3 last sent AST
> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x8000000

Currently handling hardclock.

> IRR0: 0x0
> IRR1: 0x0
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x0
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x18200000

This has 3 pending (INVLPG, hardclock, statclock) and is currently servicing
hardclock. This means some CPU has sent INVLPG (f5) and is spinning with
interrupts disabled, waiting for CPU 1 to ack. This could be CPU 0.
> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x0
> TMR3: 0x0
> TMR4: 0x0
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0
> CPU 2
> ID: 0x0
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xfb last sent hardclock
> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> IRR0: 0x0
> IRR1: 0x1000000
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x20000
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x0
> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x1000
> TMR3: 0x0
> TMR4: 0x20000
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0

CPU 2 must have interrupts disabled, as it has 2 PCI interrupts pending
(IRQs 56 and 145; there must be a lot of I/O APICs in this box!), both of
which are level-triggered (hence the bits set in TMR).

> CPU 3
> ID: 0x1000000
> TPR: 0x0
> PPR: 0x0
> icr_lo:0xf3 last sent an AST
> APR: 0x0
> ISR0: 0x0
> ISR1: 0x0
> ISR2: 0x0
> ISR3: 0x0
> ISR4: 0x0
> ISR5: 0x0
> ISR6: 0x0
> ISR7: 0x0
> IRR0: 0x0
> IRR1: 0x0
> IRR2: 0x0
> IRR3: 0x0
> IRR4: 0x0
> IRR5: 0x0
> IRR6: 0x0
> IRR7: 0x0
> TMR0: 0x0
> TMR1: 0x0
> TMR2: 0x0
> TMR3: 0x0
> TMR4: 0x0
> TMR5: 0x0
> TMR6: 0x0
> TMR7: 0x0

Nothing pending or currently executing. It's OK for this one (CPU3) to be
halted, but neither CPU2 nor CPU1 should be halted. CPU1 claims to be
executing Xhardclock, which does an EOI within 20 instructions of starting.
Does the ISR for CPU1 clear if you let it continue for a bit?

-- 
John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org