From owner-freebsd-current@FreeBSD.ORG  Thu Jun 24 18:37:51 2004
From: John Baldwin <jhb@FreeBSD.org>
To: Gerrit Nagelhout <gnagelhout@sandvine.com>
Date: Thu, 24 Jun 2004 14:38:29 -0400
User-Agent: KMail/1.6
References: <FE045D4D9F7AED4CBFF1B3B813C85337054EC4BB@mail.sandvine.com>
In-Reply-To: <FE045D4D9F7AED4CBFF1B3B813C85337054EC4BB@mail.sandvine.com>
MIME-Version: 1.0
Content-Disposition: inline
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-Id: <200406241438.29489.jhb@FreeBSD.org>
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on server.baldwin.cx
cc: kris@FreeBSD.org
cc: freebsd-current@FreeBSD.org
cc: Julian Elischer <julian@elischer.org>
Subject: Re: STI, HLT in acpi_cpu_idle_c1

On Thursday 24 June 2004 10:36 am, Gerrit Nagelhout wrote:
> Here's some information about another slightly different
> lockup.  CPU0 is blocked in smp_targeted_tlb_shootdown (vector 0xf5).
> CPU2 & 3 are in acpi_cpu_c1.  CPU1 (again) is in acpi_cpu_c1,
> but it has an interrupt pending.  In this case, the pending
> interrupt is bit 27.  224 + 27 = 251 = IPI_HARDCLOCK.
> How can I figure out how CPU1 got stuck in this state?  As
> far as I can tell, there is either a h/w problem, or CPU1
> has gone to sleep after starting to handle an interrupt.
> Thanks,

Do all of the deadlocks stop if you turn off halting when idle by doing
'sysctl machdep.cpu_idle_hlt=0'?

> Gerrit
>
> P0>dumpAllLocalApic
> CPU 0
> ID:    0x6000000
> TPR:   0x0
> PPR:   0x0
> icr_lo:0xf5

last sent INVLPG

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x0
> IRR0:  0x0
> IRR1:  0x0
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x0
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x18000000

This actually has 2 pending interrupts that it needs to service, both 252 
(statclock) and 251 (hardclock).
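
For reference, the arithmetic behind that reading can be sketched as follows.
This is only an illustration, not kernel code, and the helper name
apic_vectors is made up here.  Each of the eight 32-bit IRR/ISR/TMR words
covers 32 vectors, so bit b of word n is vector 32*n + b.

```python
def apic_vectors(reg_index, value):
    """Return the interrupt vectors whose bits are set in one 32-bit
    word of a local APIC bitmap register (IRR, ISR or TMR).

    Word n covers vectors 32*n .. 32*n + 31, with bit 0 the lowest.
    """
    return [32 * reg_index + bit for bit in range(32) if (value >> bit) & 1]


# CPU 0's IRR7 from the dump: bits 27 and 28 are set.
print(apic_vectors(7, 0x18000000))  # [251, 252]
```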

> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x0
> TMR3:  0x0
> TMR4:  0x0
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0
> CPU 1
> ID:    0x7000000
> TPR:   0x0
> PPR:   0xf0
> icr_lo:0xf3

last sent AST

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x8000000

Currently handling hardclock

> IRR0:  0x0
> IRR1:  0x0
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x0
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x18200000

This has 3 pending (INVLPG, hardclock, statclock) and is currently servicing 
hardclock (ISR7 bit 27).  This means some CPU has sent INVLPG (f5) and is 
spinning with interrupts disabled waiting for CPU 1 to ack.  This could be CPU 0.
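
A rough sketch of why those IPIs stay in the IRR: the local APIC only
dispatches a pending vector whose priority class (the upper four bits of the
vector) is higher than the class in the PPR.  The values below are CPU 1's
from the dump.  The IPI_NAMES table lists only the vectors named in this
thread, and the function names are illustrative, not anything from the kernel.

```python
# Vectors named in this thread; not a complete list of the kernel's IPIs.
IPI_NAMES = {0xf3: "AST", 0xf5: "INVLPG", 0xfb: "hardclock", 0xfc: "statclock"}


def set_vectors(reg_index, value):
    """Bit b of 32-bit word n corresponds to vector 32*n + b."""
    return [32 * reg_index + bit for bit in range(32) if (value >> bit) & 1]


def deliverable(vector, ppr):
    """A pending vector is dispatched only if its priority class
    (vector bits 7:4) is above the class held in the PPR."""
    return (vector >> 4) > (ppr >> 4)


ppr = 0xf0                               # CPU 1 PPR
in_service = set_vectors(7, 0x08000000)  # CPU 1 ISR7
pending = set_vectors(7, 0x18200000)     # CPU 1 IRR7

for v in in_service:
    print("in service:", IPI_NAMES.get(v, hex(v)))
for v in pending:
    state = "deliverable" if deliverable(v, ppr) else "held until EOI"
    print("pending:", IPI_NAMES.get(v, hex(v)), "-", state)
```

All three pending vectors are in class 0xf, the same class as the in-service
one, so none of them can be delivered until the in-service handler does its EOI.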

> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x0
> TMR3:  0x0
> TMR4:  0x0
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0
> CPU 2
> ID:    0x0
> TPR:   0x0
> PPR:   0x0
> icr_lo:0xfb

last sent hardclock

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x0
> IRR0:  0x0
> IRR1:  0x1000000
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x20000
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x0
> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x1000
> TMR3:  0x0
> TMR4:  0x20000
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0

CPU 2 must have interrupts disabled, as it has 2 PCI interrupts pending 
(vectors 56 and 145; must have a lot of I/O APICs in this box!).  Vector 145 
is level triggered (its bit is set in TMR4); the bit set in TMR2 is vector 76.
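
To cross-check the pending vectors against the TMR bits, here is a small
sketch that decodes CPU 2's nonzero IRR and TMR words from the dump.  The
helper name and the dict layout are assumptions made for this example.

```python
def set_vectors(words):
    """Map {word_index: value} for an APIC bitmap register to a sorted
    list of vectors (bit b of word n is vector 32*n + b)."""
    return sorted(
        32 * n + bit
        for n, value in words.items()
        for bit in range(32)
        if (value >> bit) & 1
    )


# Nonzero words for CPU 2, copied from the dump.
irr = {1: 0x01000000, 4: 0x00020000}
tmr = {2: 0x00001000, 4: 0x00020000}

pending = set_vectors(irr)
level = set_vectors(tmr)

for v in pending:
    print(v, "TMR bit set (level)" if v in level else "no TMR bit set")
print("TMR bits without a pending interrupt:",
      [v for v in level if v not in pending])
```

With these values the pending vectors come out as 56 and 145, and the TMR
bits as 76 and 145.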

> CPU 3
> ID:    0x1000000
> TPR:   0x0
> PPR:   0x0
> icr_lo:0xf3

last sent an AST

> APR:   0x0
> ISR0:  0x0
> ISR1:  0x0
> ISR2:  0x0
> ISR3:  0x0
> ISR4:  0x0
> ISR5:  0x0
> ISR6:  0x0
> ISR7:  0x0
> IRR0:  0x0
> IRR1:  0x0
> IRR2:  0x0
> IRR3:  0x0
> IRR4:  0x0
> IRR5:  0x0
> IRR6:  0x0
> IRR7:  0x0
> TMR0:  0x0
> TMR1:  0x0
> TMR2:  0x0
> TMR3:  0x0
> TMR4:  0x0
> TMR5:  0x0
> TMR6:  0x0
> TMR7:  0x0

Nothing pending or currently executing.  It's OK for this one to be halted 
(CPU3), but neither CPU2 nor CPU1 should be halted.  CPU1 claims to be 
executing Xhardclock, which does an EOI in < 20 instructions after it starts.  
Does the ISR for CPU 1 clear if you let it continue for a bit?
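
The per-CPU reasoning above boils down to one check: a CPU is only fine to
leave halted if nothing is in service and nothing is pending.  A minimal
sketch over the nonzero ISR/IRR words from this dump (the DUMP layout and the
may_stay_halted name are made up for illustration):

```python
# Nonzero ISR/IRR words per CPU, copied from the dump in this message.
DUMP = {
    0: {"isr": {}, "irr": {7: 0x18000000}},
    1: {"isr": {7: 0x08000000}, "irr": {7: 0x18200000}},
    2: {"isr": {}, "irr": {1: 0x01000000, 4: 0x00020000}},
    3: {"isr": {}, "irr": {}},
}


def may_stay_halted(regs):
    """True if no interrupt is in service and none is pending."""
    return not any(regs["isr"].values()) and not any(regs["irr"].values())


for cpu, regs in DUMP.items():
    verdict = "idle is fine" if may_stay_halted(regs) else "has work to do"
    print("CPU", cpu, "-", verdict)
```

Only CPU 3 passes the check.  CPU 0 also has interrupts pending, but per the
report it is spinning in smp_targeted_tlb_shootdown rather than halted.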

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org