From owner-freebsd-current@FreeBSD.ORG  Sat Oct  2 06:02:18 2004
Return-Path: <owner-freebsd-current@FreeBSD.ORG>
Delivered-To: freebsd-current@freebsd.org
Date: Sat, 2 Oct 2004 02:02:01 -0400
From: Brian Fundakowski Feldman <green@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Message-ID: <20041002060201.GB1034@green.homeunix.org>
References: <20040924230425.GB1164@green.homeunix.org>
	<20040925101021.A78979@bpgate.speednet.com.au>
	<200409271635.44017.jhb@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200409271635.44017.jhb@FreeBSD.org>
User-Agent: Mutt/1.5.6i
cc: scottl@FreeBSD.org
cc: Andy Farkas <andy@bradfieldprichard.com.au>
cc: freebsd-current@FreeBSD.org
cc: julian@FreeBSD.org
Subject: Re: panic: APIC: Previous IPI is stuck

On Mon, Sep 27, 2004 at 04:35:44PM -0400, John Baldwin wrote:
> On Friday 24 September 2004 08:24 pm, Andy Farkas wrote:
> > I have been having this problem for a few weeks now. Glad I'm not the only
> > one. My box is a 4xPPro running 5.3-BETA5. It panics with either ULE
> > or 4BSD.
> >
> > My theory is that a physical IPI gets lost somewhere and the kernel spins
> > waiting for it. But that's just a stab in the dark because nobody cares to
> > explain why IPIs would be stuck.
> 
> The panic means that a previous IPI sent from the same CPU has not finished
> being sent.  I've yet to determine why this happens.  You can try editing
> sys/i386/i386/local_apic.c and turning on 'DETECT_DEADLOCK' (I think it is
> just commented out) and seeing if that improves stability.  I also see this
> on a 4xPIIXeon system I use for testing.
> 
> > -andyf
> >
> > On Fri, 24 Sep 2004, Brian Fundakowski Feldman wrote:
> > > This is on a 2xAthlon with the SCHED_ULE, HZ=1000, SW_WATCHDOG, and
> > > nothing really special in development.
> > >
> > > FreeBSD green.homeunix.org 6.0-CURRENT FreeBSD 6.0-CURRENT #110: Wed Sep 22 11:28:27 EDT 2004
> > >     root@green.homeunix.org:/usr/src/sys/i386/compile/GREEN  i386
> > >
> > > panic: APIC: Previous IPI is stuck
> > > cpuid = 1
> > > KDB: stack backtrace:
> > > kdb_backtrace(c063cae7,1,c063c5e7,d4411b28,c1da2000) at kdb_backtrace+0x2e
> > > panic(c063c5e7,1,f3,1,2) at panic+0x128
> > > lapic_ipi_vectored(f3,1,c1da2494,1,c0675910) at 64) at sched_add_internal+0x21e
> > > kseq_assign(c0675910,1,c0625a07,5e0,c1da1540) at kseq_assign+0x4a
> > > sched_clock(c1da2000,2,c0621165,17e,d4411c54) at sched_clock+0x74
> > > statclock(d4411c54,c1ecc840,d4411c3c,c05edc8b,d4411c54) at statclock+0xf8
> > > rtcintr(d4411c54,c0487af4,c06733a0,2,8) at rtcintr+0x4f
> > > intr_execute_handlers(c1dca8f0,d4411c54,d4411cb4,c05ea0e3,38) at intr_execute_handlers+0xab
> > > lapic_handle_intr(38) at lapic_handle_intr+0x3a
> > > Xapic_isr1() at Xapic_isr1+0x33
> > > --- interrupt, eip = 0xc04a640a, esp = 0xd4411c98, ebp = 0xd4411cb4 ---
> > > _mtx_lock_sleep(c06733e0,c1da2000,0,c06220e8,222) at _mtx_lock_sleep+0x13a
> > > _mtx_lock_flags(c06733e0,0,c06220e8,222,0) at _mtx_lock_flags+0xc0
> > > ithread_loop(c1da6200,d4411d48,c0621edb,31f,c1da6200) at ithread_loop+0x15a
> > > fork_exit(c0499660,c1da6200,d4411d48) at fork_exit+0xc6
> > > fork_trampoline() at fork_trampoline+0x8
> > > --- trap 0x1, eip = 0, esp = 0xd4411d7c, ebp = 0 ---
> > > KDB: enter: panic
> > > panic: APIC: Previous IPI is stuck
> > > cpuid = 1
> > > boot() called on cpu#1
> > > Uptime: 2d0h16m55s
> > > ^^ full hang instead of reset

Okay, I just got another one of these, exactly the same as that one except
that it was the softclock() interrupt that was specifically locking Giant
rather than the interrupt thread loop.  So the other CPU owned Giant at the
time, and the scheduling CPU was trying to acquire it when it was
interrupted to run statclock().

This is way too coincidental to ignore.

SCHED_ULE is far too complex for me to understand much of right now;
what prevents sched_clock() from calling kseq_assign() multiple times
per CPU?  Are we _absolutely_100%_certain_ that functionality works
correctly?
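
For anyone following along, here is a rough userland sketch of the check
behind "Previous IPI is stuck", as I understand it: before writing a new
command to the local APIC's ICR, the sender polls the delivery-status bit
(bit 12 of the low ICR word) and gives up if the previous send never leaves
the pending state. This is not the code in local_apic.c. The names
fake_lapic, fake_icr_read, ipi_wait and ipi_send, the poll budget, and the
"send cost" are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the low ICR word is the delivery status: 1 = send pending. */
#define ICR_DELSTAT_PEND 0x00001000u

/* Hypothetical stand-in for one CPU's memory-mapped local APIC. */
struct fake_lapic {
    uint32_t icr_lo;      /* simulated low ICR word */
    int      busy_polls;  /* reads until the send completes; < 0 = never */
};

/* Read the ICR; each read ages the simulated in-flight send by one step. */
uint32_t fake_icr_read(struct fake_lapic *l)
{
    if ((l->icr_lo & ICR_DELSTAT_PEND) != 0 && l->busy_polls >= 0) {
        if (l->busy_polls == 0)
            l->icr_lo &= ~ICR_DELSTAT_PEND;  /* send finished */
        else
            l->busy_polls--;
    }
    return l->icr_lo;
}

/* Poll at most max_polls times; true once the previous IPI is idle. */
bool ipi_wait(struct fake_lapic *l, int max_polls)
{
    assert(max_polls >= 0);
    for (int i = 0; i < max_polls; i++) {
        if ((fake_icr_read(l) & ICR_DELSTAT_PEND) == 0)
            return true;
    }
    return false;
}

/*
 * Send one IPI.  Returns false in the situation where the kernel would
 * panic; send_cost is an assumed number of reads the new send stays
 * pending (negative means it never completes).
 */
bool ipi_send(struct fake_lapic *l, uint8_t vector, int max_polls,
    int send_cost)
{
    if (!ipi_wait(l, max_polls))
        return false;  /* "Previous IPI is stuck" */
    l->icr_lo = ICR_DELSTAT_PEND | vector;
    l->busy_polls = send_cost;
    return true;
}
```

In this model, back-to-back ipi_send() calls from one CPU succeed as long
as each earlier send drains within the poll budget, and the first send that
never clears the pending bit makes the next one fail. That is only the
shape of the symptom; it says nothing about why the real hardware would
leave the bit set.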

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\