Date: Sun, 3 Oct 2004 03:53:03 -0400
From: Brian Fundakowski Feldman <green@FreeBSD.org>
To: John Baldwin <jhb@FreeBSD.org>
Cc: jeff@FreeBSD.org
Subject: Re: panic: APIC: Previous IPI is stuck
Message-ID: <20041003075303.GG1034@green.homeunix.org>
In-Reply-To: <20041002060201.GB1034@green.homeunix.org>
References: <20040924230425.GB1164@green.homeunix.org> <20040925101021.A78979@bpgate.speednet.com.au> <200409271635.44017.jhb@FreeBSD.org> <20041002060201.GB1034@green.homeunix.org>
On Sat, Oct 02, 2004 at 02:02:01AM -0400, Brian Fundakowski Feldman wrote:
> On Mon, Sep 27, 2004 at 04:35:44PM -0400, John Baldwin wrote:
> > On Friday 24 September 2004 08:24 pm, Andy Farkas wrote:
> > > I have been having this problem for a few weeks now. Glad I'm not the only
> > > one. My box is a 4xPPro running 5.3-BETA5. It panics with either ULE
> > > or 4BSD.
> > >
> > > My theory is that a physical IPI gets lost somewhere and the kernel spins
> > > waiting for it. But that's just a stab in the dark because nobody cares to
> > > explain why IPIs would be stuck.
> >
> > The panic has to do with a previous IPI not finished being sent from the same
> > CPU. I've yet to determine why this happens. You can try editing
> > sys/i386/i386/local_apic.c and turning on 'DETECT_DEADLOCK' (I think it is
> > just commented out) and seeing if that improves stability. I also see this
> > on a 4xPIIXeon system I use for testing.
> >
> > > -andyf
> > >
> > > On Fri, 24 Sep 2004, Brian Fundakowski Feldman wrote:
> > > > This is on a 2xAthlon with the SCHED_ULE, HZ=1000, SW_WATCHDOG, and
> > > > nothing really special in development.
> > > >
> > > > FreeBSD green.homeunix.org 6.0-CURRENT FreeBSD 6.0-CURRENT #110: Wed Sep
> > > > 22 11:28:27 EDT 2004
> > > > root@green.homeunix.org:/usr/src/sys/i386/compile/GREEN i386
> > > >
> > > > panic: APIC: Previous IPI is stuck
> > > > cpuid = 1
> > > > KDB: stack backtrace:
> > > > kdb_backtrace(c063cae7,1,c063c5e7,d4411b28,c1da2000) at kdb_backtrace+0x2e
> > > > panic(c063c5e7,1,f3,1,2) at panic+0x128
> > > > lapic_ipi_vectored(f3,1,c1da2494,1,c0675910) at 64) at sched_add_internal+0x21e
> > > > kseq_assign(c0675910,1,c0625a07,5e0,c1da1540) at kseq_assign+0x4a
> > > > sched_clock(c1da2000,2,c0621165,17e,d4411c54) at sched_clock+0x74
> > > > statclock(d4411c54,c1ecc840,d4411c3c,c05edc8b,d4411c54) at statclock+0xf8
> > > > rtcintr(d4411c54,c0487af4,c06733a0,2,8) at rtcintr+0x4f
> > > > intr_execute_handlers(c1dca8f0,d4411c54,d4411cb4,c05ea0e3,38) at intr_execute_handlers+0xab
> > > > lapic_handle_intr(38) at lapic_handle_intr+0x3a
> > > > Xapic_isr1() at Xapic_isr1+0x33
> > > > --- interrupt, eip = 0xc04a640a, esp = 0xd4411c98, ebp = 0xd4411cb4 ---
> > > > _mtx_lock_sleep(c06733e0,c1da2000,0,c06220e8,222) at _mtx_lock_sleep+0x13a
> > > > _mtx_lock_flags(c06733e0,0,c06220e8,222,0) at _mtx_lock_flags+0xc0
> > > > ithread_loop(c1da6200,d4411d48,c0621edb,31f,c1da6200) at ithread_loop+0x15a
> > > > fork_exit(c0499660,c1da6200,d4411d48) at fork_exit+0xc6
> > > > fork_trampoline() at fork_trampoline+0x8
> > > > --- trap 0x1, eip = 0, esp = 0xd4411d7c, ebp = 0 ---
> > > > KDB: enter: panic
> > > > panic: APIC: Previous IPI is stuck
> > > > cpuid = 1
> > > > boot() called on cpu#1
> > > > Uptime: 2d0h16m55s
> > > > ^^ full hang instead of reset
>
> Okay, I just got another one of these, exactly the same as that one but
> for the fact that the softclock() interrupt was specifically locking
> Giant instead of the interrupt thread loop.
> So the other CPU owned
> Giant at the time and the scheduling CPU is trying to acquire it and
> interrupted by needing to run the statclock().
>
> This is way too coincidental to ignore.
>
> SCHED_ULE is far too complex for me to understand much of right now;
> what prevents sched_clock() from calling kseq_assign() multiple times
> per CPU? Are we _absolutely_100%_certain_ that functionality works
> correctly?

Ping... adding Jeff... I really wish I understood SCHED_ULE, because it
seems entirely plausible it's trying to send two IPIs, the first of which
would get blocked waiting for the held sched_lock, and the second of which
would never have its interrupt serviced because the first one blocked on
sched_lock would have interrupts disabled and would remain unable to
respond to an IPI...

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
      Opinions expressed are my own.                   \,,,,,,,,,,,,,,,,,,,,,,\
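
The two-IPI scenario guessed at in this message can be sketched as a small user-space toy model. This is only an illustration of the hypothesis, not FreeBSD code: every name in it (toy_cpu, toy_apic, toy_send_ipi, toy_ipi_wait, toy_scenario) is hypothetical, the spin bound of 1000 is arbitrary, and it assumes an IPI stays "pending" on the sender until the target can take it with interrupts enabled, which is a simplification of real APIC delivery.

```c
#include <stdbool.h>

/* Hypothetical toy model; none of these names are FreeBSD kernel symbols. */
struct toy_cpu {
    bool intr_enabled;      /* can the target CPU take a new IPI right now? */
    bool spinning_on_lock;  /* target is stuck in a handler waiting for the lock */
};

struct toy_apic {
    bool delivery_pending;  /* models the sender's "previous IPI still in flight" state */
};

/* Sender posts an IPI: it stays pending until the target takes it. */
static void toy_send_ipi(struct toy_apic *apic)
{
    apic->delivery_pending = true;
}

/* One step of the target CPU: take a pending IPI if interrupts are enabled. */
static void toy_target_step(struct toy_cpu *target, struct toy_apic *apic,
                            bool lock_held_by_sender)
{
    if (!apic->delivery_pending || !target->intr_enabled)
        return;
    apic->delivery_pending = false;  /* IPI accepted by the target */
    target->intr_enabled = false;    /* handler runs with interrupts disabled */
    if (lock_held_by_sender)
        target->spinning_on_lock = true;  /* handler never finishes */
    else
        target->intr_enabled = true;      /* handler completes and returns */
}

/* Bounded wait for the previous IPI, in the spirit of a "stuck IPI" check. */
static bool toy_ipi_wait(struct toy_cpu *target, struct toy_apic *apic,
                         bool lock_held_by_sender, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        toy_target_step(target, apic, lock_held_by_sender);
        if (!apic->delivery_pending)
            return true;
    }
    return false;  /* where the real kernel would panic */
}

/* Returns 0 if both IPIs are delivered, else the index (1 or 2) of the stuck IPI. */
int toy_scenario(bool sender_holds_lock)
{
    struct toy_cpu target = { .intr_enabled = true, .spinning_on_lock = false };
    struct toy_apic apic = { .delivery_pending = false };

    for (int n = 1; n <= 2; n++) {
        toy_send_ipi(&apic);
        if (!toy_ipi_wait(&target, &apic, sender_holds_lock, 1000))
            return n;
    }
    return 0;
}
```

In this model toy_scenario(true) returns 2: the first IPI is taken, its handler spins on the lock with interrupts off, and the second IPI is never accepted. toy_scenario(false) returns 0 because the handler finishes and re-enables interrupts before the second IPI is sent. Whether SCHED_ULE can actually reach this state is exactly the open question in the message above.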