From owner-freebsd-current@FreeBSD.ORG Sat Oct 2 06:02:18 2004
Delivered-To: freebsd-current@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 549A016A4CE;
    Sat, 2 Oct 2004 06:02:18 +0000 (GMT)
Received: from green.homeunix.org
    (pcp04368961pcs.nrockv01.md.comcast.net [69.140.212.7])
    by mx1.FreeBSD.org (Postfix) with ESMTP id 11F7343D48;
    Sat, 2 Oct 2004 06:02:16 +0000 (GMT)
    (envelope-from green@green.homeunix.org)
Received: from green.homeunix.org (green@localhost [127.0.0.1])
    by green.homeunix.org (8.13.1/8.13.1) with ESMTP id i92624fX001803;
    Sat, 2 Oct 2004 02:02:04 -0400 (EDT)
    (envelope-from green@green.homeunix.org)
Received: (from green@localhost)
    by green.homeunix.org (8.13.1/8.13.1/Submit) id i92621Wr001802;
    Sat, 2 Oct 2004 02:02:01 -0400 (EDT)
    (envelope-from green)
Date: Sat, 2 Oct 2004 02:02:01 -0400
From: Brian Fundakowski Feldman
To: John Baldwin
Message-ID: <20041002060201.GB1034@green.homeunix.org>
References: <20040924230425.GB1164@green.homeunix.org>
    <20040925101021.A78979@bpgate.speednet.com.au>
    <200409271635.44017.jhb@FreeBSD.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200409271635.44017.jhb@FreeBSD.org>
User-Agent: Mutt/1.5.6i
cc: scottl@FreeBSD.org
cc: Andy Farkas
cc: freebsd-current@FreeBSD.org
cc: julian@FreeBSD.org
Subject: Re: panic: APIC: Previous IPI is stuck
X-BeenThere: freebsd-current@freebsd.org
X-Mailman-Version: 2.1.1
Precedence: list
List-Id: Discussions about the use of FreeBSD-current
X-List-Received-Date: Sat, 02 Oct 2004 06:02:18 -0000

On Mon, Sep 27, 2004 at 04:35:44PM -0400, John Baldwin wrote:
> On Friday 24 September 2004 08:24 pm, Andy Farkas wrote:
> > I have been having this problem for a few weeks now. Glad I'm not the only
> > one. My box is a 4xPPro running 5.3-BETA5.
> > It panics with either ULE or 4BSD.
> >
> > My theory is that a physical IPI gets lost somewhere and the kernel spins
> > waiting for it. But that's just a stab in the dark because nobody cares to
> > explain why IPI's would be stuck.
>
> The panic has to do with a previous IPI not finished being sent from the same
> CPU. I've yet to determine why this happens. You can try editing
> sys/i386/i386/local_apic.c and turning on 'DETECT_DEADLOCK' (I think it is
> just commented out) and seeing if that improves stability. I also see this
> on a 4xPIIXeon system I use for testing.
>
> > -andyf
> >
> > On Fri, 24 Sep 2004, Brian Fundakowski Feldman wrote:
> > > This is on a 2xAthlon with the SCHED_ULE, HZ=1000, SW_WATCHDOG, and
> > > nothing really special in development.
> > >
> > > FreeBSD green.homeunix.org 6.0-CURRENT FreeBSD 6.0-CURRENT #110: Wed Sep
> > > 22 11:28:27 EDT 2004
> > > root@green.homeunix.org:/usr/src/sys/i386/compile/GREEN i386
> > >
> > > panic: APIC: Previous IPI is stuck
> > > cpuid = 1
> > > KDB: stack backtrace:
> > > kdb_backtrace(c063cae7,1,c063c5e7,d4411b28,c1da2000) at kdb_backtrace+0x2e
> > > panic(c063c5e7,1,f3,1,2) at panic+0x128
> > > lapic_ipi_vectored(f3,1,c1da2494,1,c0675910) at 64) at sched_add_internal+0x21e
> > > kseq_assign(c0675910,1,c0625a07,5e0,c1da1540) at kseq_assign+0x4a
> > > sched_clock(c1da2000,2,c0621165,17e,d4411c54) at sched_clock+0x74
> > > statclock(d4411c54,c1ecc840,d4411c3c,c05edc8b,d4411c54) at statclock+0xf8
> > > rtcintr(d4411c54,c0487af4,c06733a0,2,8) at rtcintr+0x4f
> > > intr_execute_handlers(c1dca8f0,d4411c54,d4411cb4,c05ea0e3,38) at intr_execute_handlers+0xab
> > > lapic_handle_intr(38) at lapic_handle_intr+0x3a
> > > Xapic_isr1() at Xapic_isr1+0x33
> > > --- interrupt, eip = 0xc04a640a, esp = 0xd4411c98, ebp = 0xd4411cb4 ---
> > > _mtx_lock_sleep(c06733e0,c1da2000,0,c06220e8,222) at _mtx_lock_sleep+0x13a
> > > _mtx_lock_flags(c06733e0,0,c06220e8,222,0) at _mtx_lock_flags+0xc0
> > > ithread_loop(c1da6200,d4411d48,c0621edb,31f,c1da6200) at ithread_loop+0x15a
> > > fork_exit(c0499660,c1da6200,d4411d48) at fork_exit+0xc6
> > > fork_trampoline() at fork_trampoline+0x8
> > > --- trap 0x1, eip = 0, esp = 0xd4411d7c, ebp = 0 ---
> > > KDB: enter: panic
> > > panic: APIC: Previous IPI is stuck
> > > cpuid = 1
> > > boot() called on cpu#1
> > > Uptime: 2d0h16m55s
> > > ^^ full hang instead of reset

Okay, I just got another one of these, exactly the same as that one except
that the softclock() interrupt was the one locking Giant instead of the
interrupt thread loop. So the other CPU owned Giant at the time, and the
scheduling CPU was trying to acquire it when it was interrupted to run
statclock(). This is way too coincidental to ignore.

SCHED_ULE is far too complex for me to understand much of right now; what
prevents sched_clock() from calling kseq_assign() multiple times per CPU?
Are we _absolutely_100%_certain_ that functionality works correctly?

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
       Opinions expressed are my own.                   \,,,,,,,,,,,,,,,,,,,,,,\
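
For readers following the thread, here is a rough userland sketch of the check that John describes: before a CPU sends an IPI, it polls the delivery-status bit of its local APIC interrupt command register (bit 12 of the low word) and gives up if an earlier send from the same CPU never leaves the pending state. This is not the code in sys/i386/i386/local_apic.c. The struct fake_lapic, the simulated register read, the spin limit and every function name below are assumptions made up for illustration; only the bit position and the f3 vector (taken from the backtrace) come from real sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ICR low word, bit 12: delivery status (1 = a send is still pending). */
#define ICR_DELIVERY_PENDING 0x00001000u

/* Hypothetical stand-in for one CPU's memory-mapped ICR. */
struct fake_lapic {
    uint32_t icr_lo;
    int polls_until_idle;   /* simulated latency; negative = never clears */
};

/* Simulated register read: the pending bit clears once the countdown hits 0. */
static uint32_t fake_icr_read(struct fake_lapic *la)
{
    if (la->polls_until_idle == 0)
        la->icr_lo &= ~ICR_DELIVERY_PENDING;
    else if (la->polls_until_idle > 0)
        la->polls_until_idle--;
    return la->icr_lo;
}

/* Poll up to max_spins times; true once the previous IPI is no longer pending. */
static bool ipi_wait_idle(struct fake_lapic *la, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if ((fake_icr_read(la) & ICR_DELIVERY_PENDING) == 0)
            return true;
    }
    return false;
}

/*
 * Send an IPI with the given vector. Returns false in the situation where
 * a kernel would report that the previous IPI is stuck.
 */
static bool send_ipi(struct fake_lapic *la, uint8_t vector, int max_spins)
{
    assert(max_spins > 0);
    if (!ipi_wait_idle(la, max_spins))
        return false;               /* previous send never completed */
    la->icr_lo = (uint32_t)vector | ICR_DELIVERY_PENDING;
    return true;
}
```

In this model a second send_ipi() from the same CPU succeeds as long as the earlier send drains within the spin budget, and fails if the pending bit never clears. It models only the delivery-status check itself, not what might cause the bit to stay set, and says nothing about how often sched_clock() reaches kseq_assign().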