From owner-freebsd-current@FreeBSD.ORG Wed Mar 8 20:11:24 2006 Return-Path: X-Original-To: freebsd-current@freebsd.org Delivered-To: freebsd-current@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 3F39416A420; Wed, 8 Mar 2006 20:11:24 +0000 (GMT) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by mx1.FreeBSD.org (Postfix) with ESMTP id 7D7D343D45; Wed, 8 Mar 2006 20:11:23 +0000 (GMT) (envelope-from jhb@freebsd.org) Received: from localhost (john@localhost [127.0.0.1]) by server.baldwin.cx (8.13.4/8.13.4) with ESMTP id k28KBIue037578; Wed, 8 Mar 2006 15:11:21 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-current@freebsd.org Date: Wed, 8 Mar 2006 15:04:33 -0500 User-Agent: KMail/1.9.1 References: <20060308195906.GA51429@xor.obsecurity.org> In-Reply-To: <20060308195906.GA51429@xor.obsecurity.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200603081504.35845.jhb@freebsd.org> X-Virus-Scanned: ClamAV 0.87.1/1318/Tue Mar 7 15:55:18 2006 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-3.0 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on server.baldwin.cx Cc: current@freebsd.org, Kris Kennaway Subject: Re: "spin lock sched lock held by 0xc63b7870 for > 5 seconds" at reboot X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 08 Mar 2006 20:11:24 -0000 On Wednesday 08 March 2006 14:59, Kris Kennaway wrote: > i386 SMP server, up-to-date current: Yes, I know about this one and had sent a workaround to someone I thought. I think the real fix is that we need to disable interrupts in cpu_reset() (perhaps earlier, our whole SMP shutdown sequence needs thought I think, i.e. I think we need to IPI all the CPUs during a non-panic shutdown to ask them to go idle and block until that happens and then disable interrupts and finish the shutdown). > Uptime: 1d22h27m15s > Rebooting... > cpu_reset: Stopping other CPUs > spin lock sched lock held by 0xc63b7870 for > 5 seconds > panic: spin lock held too long > cpuid = 0 > KDB: enter: panic > [thread pid 99832 tid 100162 ] > Stopped at kdb_enter+0x30: leave > db> wh > Tracing pid 99832 tid 100162 td 0xc84d5360 > kdb_enter(c072025f,0,c071f552,ef2dab08,c84d5360) at kdb_enter+0x30 > panic(c071f552,c63b7870,c63b7870,c84d5360,c0786c30) at panic+0x13f > _mtx_lock_spin(c0786c30,c84d5360,2,c071bf1d,cb) at _mtx_lock_spin+0xbd > _mtx_lock_spin_flags(c0786c30,2,c071bf1d,cb,0) at _mtx_lock_spin_flags+0xe7 > hardclock_cpu(0,c0786c30,2,c0721c41,c07fcd40) at hardclock_cpu+0x3c > hardclock(0,c06e0b97,0,ef2dac18,c06c81ae) at hardclock+0x18 > lapic_handle_timer(8,28,28,f4240,0) at lapic_handle_timer+0x7e > Xtimerint() at Xtimerint+0x1e > --- interrupt, eip = 0xc06e0b97, esp = 0xef2dabe4, ebp = 0xef2dac18 --- > DELAY(f4240,c63b0220,ef2dac34,c052f63d,f4240) at DELAY+0x9f > cpu_reset(f4240,ef2dac64,c052f4fa,0,0) at cpu_reset+0xce > shutdown_reset(0,0,c0720283,195,0) at shutdown_reset+0x23 > boot(0,0,c0720283,a2,ef2dad30) at boot+0x65a > reboot(c84d5360,ef2dad04,4,445,c6926420) at reboot+0x47 > syscall(3b,3b,3b,0,0) at syscall+0x2ea > Xint0x80_syscall() at Xint0x80_syscall+0x1f > --- syscall (55, FreeBSD ELF32, reboot), eip = 0x280a9f5b, esp = 0xbfbfe80c, ebp = 0xbfbfe858 --- > db> show locks > exclusive sleep mutex Giant r = 0 (0xc0786c48) locked @ kern/vfs_syscalls.c:126 > db> show alllocks > Process 99832 (reboot) thread 0xc84d5360 (100162) > exclusive sleep mutex Giant r = 0 (0xc0786c48) locked @ kern/vfs_syscalls.c:126 > db> show allpcpu > Current CPU: 0 > > cpuid = 0 > curthread = 0xc84d5360: pid 99832 "reboot" > curpcb = 0xef2dad90 > fpcurthread = none > idlethread = 0xc63b7a20: pid 13 "idle: cpu0" > APIC ID = 0 > currentldt = 0x50 > spin locks held: > > cpuid = 1 > curthread = 0xc63b7870: pid 12 "idle: cpu1" > curpcb = 0xea3f5d90 > fpcurthread = none > idlethread = 0xc63b7870: pid 12 "idle: cpu1" > APIC ID = 1 > currentldt = 0x50 > spin locks held: > > ^---- Note: allegedly no spin locks held, contrary to panic > > cpuid = 2 > curthread = 0xc63b76c0: pid 11 "idle: cpu2" > curpcb = 0xea3f2d90 > fpcurthread = none > idlethread = 0xc63b76c0: pid 11 "idle: cpu2" > APIC ID = 2 > currentldt = 0x50 > spin locks held: > > cpuid = 3 > curthread = 0xc63b7510: pid 10 "idle: cpu3" > curpcb = 0xea3efd90 > fpcurthread = none > idlethread = 0xc63b7510: pid 10 "idle: cpu3" > APIC ID = 3 > currentldt = 0x50 > spin locks held: > > db> wh 12 > Tracing pid 12 tid 100004 td 0xc63b7870 > cpustop_handler(ea3f5b90,c06dcd98,1,c,c63b7870) at cpustop_handler+0x2c > ipi_nmi_handler(1,c,c63b7870,ea3f5b28,ea3f5b28) at ipi_nmi_handler+0x29 > trap(c0520008,c63b0028,ea3f0028,6a0,c078cc58) at trap+0x3c > calltrap() at calltrap+0x5 > --- trap 0x13, eip = 0xc05254bf, esp = 0xea3f5bd8, ebp = 0xea3f5bf8 --- > _mtx_lock_spin_flags(c078cc58,0,c07242fa,6a0,c0786c30) at _mtx_lock_spin_flags+0x47 > witness_lock_list_get(0,c0786c30,c0796458,c63b7870,c0786c30) at witness_lock_list_get+0x3d > witness_lock(c0786c30,a,c071bf26,cb,c0721c41) at witness_lock+0xcd > _mtx_lock_spin_flags(c0786c30,2,c071bf1d,cb,c07fcd6c) at _mtx_lock_spin_flags+0x104 > hardclock_cpu(0,ea3f5c9c,2,ea3f5cdc,c06c81ae) at hardclock_cpu+0x3c > lapic_handle_timer(8,ea3f0028,c0540028,0,fffffffd) at lapic_handle_timer+0xa3 > Xtimerint() at Xtimerint+0x1e > --- interrupt, eip = 0xc06d507f, esp = 0xea3f5cdc, ebp = 0xea3f5cdc --- > mp_grab_cpu_hlt(ea3f5d04,c0516ac2,c0786c48,2,c071d59e) at mp_grab_cpu_hlt+0x16 > cpu_idle(c0786c48,2,c071d59e,70,c63b68d0) at cpu_idle+0x8 > idle_proc(0,ea3f5d38,c071d434,31a,0) at idle_proc+0x92 > fork_exit(c0516a30,0,ea3f5d38) at fork_exit+0xc5 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip = 0, esp = 0xea3f5d6c, ebp = 0 --- > db> wh 11 > Tracing pid 11 tid 100005 td 0xc63b76c0 > cpustop_handler(ea3f2c8c,c06dcd98,c055c1c7,c07d2c68,0) at cpustop_handler+0x2c > ipi_nmi_handler(c055c1c7,c07d2c68,0,c0786c30,c0796458) at ipi_nmi_handler+0x29 > trap(ea3f0008,c06c0028,c06c0028,0,fffffffb) at trap+0x3c > calltrap() at calltrap+0x5 > --- trap 0x13, eip = 0xc0546bb8, esp = 0xea3f2cd4, ebp = 0xea3f2cd4 --- > runq_check(c0788640,4,ea3f2d04,c0516ac7,c0786c48) at runq_check+0xf > sched_runnable(c0786c48,2,c071d59e,70,c63b6b04) at sched_runnable+0x13 > idle_proc(0,ea3f2d38,c071d434,31a,0) at idle_proc+0x97 > fork_exit(c0516a30,0,ea3f2d38) at fork_exit+0xc5 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip = 0, esp = 0xea3f2d6c, ebp = 0 --- > db> wh 10 > Tracing pid 10 tid 100006 td 0xc63b7510 > cpustop_handler(ea3efc94,c06dcd98,c055c1c7,c07d2c68,0) at cpustop_handler+0x2c > ipi_nmi_handler(c055c1c7,c07d2c68,0,c0786c30,c0796458) at ipi_nmi_handler+0x29 > trap(8,ea3e0028,c0540028,0,fffffff7) at trap+0x3c > calltrap() at calltrap+0x5 > --- trap 0x13, eip = 0xc06d507f, esp = 0xea3efcdc, ebp = 0xea3efcdc --- > mp_grab_cpu_hlt(ea3efd04,c0516ac2,c0786c48,2,c071d59e) at mp_grab_cpu_hlt+0x16 > cpu_idle(c0786c48,2,c071d59e,70,c63b6d38) at cpu_idle+0x8 > idle_proc(0,ea3efd38,c071d434,31a,0) at idle_proc+0x92 > fork_exit(c0516a30,0,ea3efd38) at fork_exit+0xc5 > fork_trampoline() at fork_trampoline+0x8 > --- trap 0x1, eip = 0, esp = 0xea3efd6c, ebp = 0 --- > db> -- John Baldwin <>< http://www.FreeBSD.org/~jhb/ "Power Users Use the Power to Serve" = http://www.FreeBSD.org