From owner-freebsd-smp Sun Feb 9 14: 6:43 2003
Delivered-To: freebsd-smp@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP
	id 89C4F37B401; Sun, 9 Feb 2003 14:06:41 -0800 (PST)
Received: from eagle.sharma-home.net (cpe-66-1-147-119.ca.sprintbbd.net [66.1.147.119])
	by mx1.FreeBSD.org (Postfix) with ESMTP
	id 721E543F85; Sun, 9 Feb 2003 14:06:40 -0800 (PST)
	(envelope-from adsharma@sharma-home.net)
Received: from astra.mirabella.net (astra.mirabella.net [192.168.1.3])
	by eagle.sharma-home.net (Postfix) with ESMTP
	id B632980D8; Sun, 9 Feb 2003 14:11:42 -0800 (PST)
Received: by astra.mirabella.net (Postfix, from userid 1001)
	id A55C02E; Sun, 9 Feb 2003 14:06:39 -0800 (PST)
To: FreeBSD-gnats-submit@freebsd.org
Subject: SMP machine hang during boot related to idle proc and sched_lock
From: Arun Sharma
Reply-To: Arun Sharma
Cc: smp@freebsd.org
X-send-pr-version: 3.113
X-GNATS-Notify:
Message-Id: <20030209220639.A55C02E@astra.mirabella.net>
Date: Sun, 9 Feb 2003 14:06:39 -0800 (PST)
Sender: owner-freebsd-smp@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.org

>Submitter-Id: current-users
>Originator: Arun Sharma
>Organization:
>Confidential: no
>Synopsis: SMP machine hang during boot related to idle proc and sched_lock
>Severity: serious
>Priority: medium
>Category: kern
>Class: sw-bug
>Release: FreeBSD 5.0-CURRENT i386
>Environment:
System: FreeBSD astra.mirabella.net 5.0-CURRENT FreeBSD 5.0-CURRENT #16: Sat Feb 8 09:08:58 PST 2003 root@astra.mirabella.net:/usr/src/sys/i386/compile/astra i386
>Description:
The machine hangs randomly during bootup on a 2 way SMP box.
In some of those hangs it drops into ddb, where I was able to collect the
following info:

db> show pcpu
cpuid = 0
curthread = 0xc0d19380: pid 46 "sh"
curpcb = 0xcad54da0
fpcurthread = none
idlethread = 0xc0d18b60: pid 12 "idle: cpu0"
currentldt = 0x28
db> tr
Debugger(c0364696,0,c036423d,cad54a64,1) at Debugger+0x55
panic(c036423d,c036426b,c0d18a80,0,cad54af8) at panic+0x11f
_mtx_lock_spin(c038b6c0,2,0,0,c1fc4dc8) at _mtx_lock_spin+0x93
hardclock_process(cad54af8,0,c02f682b,20,0) at hardclock_process+0x76
hardclock(cad54af8,c0cf239c,c0334d57,c0829000,c1fc8b28) at hardclock+0x18
clkintr(0) at clkintr+0xec
Xfastintr0() at Xfastintr0+0xba
--- interrupt, eip = 0xc01cc580, esp = 0xcad54b3c, ebp = 0xcad54b58 ---
_mtx_lock_spin(c038b6c0,0,0,0,0) at _mtx_lock_spin+0x50
vm_fault(c0d1f114,80f8000,2,8,c0d19380) at vm_fault+0x1379
trap_pfault(cad54d48,1,80f8a78,202,80f8a78) at trap_pfault+0x125
trap(2f,2f,2f,2f,80fc000) at trap+0x2a3
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x8052653, esp = 0xbfbff304, ebp = 0xbfbff308 ---
db> show pcpu 1
cpuid = 1
curthread = 0xc0d18a80: pid 11 "idle: cpu1"
curpcb = 0xcad36da0
fpcurthread = none
idlethread = 0xc0d18a80: pid 11 "idle: cpu1"
currentldt = 0x28
db> show msgbuf
[...]
panic: spin lock sched lock held by 0xc0d18a80 for > 5 seconds
cpuid = 0; lapic.id = 00000000

The only piece not captured above is the stack of the idle process, which
was in mi_switch(). INVARIANTS and WITNESS were not configured into this
kernel.
>How-To-Repeat:
Boot the SMP kernel repeatedly.
>Fix:
Not clear. Need to figure out why the idle proc (cpu1) was holding
sched_lock in mi_switch() for more than 5 seconds.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message
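
The report identifies the lock holder as the idle proc on cpu1 by matching the address in the panic message against curthread in the "show pcpu" output. As a rough sketch of that cross-check, the Python below does the same match mechanically. The function names (parse_pcpu, lock_holder, find_holder_cpu) are made up for illustration and are not part of ddb or any FreeBSD tool; the input strings are excerpts copied from the session above.

```python
import re

# Excerpts copied from the ddb session in this report (curpcb, fpcurthread
# and currentldt lines omitted because the match does not need them).
PCPU_DUMPS = """\
cpuid = 0
curthread = 0xc0d19380: pid 46 "sh"
idlethread = 0xc0d18b60: pid 12 "idle: cpu0"
cpuid = 1
curthread = 0xc0d18a80: pid 11 "idle: cpu1"
idlethread = 0xc0d18a80: pid 11 "idle: cpu1"
"""
PANIC_LINE = "panic: spin lock sched lock held by 0xc0d18a80 for > 5 seconds"


def parse_pcpu(text):
    """Map each cpuid to its curthread as (address, pid, name)."""
    cpus = {}
    cpu = None
    for line in text.splitlines():
        m = re.match(r"cpuid = (\d+)", line)
        if m:
            cpu = int(m.group(1))
            continue
        # re.match anchors at the start, so "idlethread" lines are skipped.
        m = re.match(r'curthread = (0x[0-9a-f]+): pid (\d+) "([^"]*)"', line)
        if m and cpu is not None:
            cpus[cpu] = (int(m.group(1), 16), int(m.group(2)), m.group(3))
    return cpus


def lock_holder(panic_line):
    """Extract the holder thread address from the spin lock panic message."""
    m = re.search(r"held by (0x[0-9a-f]+)", panic_line)
    return int(m.group(1), 16) if m else None


def find_holder_cpu(pcpu_text, panic_line):
    """Return (cpuid, pid, name) of the CPU whose curthread holds the lock."""
    holder = lock_holder(panic_line)
    for cpu, (addr, pid, name) in parse_pcpu(pcpu_text).items():
        if addr == holder:
            return cpu, pid, name
    return None


if __name__ == "__main__":
    print(find_holder_cpu(PCPU_DUMPS, PANIC_LINE))  # (1, 11, 'idle: cpu1')
```

This only confirms which thread the panic message names. It says nothing about why that thread kept sched_lock, which is still the open question in this PR.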