Message-Id: <20030209220639.A55C02E@astra.mirabella.net>
Date: Sun, 9 Feb 2003 14:06:39 -0800 (PST)
From: Arun Sharma
Reply-To: Arun Sharma
To: FreeBSD-gnats-submit@FreeBSD.org
Cc: smp@FreeBSD.org
X-Send-Pr-Version: 3.113
Subject: kern/48117: SMP machine hang during boot related to idle proc and sched_lock

>Number:         48117
>Category:       kern
>Synopsis:       SMP machine hang during boot related to idle proc and sched_lock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:
>Keywords:
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Sun Feb 09 14:10:04 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Arun Sharma
>Release:        FreeBSD 5.0-CURRENT i386
>Organization:
>Environment:
System: FreeBSD astra.mirabella.net 5.0-CURRENT FreeBSD 5.0-CURRENT #16: Sat Feb 8 09:08:58 PST 2003
    root@astra.mirabella.net:/usr/src/sys/i386/compile/astra i386

>Description:

The machine hangs randomly during boot on a 2-way SMP box. In some of
those hangs it drops into ddb, where I was able to collect the
following information:

db> show pcpu
cpuid = 0
curthread = 0xc0d19380: pid 46 "sh"
curpcb = 0xcad54da0
fpcurthread = none
idlethread = 0xc0d18b60: pid 12 "idle: cpu0"
currentldt = 0x28
db> tr
Debugger(c0364696,0,c036423d,cad54a64,1) at Debugger+0x55
panic(c036423d,c036426b,c0d18a80,0,cad54af8) at panic+0x11f
_mtx_lock_spin(c038b6c0,2,0,0,c1fc4dc8) at _mtx_lock_spin+0x93
hardclock_process(cad54af8,0,c02f682b,20,0) at hardclock_process+0x76
hardclock(cad54af8,c0cf239c,c0334d57,c0829000,c1fc8b28) at hardclock+0x18
clkintr(0) at clkintr+0xec
Xfastintr0() at Xfastintr0+0xba
--- interrupt, eip = 0xc01cc580, esp = 0xcad54b3c, ebp = 0xcad54b58 ---
_mtx_lock_spin(c038b6c0,0,0,0,0) at _mtx_lock_spin+0x50
vm_fault(c0d1f114,80f8000,2,8,c0d19380) at vm_fault+0x1379
trap_pfault(cad54d48,1,80f8a78,202,80f8a78) at trap_pfault+0x125
trap(2f,2f,2f,2f,80fc000) at trap+0x2a3
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0x8052653, esp = 0xbfbff304, ebp = 0xbfbff308 ---
db> show pcpu 1
cpuid = 1
curthread = 0xc0d18a80: pid 11 "idle: cpu1"
curpcb = 0xcad36da0
fpcurthread = none
idlethread = 0xc0d18a80: pid 11 "idle: cpu1"
currentldt = 0x28
db> show msgbuf
[...]
panic: spin lock sched lock held by 0xc0d18a80 for > 5 seconds
cpuid = 0; lapic.id = 00000000

The only piece not captured above is the stack of the idle process on
cpu1, which was in mi_switch(). The INVARIANTS and WITNESS options were
not configured in.

>How-To-Repeat:

Boot the SMP kernel repeatedly.

>Fix:

Not clear yet. The lock owner named in the panic, 0xc0d18a80, is the
cpu1 idle thread shown by "show pcpu 1". Need to figure out why that
idle thread sat in mi_switch() holding sched_lock for more than 5
seconds.

>Release-Note:
>Audit-Trail:
>Unformatted:
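For readers less familiar with this panic, here is a minimal sketch, in C11, of the kind of bounded spin that ends in a "held by 0x... for > 5 seconds" diagnostic: the waiter spins on an owner word and, once the time limit expires, reports whoever still holds the lock. In the trace above, cpu0 is the waiter and 0xc0d18a80 (the cpu1 idle thread) is the reported holder. The names `struct spin_lock`, `spin_lock_timed()`, `spin_unlock()` and the `limit_sec` parameter are hypothetical illustrations; this is not FreeBSD's `struct mtx` or the actual `_mtx_lock_spin()` code, which differs in detail.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime() */

#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical spin lock: owner is 0 when free, otherwise an opaque
 * owner id (for example a thread pointer). Not FreeBSD's struct mtx. */
struct spin_lock {
    atomic_uintptr_t owner;
};

static void spin_init(struct spin_lock *lk)
{
    atomic_init(&lk->owner, 0);
}

/* Monotonic time in seconds, used to bound how long we spin. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Spin until the lock is acquired for `self`, or until `limit_sec`
 * seconds have passed. Returns 0 on success. On timeout, returns -1 and
 * stores the current owner in *holder: the value a "held by 0x... for
 * > N seconds" diagnostic would print before panicking.
 */
static int spin_lock_timed(struct spin_lock *lk, uintptr_t self,
                           double limit_sec, uintptr_t *holder)
{
    double start = now_seconds();

    for (;;) {
        uintptr_t expected = 0;
        /* Try to swing the owner word from 0 (free) to `self`. */
        if (atomic_compare_exchange_weak(&lk->owner, &expected, self))
            return 0;
        if (now_seconds() - start > limit_sec) {
            *holder = atomic_load(&lk->owner);
            return -1;
        }
    }
}

static void spin_unlock(struct spin_lock *lk)
{
    /* Releasing a lock that nobody holds indicates a caller bug. */
    assert(atomic_load(&lk->owner) != 0);
    atomic_store(&lk->owner, 0);
}
```

In this model the timeout only tells the waiter who the holder is, not why the holder never released the lock; that is the open question in this report (why the cpu1 idle thread stayed in mi_switch() with sched_lock held).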