Date: Thu, 23 Feb 2006 15:47:16 -0500
From: Kris Kennaway
To: sparc64@FreeBSD.org
Message-ID: <20060223204716.GA90985@xor.obsecurity.org>
Mime-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature"; boundary="PEIAKu/WMn1b1Hv9"
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i
Cc: jhb@FreeBSD.org
Subject: "sched_lock held too long" panic + trace
List-Id: Porting FreeBSD to the Sparc

--PEIAKu/WMn1b1Hv9
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

One of my e4500s has started panicking regularly under load because
sched_lock was held for > 5 seconds.  Since on sparc64 it always
deadlocks after this panic instead of entering DDB, I wasn't able to
track down the cause.  Instead, I changed the panic to first
DELAY(1000000*PCPU_GET(cpuid)) (so that different CPUs don't overlap
the printfs) and then kdb_backtrace().
Doing so I obtained the following trace (still a bit corrupted, but
hopefully more useful).

spspilolock hchedolockehdlb by 0xfffff2b2be951500ofor 5 s cecdn
spin ponk oohkd hhdd lcc eel0 yy fxfffbf921be01f50 f rr>ec es nDs stack backtrace:
statclock() at statclock+0x6c
tick_hardclock() at tick_hardclock+0x100
-- interrupt level=0xe pil=0 %o7=0xc017fb08 --
sched_runnable() at sched_runnable+spi8
fcrkscxid()oat ferk ex 0+0f94f802bk_bram0olone>) t forkstrampoline+0x8
panic: spin lock held too long
cpuid = 0
KDB: enter: panic
KDB: stack backtrace:
cpu+0x6c kgkmc uo)ca
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
idle_proc() at idle_proc+0x16c
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
idle_proc() at idle_proc+0x16c
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
idle_proc() at idle_proc+0x16c
fork_exit() at fork_exit+0x94
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
runq_check() at runq_check+0x24
idle_proc() at idle_proc+0x108
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
runq_check() at runq_check+0x2c
idle_proc() at idle_proc+0x108
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
tlb_page_demap() at tlb_page_demap+0xa0
pmap_zero_page_idle() at pmap_zero_page_idle+0xdc
vm_page_zero_idle() at vm_page_zero_idle+0x108
vm_pagezero() at vm_pagezero+0x4c
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8

Does this s[c]hed any light on the cause?

Kris

--PEIAKu/WMn1b1Hv9
Content-Type: application/pgp-signature
Content-Disposition: inline

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (FreeBSD)

iD8DBQFD/h9TWry0BWjoQKURArxhAKCnMua6P8Spb4cTkLDESoiCsq6DPgCg4i/r
+Dt/NEYDxNk62AYCel9JINc=
=ulhm
-----END PGP SIGNATURE-----

--PEIAKu/WMn1b1Hv9--
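
A note on the staggering trick described in the message: DELAY() takes a
count in microseconds, so DELAY(1000000*PCPU_GET(cpuid)) makes CPU n wait
n seconds before calling kdb_backtrace().  The following is only a rough
userland model of that reasoning, not kernel code and not the actual
patch.  The names stagger_delay_us() and windows_overlap() are
hypothetical, and the model assumes every CPU reaches the panic path at
the same instant and needs the same time to print its trace.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how long (in microseconds) CPU `cpuid` waits
 * before printing, mirroring DELAY(step_us * cpuid).
 */
uint64_t stagger_delay_us(unsigned int cpuid, uint64_t step_us)
{
    return (uint64_t)cpuid * step_us;
}

/*
 * Returns 1 if any CPU would still be printing when the next CPU
 * starts, 0 otherwise.
 *
 * Assumptions (not from the original message): all CPUs enter the
 * panic path simultaneously, and each needs report_us microseconds
 * to emit its backtrace.
 */
int windows_overlap(unsigned int ncpus, uint64_t step_us, uint64_t report_us)
{
    assert(ncpus > 0);

    for (unsigned int i = 0; i + 1 < ncpus; i++) {
        uint64_t end_this   = stagger_delay_us(i, step_us) + report_us;
        uint64_t start_next = stagger_delay_us(i + 1, step_us);

        if (end_this > start_next)
            return 1;   /* CPU i+1 starts before CPU i has finished */
    }
    return 0;
}
```

Under those assumptions, windows_overlap(4, 1000000, 200000) reports no
overlap, while a step of 0 (no staggering) does.  On real hardware the
CPUs need not detect the stuck lock at the same moment, so some
interleaving can remain even with the delay.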