Date: Thu, 23 Feb 2006 15:47:16 -0500
From: Kris Kennaway <kris@obsecurity.org>
To: sparc64@FreeBSD.org
Cc: jhb@FreeBSD.org
Subject: "sched_lock held too long" panic + trace
Message-ID: <20060223204716.GA90985@xor.obsecurity.org>

One of my e4500s has started panicking regularly under load because
sched_lock was held for > 5 seconds. Since on sparc64 it always
deadlocks after this panic instead of entering DDB, I wasn't able to
track down the cause.

Instead, I changed the panic to first DELAY(1000000*PCPU_GET(cpuid))
(so that different CPUs don't overlap the printfs) and then
kdb_backtrace(). Doing so I obtained the following trace (still a bit
corrupted, but hopefully more useful).

spspilolock hchedolockehdlb by 0xfffff2b2be951500ofor 5 s cecdn spin ponk oohkd hhdd lcc eel0 yy fxfffbf921be01f50 f rr>ec es nDs stack backtrace:
statclock() at statclock+0x6c
tick_hardclock() at tick_hardclock+0x100
-- interrupt level=0xe pil=0 %o7=0xc017fb08 --
sched_runnable() at sched_runnable+spi8 fcrkscxid()oat ferk ex 0+0f94f802bk_bram0olone>) t forkstrampoline+0x8
panic: spin lock held too long
cpuid = 0
KDB: enter: panic
KDB: stack backtrace:
cpu+0x6c kgkmc uo)ca
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
idle_proc() at idle_proc+0x16c
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
idle_proc() at idle_proc+0x16c
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
idle_proc() at idle_proc+0x16c
fork_exit() at fork_exit+0x94
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
runq_check() at runq_check+0x24
idle_proc() at idle_proc+0x108
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
runq_check() at runq_check+0x2c
idle_proc() at idle_proc+0x108
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8
KDB: stack backtrace:
hardclock_cpu() at hardclock_cpu+0x6c
tick_hardclock() at tick_hardclock+0xc4
-- interrupt level=0xe pil=0 %o7=0xc0190a98 --
_mtx_lock_spin() at _mtx_lock_spin+0xf4
tlb_page_demap() at tlb_page_demap+0xa0
pmap_zero_page_idle() at pmap_zero_page_idle+0xdc
vm_page_zero_idle() at vm_page_zero_idle+0x108
vm_pagezero() at vm_pagezero+0x4c
fork_exit() at fork_exit+0x94
fork_trampoline() at fork_trampoline+0x8

Does this s[c]hed any light on the cause?

Kris
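
The staggering idea in the message (delay each CPU by 1000000*cpuid
microseconds before it prints) can be modelled in user space. The
sketch below is not the kernel change itself: the helper names
stagger_delay_us and windows_disjoint are hypothetical, and it assumes
every CPU reaches the delay at roughly the same moment, which the
still-interleaved output above suggests does not always hold.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Per-CPU stagger step in microseconds. The value mirrors the
 * 1000000 factor in DELAY(1000000*PCPU_GET(cpuid)) from the message.
 */
#define STAGGER_US 1000000ULL

/* Hypothetical helper: how long CPU `cpuid` waits before printing. */
static uint64_t
stagger_delay_us(unsigned int cpuid)
{
	return (STAGGER_US * (uint64_t)cpuid);
}

/*
 * Hypothetical helper: returns 1 if the print windows of two CPUs
 * cannot overlap, assuming both start waiting at the same instant
 * and each backtrace takes `print_us` microseconds to print.
 */
static int
windows_disjoint(unsigned int cpu_a, unsigned int cpu_b, uint64_t print_us)
{
	uint64_t a = stagger_delay_us(cpu_a);
	uint64_t b = stagger_delay_us(cpu_b);
	uint64_t first = a < b ? a : b;
	uint64_t second = a < b ? b : a;

	if (cpu_a == cpu_b)
		return (1);	/* one CPU never overlaps itself */
	return (first + print_us <= second);
}
```

Under those assumptions, windows_disjoint(0, 1, 500000) holds, while
windows_disjoint(0, 1, 1500000) does not: a backtrace that takes longer
than one stagger step to print still runs into the next CPU's output.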