Date: Fri, 24 Feb 2006 07:42:46 -0500
From: John Baldwin <jhb@freebsd.org>
To: freebsd-sparc64@freebsd.org
Cc: sparc64@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject: Re: "sched_lock held too long" panic + trace
Message-ID: <200602240742.48505.jhb@freebsd.org>
In-Reply-To: <20060223204716.GA90985@xor.obsecurity.org>
References: <20060223204716.GA90985@xor.obsecurity.org>

On Thursday 23 February 2006 03:47 pm, Kris Kennaway wrote:
> One of my e4500s has started panicking regularly under load because
> sched_lock was held for > 5 seconds. Since on sparc64 it always
> deadlocks after this panic instead of entering DDB, I wasn't able to
> track down the cause. Instead, I changed the panic to first
> DELAY(1000000*PCPU_GET(cpuid)) (so that different CPUs don't overlap
> the printfs) and then kdb_backtrace().
>
> Doing so I obtained the following trace (still a bit corrupted, but
> hopefully more useful).
>
> spspilolock hchedolockehdlb by 0xfffff2b2be951500ofor 5 s cecdn
>
> spin ponk oohkd hhdd lcc eel0 yy fxfffbf921be01f50 f rr>ec es
> nDs
>
> stack backtrace:
> statclock() at statclock+0x6c
> tick_hardclock() at tick_hardclock+0x100
> -- interrupt level=0xe pil=0 %o7=0xc017fb08 --
> sched_runnable() at sched_runnable+spi8
> fcrkscxid()oat ferk ex 0+0f94f802bk_bram0olone>) t forkstrampoline+0x8
> panic: spin lock held too long
> cpuid = 0
> KDB: enter: panic
>
> KDB: stack backtrace:
> cpu+0x6c kgkmc uo)ca
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> idle_proc() at idle_proc+0x16c
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> idle_proc() at idle_proc+0x16c
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> idle_proc() at idle_proc+0x16c
> fork_exit() at fork_exit+0x94
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
> runq_check() at runq_check+0x24
> idle_proc() at idle_proc+0x108
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
> runq_check() at runq_check+0x2c
> idle_proc() at idle_proc+0x108
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> tlb_page_demap() at tlb_page_demap+0xa0
> pmap_zero_page_idle() at pmap_zero_page_idle+0xdc
> vm_page_zero_idle() at vm_page_zero_idle+0x108
> vm_pagezero() at vm_pagezero+0x4c
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> Does this s[c]hed any light on the cause?

It's the idle loop bug that I need to fix I believe. I'll try to write up a
patch later today perhaps.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org