Date:      Fri, 24 Feb 2006 07:42:46 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-sparc64@freebsd.org
Cc:        sparc64@freebsd.org, Kris Kennaway <kris@obsecurity.org>
Subject:   Re: "sched_lock held too long" panic + trace
Message-ID:  <200602240742.48505.jhb@freebsd.org>
In-Reply-To: <20060223204716.GA90985@xor.obsecurity.org>
References:  <20060223204716.GA90985@xor.obsecurity.org>

On Thursday 23 February 2006 03:47 pm, Kris Kennaway wrote:
> One of my e4500s has started panicking regularly under load because
> sched_lock was held for > 5 seconds.  Since on sparc64 it always
> deadlocks after this panic instead of entering DDB, I wasn't able to
> track down the cause.  Instead, I changed the panic to first
> DELAY(1000000*PCPU_GET(cpuid)) (so that different CPUs don't overlap
> the printfs) and then kdb_backtrace().
>
> Doing so I obtained the following trace (still a bit corrupted, but
> hopefully more useful).
>
> spspilolock hchedolockehdlb by 0xfffff2b2be951500ofor 5 s cecdn
>
> spin ponk oohkd hhdd lcc  eel0 yy fxfffbf921be01f50 f rr>ec  es
> nDs
>
>    stack backtrace:
> statclock() at statclock+0x6c
> tick_hardclock() at tick_hardclock+0x100
> -- interrupt level=0xe pil=0 %o7=0xc017fb08 --
> sched_runnable() at sched_runnable+spi8
> fcrkscxid()oat ferk ex 0+0f94f802bk_bram0olone>)  t forkstrampoline+0x8
> panic: spin lock held too long
> cpuid = 0
> KDB: enter: panic
>
> KDB: stack backtrace:
> cpu+0x6c kgkmc uo)ca
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> idle_proc() at idle_proc+0x16c
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> idle_proc() at idle_proc+0x16c
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> idle_proc() at idle_proc+0x16c
> fork_exit() at fork_exit+0x94
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
> runq_check() at runq_check+0x24
> idle_proc() at idle_proc+0x108
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc01b5c84 --
> runq_check() at runq_check+0x2c
> idle_proc() at idle_proc+0x108
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> KDB: stack backtrace:
> hardclock_cpu() at hardclock_cpu+0x6c
> tick_hardclock() at tick_hardclock+0xc4
> -- interrupt level=0xe pil=0 %o7=0xc0190a98 --
> _mtx_lock_spin() at _mtx_lock_spin+0xf4
> tlb_page_demap() at tlb_page_demap+0xa0
> pmap_zero_page_idle() at pmap_zero_page_idle+0xdc
> vm_page_zero_idle() at vm_page_zero_idle+0x108
> vm_pagezero() at vm_pagezero+0x4c
> fork_exit() at fork_exit+0x94
> fork_trampoline() at fork_trampoline+0x8
>
> Does this s[c]hed any light on the cause?

It's the idle loop bug that I need to fix I believe.  I'll try to write up a
patch later today perhaps.

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


