Date:      Tue, 20 Sep 2005 16:04:43 -0400
From:      John Baldwin <jhb@FreeBSD.org>
To:        Koen Martens <fbsd@metro.cx>
Cc:        freebsd-hackers@freebsd.org, Dimitry Andric <dimitry@andric.com>, Vinod Kashyap <vkashyap@amcc.com>
Subject:   Re: panic in propagate_priority w/ postgresql under heavy load
Message-ID:  <200509201604.44393.jhb@FreeBSD.org>
In-Reply-To: <432F1310.80007@metro.cx>
References:  <2B3B2AA816369A4E87D7BE63EC9D2F269B7B4D@SDCEXCHANGE01.ad.amcc.com> <432F1310.80007@metro.cx>

On Monday 19 September 2005 03:35 pm, Koen Martens wrote:
> Vinod Kashyap wrote:
> > You seem to be booting off of a 9000 (twa) controller and not 7000/8000
> > (twe).  It could be because of a 9000 firmware bug that you are not
> > being able to get the dump.  The firmware wrongly interprets physical
> > address 0x0 as invalid during dumps, and fails the operations.  This bug
> > will be fixed in future firmware releases.
>
> Ok, it's been a while, here is an update on this.
>
> I ran a heavily instrumented kernel for two weeks on the server, it
> did not crash in that time. I then took out the witness and kdb/ddb
> stuff, because the decreased performance was a bit of a nuisance,
> however i retained the ability to obtain a crash dump. I had to
> limit physical memory, put it on 1.8GB in loader.conf:hw.physmem
> because swap and physmem are both 2GB. Tested with 'reboot -d' gave
> me a core dump.
>
> Without the debug stuff in the kernel, it crashed within 2 days,
> same story: postgresql process, function propagate_priority.
> However, no dump was written to disk :(
>
> Furthermore, i've been seeing the same crash (in propagate_priority)
> on another box in mysql processes. Both servers seem to panic every
> 2-3 days. I have another server of the exact same hardware
> configuration, but it is mainly idling most of the time. Haven't
> seen that one crash yet.
>
> I am thinking now that it is a bug in the twa driver, so i'll have
> to dig in to that. Furthermore, it seems to have to do with some
> sort of concurrency issue or otherwise timing-sensitive issue,
> because slowing the kernel down with debug code seems to avoid the
> panic. But, as i am completely new to the freebsd kernel and don't
> even know what turnstiles are, i imagine i will have a hard time. So
> if anyone can offer some help, please :)
>
> Ok, thanks for your attention,

This panic usually happens because a thread went to sleep while holding a
mutex (WITNESS will warn you about this when it happens, but as you noted,
it slows things down).  It can perhaps also happen if a thread exits while
holding a lock, or if a mutex is destroyed while a thread is still blocked
on it.
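
To make those three cases concrete, here is a rough userland toy model of
priority propagation.  It is only a sketch and not the kernel's code: every
toy_* name, the state enum, and the "destroyed" flag are made up for
illustration, and the real propagate_priority() works on turnstiles and
panics rather than returning an error.  The idea is that a blocked thread
lends its priority to each lock owner down the chain, which only makes
sense while every owner is either runnable or itself blocked on a live lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical thread states for the toy model (not kernel definitions). */
enum toy_state { TOY_RUNNABLE, TOY_BLOCKED_ON_LOCK, TOY_SLEEPING, TOY_EXITED };

struct toy_lock;

struct toy_thread {
    int prio;                    /* lower value = more urgent */
    enum toy_state state;
    struct toy_lock *blocked_on; /* lock this thread is waiting for, if any */
};

struct toy_lock {
    struct toy_thread *owner;    /* NULL when unowned */
    int destroyed;               /* nonzero once the lock was torn down */
};

enum toy_result {
    TOY_OK,
    TOY_OWNER_SLEEPING,          /* owner went to sleep holding the lock */
    TOY_OWNER_EXITED,            /* owner exited while holding the lock */
    TOY_LOCK_DESTROYED           /* lock destroyed with a thread blocked on it */
};

/*
 * Lend the waiter's priority to each lock owner along the chain.
 * Returns TOY_OK if the chain was consistent, otherwise the first
 * broken invariant found (where a real kernel would panic).
 */
enum toy_result toy_propagate(struct toy_thread *waiter)
{
    assert(waiter != NULL);

    int prio = waiter->prio;
    struct toy_lock *lk = waiter->blocked_on;

    while (lk != NULL) {
        if (lk->destroyed)
            return TOY_LOCK_DESTROYED;

        struct toy_thread *td = lk->owner;
        if (td == NULL)
            return TOY_OK;              /* lock already released */
        if (td->state == TOY_EXITED)
            return TOY_OWNER_EXITED;
        if (td->state == TOY_SLEEPING)
            return TOY_OWNER_SLEEPING;
        if (td->prio <= prio)
            return TOY_OK;              /* owner is already urgent enough */

        td->prio = prio;                /* lend the waiter's priority */
        if (td->state != TOY_BLOCKED_ON_LOCK)
            return TOY_OK;              /* owner can run; chain ends here */
        lk = td->blocked_on;            /* owner is blocked too; keep walking */
    }
    return TOY_OK;
}
```

In this model a well-behaved chain ends at a runnable owner.  Each of the
three error returns corresponds to one of the situations described above,
where the walk reaches an owner or a lock that can no longer be trusted.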

-- 
John Baldwin <jhb@FreeBSD.org>  <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org


