From: John Baldwin
To: Koen Martens
Cc: freebsd-hackers@freebsd.org, Dimitry Andric, Vinod Kashyap
Date: Tue, 20 Sep 2005 16:04:43 -0400
Subject: Re: panic in propagate_priority w/ postgresql under heavy load

On Monday 19 September 2005 03:35 pm, Koen Martens wrote:
> Vinod Kashyap wrote:
> > You seem to be booting off of a 9000 (twa) controller and not
> > 7000/8000 (twe).
> > It could be because of a 9000 firmware bug that you are not being
> > able to get the dump. The firmware wrongly interprets physical
> > address 0x0 as invalid during dumps, and fails the operations.
> > This bug will be fixed in future firmware releases.
>
> Ok, it's been a while, here is an update on this.
>
> I ran a heavily instrumented kernel for two weeks on the server, it
> did not crash in that time. I then took out the witness and kdb/ddb
> stuff, because the decreased performance was a bit of a nuisance,
> however i retained the ability to obtain a crash dump. I had to
> limit physical memory, put it on 1.8GB in loader.conf:hw.physmem
> because swap and physmem are both 2GB. Tested with 'reboot -d' gave
> me a core dump.
>
> Without the debug stuff in the kernel, it crashed within 2 days,
> same story: postgresql process, function propagate_priority.
> However, no dump was written to disk :(
>
> Furthermore, i've been seeing the same crash (in propagate_priority)
> on another box in mysql processes. Both servers seem to panic every
> 2-3 days. I have another server of the exact same hardware
> configuration, but it is mainly idling most of the time. Haven't
> seen that one crash yet.
>
> I am thinking now that it is a bug in the twa driver, so i'll have
> to dig in to that. Furthermore, it seems to have to do with some
> sort of concurrency issue or otherwise timing-sensitive issue,
> because slowing the kernel down with debug code seems to avoid the
> panic. But, as i am completely new to the freebsd kernel and don't
> even know what turnstiles are, i imagine i will have a hard time. So
> if anyone can offer some help, please :)
>
> Ok, thanks for your attention,

This panic usually happens because a thread went to sleep while holding a
mutex (WITNESS will warn you about this when it happens, but as you noted,
it slows things down). It can perhaps also happen if a thread exits while
holding a lock, or if a thread is blocked on a mutex that is destroyed
after it blocks on it.

-- 
John Baldwin <><  http://www.FreeBSD.org/~jhb/
"Power Users Use the Power to Serve"  =  http://www.FreeBSD.org
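
For readers who, like Koen, have not met turnstiles before, the three
conditions named in the reply above can be pictured with a small userland
toy model. This is only a sketch under stated assumptions: every type and
function name here (toy_thread, toy_lock, toy_propagate_priority) is
hypothetical and is not the kernel's propagate_priority code. It only shows
why a walk along a chain of lock owners has nowhere sensible to go when an
owner is asleep, has exited, or the lock no longer has a valid owner.

```c
#include <stddef.h>

/* Toy model only: hypothetical names, not FreeBSD kernel structures. */
enum toy_state { TOY_RUNNABLE, TOY_BLOCKED_ON_LOCK, TOY_SLEEPING, TOY_EXITED };

struct toy_lock;

struct toy_thread {
	int priority;                 /* lower value = more urgent */
	enum toy_state state;
	struct toy_lock *blocked_on;  /* lock this thread waits for, or NULL */
};

struct toy_lock {
	struct toy_thread *owner;     /* NULL models a destroyed/unowned lock */
};

enum toy_result { TOY_OK, TOY_OWNER_SLEEPING, TOY_OWNER_GONE };

/*
 * Lend the waiter's priority to each lock owner along the chain.
 * Returns an error code at the points where this toy model considers
 * the chain broken (the cases described in the reply above).
 */
enum toy_result
toy_propagate_priority(struct toy_thread *waiter)
{
	int pri = waiter->priority;
	struct toy_thread *td = waiter;

	while (td->blocked_on != NULL) {
		struct toy_thread *owner = td->blocked_on->owner;

		/* Owner exited, or the lock was destroyed under the waiter. */
		if (owner == NULL || owner->state == TOY_EXITED)
			return (TOY_OWNER_GONE);

		/* Owner went to sleep while still holding the lock. */
		if (owner->state == TOY_SLEEPING)
			return (TOY_OWNER_SLEEPING);

		/* Owner is already at least as urgent: nothing to lend. */
		if (owner->priority <= pri)
			return (TOY_OK);

		owner->priority = pri;

		/* A runnable owner will make progress; the walk is done. */
		if (owner->state == TOY_RUNNABLE)
			return (TOY_OK);

		/* Owner is itself blocked on another lock; keep walking. */
		td = owner;
	}
	return (TOY_OK);
}
```

In this sketch the error returns stand in for the points where a real
kernel would have no valid thread to lend priority to. WITNESS, as noted
above, catches the sleeping-with-mutex case at the time the thread goes
to sleep rather than later, when another thread blocks on the lock.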