From owner-freebsd-stable@FreeBSD.ORG Tue Dec 12 12:49:25 2006
X-Original-To: stable@freebsd.org
Delivered-To: freebsd-stable@FreeBSD.ORG
Received: from localhost.my.domain (localhost [127.0.0.1])
	by hub.freebsd.org (Postfix) with ESMTP id 71A2216A403;
	Tue, 12 Dec 2006 12:49:24 +0000 (UTC)
	(envelope-from davidxu@freebsd.org)
From: David Xu
To: freebsd-stable@freebsd.org
Date: Tue, 12 Dec 2006 20:49:21 +0800
User-Agent: KMail/1.8.2
References: <20061113084430.GE59604@dimma.mow.oilspace.com>
	<200612071118.52922.davidxu@freebsd.org>
	<20061212122221.GE39171@dkirhlarov.mow.oilspace.com>
In-Reply-To: <20061212122221.GE39171@dkirhlarov.mow.oilspace.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200612122049.21525.davidxu@freebsd.org>
Cc: stable@freebsd.org, Dmitriy Kirhlarov
Subject: Re: RELENG_6 panic under heavy load
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
X-List-Received-Date: Tue, 12 Dec 2006 12:49:25 -0000

On Tuesday 12 December 2006 20:22, Dmitriy Kirhlarov wrote:
> On Thu, Dec 07, 2006 at 11:18:52AM +0800, David Xu wrote:
> > On Thursday 16 November 2006 19:15, Gleb Smirnoff wrote:
> > > On Thu, Nov 16, 2006 at 01:24:36PM +0300, Gleb Smirnoff wrote:
> > > T> I wonder why UMA was suspected to be the problem. Dima gave
> > > T> me access to the core. Here are more details from the trace:
> > >
> > > It looks like a race between two threads in one process. Look here:
> >
> > Can you try the patch ?
> > http://people.freebsd.org/~davidxu/patch/ksegrp_preempt.patch
>
> I've tested it. This patch works also, but with a little bit different
> behaviour. With patch from jhb@ I got LA 7-8, with this patch I have
> LA 5-6, same as on unpatched system.
> But it seems to me, that system
> is less interactive, compared to jhb@ patch.
>
> WBR
> Dmitriy

jhb's patch is incomplete. Its approach implies that every place where
a thread makes a state transition and wakes up another thread has to be
patched. There is other code in kern_sig.c that is still unpatched, and
there may be more places that I don't know about. Instead, the code in
maybe_preempt_in_ksegrp should be synced with maybe_preempt; that
should fix all of the problems.

The lower LA you see with my patch, compared to jhb's, might be in the
nature of KSEGRP, but I am not sure. If your program forces all threads
to be system-scope, it might fix the problem.

David Xu
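
For reference, the portable way for a program to request system-scope
threads is the POSIX thread attribute API: set the contention scope to
PTHREAD_SCOPE_SYSTEM on the attribute object passed to pthread_create().
The following is only a minimal sketch under that assumption, not code
from either patch; the names worker and start_system_scope_thread are
hypothetical and stand in for whatever the application actually uses.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread body; it only records that it ran. */
static void *worker(void *arg)
{
    int *ran = arg;
    *ran = 1;
    return NULL;
}

/*
 * Hypothetical helper: create a thread with system contention scope.
 * Returns 0 on success, or the error number reported by the pthread
 * call that failed.
 */
static int start_system_scope_thread(pthread_t *tid,
                                     void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for system scope instead of the library's default scope. */
    rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_create(tid, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return rc;
}
```

Every thread the program creates would have to go through such a helper
(or set the attribute itself) for all threads to be system-scope.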