Date: Fri, 8 Dec 2006 12:56:47 +0300
From: Dmitriy Kirhlarov
To: John Baldwin
Cc: Gleb Smirnoff, freebsd-stable@freebsd.org
Subject: Re: RELENG_6 panic under heavy load
Message-ID: <20061208095646.GA1131@dkirhlarov.mow.oilspace.com>
In-Reply-To: <200612061209.40253.jhb@freebsd.org>

On Wed, Dec 06, 2006 at 12:09:39PM -0500, John Baldwin wrote:
> > ...) and here is something difficult to understand, when $poll tries to
> > make $fork runnable, while $fork is trying to put itself in the turnstile
> > that is owned by $poll
>
> Hmm.  I'm guessing the problem is the $poll thread is suspended (not exited)
> while holding the proc lock?  That would appear to be the problem.  That
> thread can't run again to release the lock.  Ah, yes, I see the bug.
> Something like this should fix it:
>
> Index: kern_thread.c
> ===================================================================
> RCS file: /usr/cvs/src/sys/kern/kern_thread.c,v
> retrieving revision 1.216.2.6
> diff -u -r1.216.2.6 kern_thread.c
> --- kern_thread.c	2 Sep 2006 17:29:57 -0000	1.216.2.6
> +++ kern_thread.c	6 Dec 2006 17:06:26 -0000
> @@ -969,7 +969,9 @@
>  	TAILQ_REMOVE(&p->p_suspended, td, td_runq);
>  	TD_CLR_SUSPENDED(td);
>  	p->p_suspcount--;
> +	critical_enter();
>  	setrunnable(td);
> +	critical_exit();
>  }
>
>  /*
>
> What this does is force setrunnable() to be in a nested critical section so we
> won't preempt during setrunnable() until either the caller of
> thread_unsuspend_one() eventually releases sched_lock, or, in the case you
> ran into, the thread does a PROC_UNLOCK() and calls mi_switch().

lbsd02# uptime
 9:46AM  up 22:45, 2 users, load averages: 7.50, 6.59, 6.32

It works. Thank you.
Without your patch the maximum uptime was 9 hours.

I'm planning to test David's patch over the weekend.

WBR
Dmitriy
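
For readers following the quoted explanation, here is a small userspace sketch of how a nested critical section can defer preemption. It is only a model, not the kernel code: every name in it (model_thread, model_switch, model_critical_enter, model_critical_exit, model_maybe_preempt) is hypothetical, and it assumes that holding the scheduler lock counts as one nesting level and that a preemption request is deferred whenever the nesting depth is greater than one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of one thread's preemption state. */
struct model_thread {
    int  critnest;   /* nesting depth of critical sections */
    bool owepreempt; /* a preemption was requested but deferred */
    int  switches;   /* number of times the model "switched away" */
};

/* Stand-in for a context switch: clears the debt and counts the switch. */
static void model_switch(struct model_thread *td)
{
    td->owepreempt = false;
    td->switches++;
}

/* Enter one more level of critical section. */
static void model_critical_enter(struct model_thread *td)
{
    td->critnest++;
}

/*
 * Leave one level. A deferred preemption is honoured only when the
 * outermost level is left (assumption of this model).
 */
static void model_critical_exit(struct model_thread *td)
{
    assert(td->critnest > 0);
    td->critnest--;
    if (td->critnest == 0 && td->owepreempt)
        model_switch(td);
}

/*
 * Request a preemption. Assumption: level 1 stands for "scheduler lock
 * held", so any depth above 1 means "nested" and the request is only
 * recorded. Returns true if the switch happened immediately.
 */
static bool model_maybe_preempt(struct model_thread *td)
{
    if (td->critnest > 1) {
        td->owepreempt = true;
        return false;
    }
    model_switch(td);
    return true;
}
```

In this model, calling model_maybe_preempt() at depth 1 switches at once, which corresponds to the unpatched path. Wrapping the call in model_critical_enter()/model_critical_exit() raises the depth to 2, so the request is only recorded, and the switch happens later when the outermost level is left.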