From owner-freebsd-stable@FreeBSD.ORG Wed Dec 6 17:10:01 2006 Return-Path: X-Original-To: freebsd-stable@freebsd.org Delivered-To: freebsd-stable@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id C44BA16A4FC; Wed, 6 Dec 2006 17:10:01 +0000 (UTC) (envelope-from jhb@freebsd.org) Received: from server.baldwin.cx (66-23-211-162.clients.speedfactory.net [66.23.211.162]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5DA4F43CC4; Wed, 6 Dec 2006 17:08:51 +0000 (GMT) (envelope-from jhb@freebsd.org) Received: from localhost.corp.yahoo.com (john@localhost [127.0.0.1]) (authenticated bits=0) by server.baldwin.cx (8.13.6/8.13.6) with ESMTP id kB6H9ZCh061538; Wed, 6 Dec 2006 12:09:35 -0500 (EST) (envelope-from jhb@freebsd.org) From: John Baldwin To: freebsd-stable@freebsd.org Date: Wed, 6 Dec 2006 12:09:39 -0500 User-Agent: KMail/1.9.1 References: <20061113084430.GE59604@dimma.mow.oilspace.com> <20061116111525.GO32700@FreeBSD.org> <20061116160900.GQ32700@FreeBSD.org> In-Reply-To: <20061116160900.GQ32700@FreeBSD.org> MIME-Version: 1.0 Content-Type: text/plain; charset="koi8-r" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200612061209.40253.jhb@freebsd.org> X-Greylist: Sender succeeded SMTP AUTH authentication, not delayed by milter-greylist-2.0.2 (server.baldwin.cx [127.0.0.1]); Wed, 06 Dec 2006 12:09:35 -0500 (EST) X-Virus-Scanned: ClamAV 0.88.3/2294/Wed Dec 6 11:28:02 2006 on server.baldwin.cx X-Virus-Status: Clean X-Spam-Status: No, score=-4.4 required=4.2 tests=ALL_TRUSTED,AWL,BAYES_00 autolearn=ham version=3.1.3 X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on server.baldwin.cx Cc: Gleb Smirnoff Subject: Re: RELENG_6 panic under heavy load X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 06 Dec 2006 17:10:02 -0000 On Thursday 16 November 2006 11:09, Gleb Smirnoff wrote: > On Thu, Nov 16, 2006 at 02:15:25PM +0300, Gleb Smirnoff wrote: > T> On Thu, Nov 16, 2006 at 01:24:36PM +0300, Gleb Smirnoff wrote: > T> T> I wonder why UMA was suspected to be the problem. Dima gave > T> T> me access to the core. Here are more details from the trace: > > And even more: > > (kgdb) thread 133 > [Switching to thread 133 (Thread 100147)]#0 sched_switch (td=0xd745c900, newtd=0xd51f7a80, flags=2) at /usr/src/sys/kern/sched_4bsd.c:980 > 980 sched_lock.mtx_lock = (uintptr_t)td; > (kgdb) frame 9 > #9 0xd07a6e16 in syscall (frame= > {tf_fs = 134938683, tf_es = 59, tf_ds = -809566149, tf_edi = 134997504, tf_esi = 134998528, tf_ebp = -813707944, tf_isp = -170046108, tf_ebx = 672261300, tf_edx = 0, tf_ecx = 134969072, tf_eax = 1, tf_trapno = 0, tf_err = 2, tf_eip = 672832335, tf_cs = 51, tf_eflags = 646, tf_esp = -813707972, tf_ss = 59}) > at /usr/src/sys/i386/i386/trap.c:1034 > 1034 userret(td, &frame, sticks); > (kgdb) p *callp > $92 = {sy_narg = 65539, sy_call = 0xd0630550 , sy_auevent = 43012} > > (kgdb) set $poll = (struct thread *)0xd745c900 > (kgdb) set $fork = (struct thread *)0xd59aad80 > > (kgdb) p $poll->td_state > $93 = TDS_INHIBITED > (kgdb) p $poll->td_inhibitors > $94 = 1 == TDI_SUSPENDED > (kgdb) p/x $poll->td_flags > $96 = 0x1010c01 == TDF_BORROWING | TDF_BOUNDARY | TDF_ASTPENDING | TDF_NEEDRESCHED | TDF_SCHED0 > (kgdb) p $fork->td_state > $97 = TDS_INHIBITED > (kgdb) p $fork->td_inhibitors > $98 = 8 == TDI_LOCK > (kgdb) p/x $fork->td_flags > $99 = 0x1000000 == TDF_SCHED0 > > Not everything clear yet, but looks like: > > 1) $fork thread obtains proc lock > 2) $poll thread blocks on proc lock > 3) $fork thread has suspended the $poll thread in thread_single() > 4) $fork thread temporarily unlocks proc lock (line 821) and is > preempted by $poll thread > 5) $poll thread obtains proc lock, and starts doing its poll job > 6) $fork thread blocks on proc lock, and is added to its turnstile > 7) $poll thread drops the proc lock, but isn't preempted by $fork > 8) $poll thread exits and is preempted by $fork > > ...) and here is something difficult to understand, when $poll tries to > make $fork runnable, while $fork is trying to put itself in the turnstile > that is owned by $poll Hmm. I'm guessing the problem is the $poll thread is suspended (not exited) while holding the proc lock? That would appear to be the problem. That thread can't run again to release the lock. Ah, yes, I see the bug. Something like this should fix it: Index: kern_thread.c =================================================================== RCS file: /usr/cvs/src/sys/kern/kern_thread.c,v retrieving revision 1.216.2.6 diff -u -r1.216.2.6 kern_thread.c --- kern_thread.c 2 Sep 2006 17:29:57 -0000 1.216.2.6 +++ kern_thread.c 6 Dec 2006 17:06:26 -0000 @@ -969,7 +969,9 @@ TAILQ_REMOVE(&p->p_suspended, td, td_runq); TD_CLR_SUSPENDED(td); p->p_suspcount--; + critical_enter(); setrunnable(td); + critical_exit(); } /* What this does is force setrunnable() to be in a nested critical section so we won't preempt during setrunnable() until either the caller of thread_unsuspend_one() eventually releases sched_lock, or, in the case you ran into, the thread does a PROC_UNLOCK() and calls mi_switch(). -- John Baldwin