From: Gleb Smirnoff <glebius@FreeBSD.org>
To: stable@FreeBSD.org
Date: Thu, 16 Nov 2006 19:09:00 +0300
Subject: Re: RELENG_6 panic under heavy load

On Thu, Nov 16, 2006 at 02:15:25PM +0300, Gleb Smirnoff wrote:
T> On Thu, Nov 16, 2006 at 01:24:36PM +0300, Gleb Smirnoff wrote:
T> T> I wonder why UMA was suspected to be the problem. Dima gave
T> T> me access to the core. Here are more details from the trace:

And even more:

(kgdb) thread 133
[Switching to thread 133 (Thread 100147)]#0  sched_switch (td=0xd745c900, newtd=0xd51f7a80, flags=2)
    at /usr/src/sys/kern/sched_4bsd.c:980
980             sched_lock.mtx_lock = (uintptr_t)td;
(kgdb) frame 9
#9  0xd07a6e16 in syscall (frame=
      {tf_fs = 134938683, tf_es = 59, tf_ds = -809566149, tf_edi = 134997504,
       tf_esi = 134998528, tf_ebp = -813707944, tf_isp = -170046108,
       tf_ebx = 672261300, tf_edx = 0, tf_ecx = 134969072, tf_eax = 1,
       tf_trapno = 0, tf_err = 2, tf_eip = 672832335, tf_cs = 51,
       tf_eflags = 646, tf_esp = -813707972, tf_ss = 59})
    at /usr/src/sys/i386/i386/trap.c:1034
1034            userret(td, &frame, sticks);
(kgdb) p *callp
$92 = {sy_narg = 65539, sy_call = 0xd0630550, sy_auevent = 43012}
(kgdb) set $poll = (struct thread *)0xd745c900
(kgdb) set $fork = (struct thread *)0xd59aad80
(kgdb) p $poll->td_state
$93 = TDS_INHIBITED
(kgdb) p $poll->td_inhibitors
$94 = 1                 == TDI_SUSPENDED
(kgdb) p/x $poll->td_flags
$96 = 0x1010c01         == TDF_BORROWING | TDF_BOUNDARY | TDF_ASTPENDING | TDF_NEEDRESCHED | TDF_SCHED0
(kgdb) p $fork->td_state
$97 = TDS_INHIBITED
(kgdb) p $fork->td_inhibitors
$98 = 8                 == TDI_LOCK
(kgdb) p/x $fork->td_flags
$99 = 0x1000000         == TDF_SCHED0

Not everything is clear yet, but it looks like this:

1) The $fork thread obtains the proc lock.
2) The $poll thread blocks on the proc lock.
3) The $fork thread has suspended the $poll thread in thread_single().
4) The $fork thread temporarily unlocks the proc lock (line 821) and is
   preempted by the $poll thread.
5) The $poll thread obtains the proc lock and starts doing its poll job.
6) The $fork thread blocks on the proc lock and is added to its turnstile.
7) The $poll thread drops the proc lock, but isn't preempted by $fork.
8) The $poll thread exits and is preempted by $fork.
...) And here is the part that is difficult to understand: $poll tries to
   make $fork runnable while $fork is trying to put itself on the turnstile
   that is owned by $poll.

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE