From owner-freebsd-stable@FreeBSD.ORG Thu Nov 16 11:15:28 2006
Date: Thu, 16 Nov 2006 14:15:25 +0300
From: Gleb Smirnoff
To: stable@FreeBSD.org
Message-ID: <20061116111525.GO32700@FreeBSD.org>
References: <20061113084430.GE59604@dimma.mow.oilspace.com> <20061116102436.GN32700@FreeBSD.org>
In-Reply-To: <20061116102436.GN32700@FreeBSD.org>
Subject: Re: RELENG_6 panic under heavy load

On Thu, Nov 16, 2006 at 01:24:36PM +0300, Gleb Smirnoff wrote:
T> I wonder why UMA was suspected to be the problem. Dima gave
T> me access to the core.

Here are more details from the trace:

It looks like a race between two threads in one process. Look here:

(kgdb) frame 12
#12 0xd05f4fc1 in _mtx_lock_sleep (m=0xd5dd5498, tid=3583683968, opts=0, file=0x12, line=18)
    at /usr/src/sys/kern/kern_mutex.c:579
579             turnstile_wait(&m->mtx_object, mtx_owner(m));
(kgdb) p *m
$10 = {mtx_object = {lo_class = 0xd084e224, lo_name = 0xd080508c "process lock",
    lo_type = 0xd080508c "process lock", lo_flags = 4390912, lo_list = {
      tqe_next = 0xd5dd56b0, tqe_prev = 0xd5dd5290}, lo_witness = 0xd088a100},
  mtx_lock = 3611674882, mtx_recurse = 0}
(kgdb) p ((struct thread *)tid)
$15 = (struct thread *) 0xd59aad80
(kgdb) p ((struct thread *)(m->mtx_lock & ~(0x1 | 0x2)))
$17 = (struct thread *) 0xd745c900
(kgdb) p ((struct thread *)(m->mtx_lock & ~(0x1 | 0x2)))->td_proc
$18 = (struct proc *) 0xd5dd5430
(kgdb) p ((struct thread *)tid)->td_proc
$19 = (struct proc *) 0xd5dd5430

So, we see that one thread blocks on a lock that is held by another
thread of the same process. Here they are:

* 134 Thread 100198 (PID=47872: nagios)  doadump () at pcpu.h:165
  133 Thread 100147 (PID=47872: nagios)  sched_switch (td=0xd745c900, newtd=0xd51f7a80, flags=2)
    at /usr/src/sys/kern/sched_4bsd.c:980

Let's look at the second one:

(kgdb) thread 133
[Switching to thread 133 (Thread 100147)]#0  sched_switch (td=0xd745c900, newtd=0xd51f7a80, flags=2)
    at /usr/src/sys/kern/sched_4bsd.c:980
980             sched_lock.mtx_lock = (uintptr_t)td;
(kgdb) bt
#0  sched_switch (td=0xd745c900, newtd=0xd51f7a80, flags=2) at /usr/src/sys/kern/sched_4bsd.c:980
#1  0xd0607f46 in mi_switch (flags=2, newtd=0x0) at /usr/src/sys/kern/kern_synch.c:420
#2  0xd0615ecf in maybe_preempt_in_ksegrp (td=0xd59aad80) at kern_switch.c:467
#3  0xd06160c8 in setrunqueue (td=0xd59aad80, flags=0) at kern_switch.c:585
#4  0xd06151e7 in sched_wakeup (td=0xd59aad80) at /usr/src/sys/kern/sched_4bsd.c:996
#5  0xd0608025 in setrunnable (td=0xd59aad80) at /usr/src/sys/kern/kern_synch.c:483
#6  0xd060d78e in thread_unsuspend_one (td=0xd59aad80) at /usr/src/sys/kern/kern_thread.c:972
#7  0xd060d584 in thread_suspend_check (return_instead=0) at /usr/src/sys/kern/kern_thread.c:935
#8  0xd0628a88 in userret (td=0xd745c900, frame=0xf5dd4d38, oticks=1) at /usr/src/sys/kern/subr_trap.c:116
#9  0xd07a6e16 in syscall (frame=
      {tf_fs = 134938683, tf_es = 59, tf_ds = -809566149, tf_edi = 134997504, tf_esi = 134998528,
       tf_ebp = -813707944, tf_isp = -170046108, tf_ebx = 672261300, tf_edx = 0, tf_ecx = 134969072,
       tf_eax = 1, tf_trapno = 0, tf_err = 2, tf_eip = 672832335, tf_cs = 51, tf_eflags = 646,
       tf_esp = -813707972, tf_ss = 59}) at /usr/src/sys/i386/i386/trap.c:1034
#10 0xd078f38f in Xint0x80_syscall () at /usr/src/sys/i386/i386/exception.s:200

-- 
Totus tuus, Glebius.
GLEBIUS-RIPN GLEB-RIPE
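
For reference, the owner lookup done in the kgdb session above can be replayed outside the debugger. This is only a minimal sketch. It assumes, as the `~(0x1 | 0x2)` mask in the session implies, that the two low bits of `mtx_lock` are flag bits and the remaining bits are the owning `struct thread` pointer. The names `FLAG_BITS` and `decode_mtx_lock` are made up for illustration and are not kernel identifiers.

```python
# Sketch only: replays the arithmetic from the kgdb session above.
# FLAG_BITS and decode_mtx_lock are hypothetical names, not kernel symbols.
FLAG_BITS = 0x1 | 0x2  # same mask as in the kgdb expression


def decode_mtx_lock(lock_word):
    """Split a lock word into (owner thread pointer, low flag bits)."""
    return lock_word & ~FLAG_BITS, lock_word & FLAG_BITS


mtx_lock = 3611674882  # m->mtx_lock as printed by "p *m"
tid = 3583683968       # tid argument of _mtx_lock_sleep in frame 12

owner, flags = decode_mtx_lock(mtx_lock)
print("owner  = %#x, flags = %#x" % (owner, flags))
print("waiter = %#x" % tid)
```

The decoded owner, 0xd745c900, is the td seen in the sched_switch frame of thread 133, and the waiter, 0xd59aad80, is the thread that thread 133 is making runnable in frames #2 through #6.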