Date: Thu, 7 Dec 2006 11:18:52 +0800
From: David Xu <davidxu@freebsd.org>
To: freebsd-stable@freebsd.org
Cc: stable@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>
Subject: Re: RELENG_6 panic under heavy load
Message-ID: <200612071118.52922.davidxu@freebsd.org>
In-Reply-To: <20061116111525.GO32700@FreeBSD.org>
References: <20061113084430.GE59604@dimma.mow.oilspace.com>
 <20061116102436.GN32700@FreeBSD.org>
 <20061116111525.GO32700@FreeBSD.org>
On Thursday 16 November 2006 19:15, Gleb Smirnoff wrote:
> On Thu, Nov 16, 2006 at 01:24:36PM +0300, Gleb Smirnoff wrote:
> T> I wonder why UMA was suspected to be the problem. Dima gave
> T> me access to the core. Here are more details from the trace:
>
> It looks like a race between two threads in one process. Look here:
>
> (kgdb) frame 12
> #12 0xd05f4fc1 in _mtx_lock_sleep (m=0xd5dd5498, tid=3583683968, opts=0,
>     file=0x12 <Address 0x12 out of bounds>, line=18)
>     at /usr/src/sys/kern/kern_mutex.c:579
> 579             turnstile_wait(&m->mtx_object, mtx_owner(m));
> (kgdb) p *m
> $10 = {mtx_object = {lo_class = 0xd084e224,
>     lo_name = 0xd080508c "process lock",
>     lo_type = 0xd080508c "process lock", lo_flags = 4390912,
>     lo_list = {tqe_next = 0xd5dd56b0, tqe_prev = 0xd5dd5290},
>     lo_witness = 0xd088a100},
>   mtx_lock = 3611674882, mtx_recurse = 0}
> (kgdb) p ((struct thread *)tid)
> $15 = (struct thread *) 0xd59aad80
> (kgdb) p ((struct thread *)(m->mtx_lock & ~(0x1 | 0x2)))
> $17 = (struct thread *) 0xd745c900
> (kgdb) p ((struct thread *)(m->mtx_lock & ~(0x1 | 0x2)))->td_proc
> $18 = (struct proc *) 0xd5dd5430
> (kgdb) p ((struct thread *)tid)->td_proc
> $19 = (struct proc *) 0xd5dd5430
>
> So, we see that one thread blocks on the lock that is held by an
> other thread of the same process. Here they are:
>
> * 134 Thread 100198 (PID=47872: nagios)  doadump () at pcpu.h:165
>   133 Thread 100147 (PID=47872: nagios)  sched_switch (td=0xd745c900,
>       newtd=0xd51f7a80, flags=2) at /usr/src/sys/kern/sched_4bsd.c:980
>
> Let's look at the second one:
>
> (kgdb) thread 133
> [Switching to thread 133 (Thread 100147)]#0  sched_switch (td=0xd745c900,
>     newtd=0xd51f7a80, flags=2) at /usr/src/sys/kern/sched_4bsd.c:980
> 980             sched_lock.mtx_lock = (uintptr_t)td;
> (kgdb) bt
> #0  sched_switch (td=0xd745c900, newtd=0xd51f7a80, flags=2)
>     at /usr/src/sys/kern/sched_4bsd.c:980
> #1  0xd0607f46 in mi_switch (flags=2, newtd=0x0)
>     at /usr/src/sys/kern/kern_synch.c:420
> #2  0xd0615ecf in maybe_preempt_in_ksegrp (td=0xd59aad80)
>     at kern_switch.c:467
> #3  0xd06160c8

Can you try the patch?

http://people.freebsd.org/~davidxu/patch/ksegrp_preempt.patch
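
For anyone following the quoted kgdb session, the step that identifies the lock owner is the expression m->mtx_lock & ~(0x1 | 0x2): the lock word holds the owning thread's address with the two low bits used as flags. A minimal sketch of that arithmetic, using only the numbers printed in the trace, might look like the following. The names FLAG_RECURSED, FLAG_CONTESTED and decode_lock_word are my own labels for illustration and are not taken from the kernel source; only the 0x1 and 0x2 masks come from the session itself.

```python
# Flag bits in the low end of the lock word. The values 0x1 and 0x2 are the
# masks used in the kgdb session; the names are assumptions for illustration.
FLAG_RECURSED = 0x1
FLAG_CONTESTED = 0x2
FLAG_MASK = FLAG_RECURSED | FLAG_CONTESTED


def decode_lock_word(lock_word):
    """Split a mutex lock word into (owner_address, flag_bits)."""
    owner = lock_word & ~FLAG_MASK   # clear the flag bits, leaving the pointer
    flags = lock_word & FLAG_MASK    # keep only the flag bits
    return owner, flags


# Values copied from the trace: mtx_lock from "p *m", tid from frame 12.
mtx_lock = 3611674882
waiter_tid = 3583683968

owner, flags = decode_lock_word(mtx_lock)
print("owner  = 0x%08x" % owner)       # compare with $17 in the session
print("flags  = 0x%x" % flags)
print("waiter = 0x%08x" % waiter_tid)  # compare with $15 in the session
```

The owner comes out as 0xd745c900 and the waiter as 0xd59aad80, which are the two thread pointers kgdb printed as $17 and $15. Both have td_proc = 0xd5dd5430, so the blocked thread and the lock holder belong to the same process.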