From owner-freebsd-current@FreeBSD.ORG  Tue Jul 21 14:27:12 2009
From: John Baldwin <jhb@freebsd.org>
To: Kamigishi Rei <spambox@haruhiism.net>
Date: Tue, 21 Jul 2009 10:27:06 -0400
User-Agent: KMail/1.9.7
References: <4A659F98.2060007@haruhiism.net>
	<200907210857.01690.jhb@freebsd.org>
	<4A65C9D1.6080902@haruhiism.net>
In-Reply-To: <4A65C9D1.6080902@haruhiism.net>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200907211027.06589.jhb@freebsd.org>
Cc: Lawrence Stewart <lstewart@freebsd.org>, freebsd-current@freebsd.org
Subject: Re: [follow-up] Fatal trap 12 in r195146+ in netisr_queue_internal
List-Id: Discussions about the use of FreeBSD-current
	<freebsd-current.freebsd.org>

On Tuesday 21 July 2009 9:59:45 am Kamigishi Rei wrote:
> John Baldwin wrote:
> > On Tuesday 21 July 2009 6:59:36 am Kamigishi Rei wrote:
> >   
> >> Everything goes fine until - under heavy load on an interface, usually - 
> >> we reach a point where:
> >> 1. m->mtx_lock is 4 (== MTX_UNOWNED).
> >> 2. v is assigned mtx_lock's value (4 == MTX_UNOWNED).
> >> 3. condition (v == MTX_UNOWNED) fails.
> >>     
> > This will not happen.  If you look at the disassembly you will see this 
> > can't happen either.  Do you have a crashdump from a crash?
> >   
> I've got about 40 crash dumps on unmodded (without debug code) kernel, 
> and 3 or 4 with debug stuff (KASSERTs added by me).
> I can reproduce this on my test server (Core2 Duo 3.0, 4GB RAM), on my 
> home PC (Core2 Quad 2.5), and in VMWare with 2 CPUs in VT-x mode on my 
> laptop.
> It can't be reproduced on single-CPU single-core (including 
> hyperthreaded) systems.
> 
> Quoting,
> 
> (kgdb) fr 6
> #6  0xffffffff80586255 in _mtx_lock_sleep (m=0xffffffff80e60823, 
> tid=18446742977255365296, opts=Variable "opts" is not available.
> ) at /usr/src/sys/kern/kern_mutex.c:407
> 407                     owner = (struct thread *)(v & ~MTX_FLAGMASK);
> 
> (kgdb) print m->mtx_lock
> $14 = 4
> (kgdb) print v
> $15 = 21946368

% printf "%x\n" 21946368
14ee000

Can you print out 'owner' as well?  You won't get a panic until 'owner' is 
actually dereferenced to read 'owner->td_state', even though gdb shows line 
407 as the faulting line (gdb can sometimes get confused by compiler 
optimization).  m->mtx_lock and v differ because mtx_lock was changed (by 
either an mtx_unlock() or an mtx_init()) after v was read, while you were 
spinning.  That value of v is not what I have typically seen in these 
panics.  Do you also have the original fatal kernel trap messages?
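
To make the two printed values easier to compare, here is a minimal 
userland sketch of the split that line 407 performs on v.  It is not the 
kernel code: lock_owner() and lock_flags() are hypothetical helper names, 
and only MTX_UNOWNED == 4 is confirmed in this thread; the other two flag 
values are assumed to match sys/mutex.h.

```c
#include <stdint.h>

/*
 * Flag bits stored in the low bits of mtx_lock.
 * MTX_UNOWNED == 4 comes from this thread; MTX_RECURSED and
 * MTX_CONTESTED are ASSUMED values, check sys/mutex.h in your tree.
 */
#define MTX_RECURSED	((uintptr_t)0x1)
#define MTX_CONTESTED	((uintptr_t)0x2)
#define MTX_UNOWNED	((uintptr_t)0x4)
#define MTX_FLAGMASK	(MTX_RECURSED | MTX_CONTESTED | MTX_UNOWNED)

/* Hypothetical helper: the owner part, as in (v & ~MTX_FLAGMASK). */
uintptr_t
lock_owner(uintptr_t v)
{
	return (v & ~MTX_FLAGMASK);
}

/* Hypothetical helper: the flag bits that were masked off. */
uintptr_t
lock_flags(uintptr_t v)
{
	return (v & MTX_FLAGMASK);
}
```

Under those assumptions, lock_owner(21946368) gives 0x14ee000 with no flag 
bits set, which is the value 'owner' would be computed from, while 
lock_owner(4) gives 0 with only MTX_UNOWNED set, which is what m->mtx_lock 
holds by the time kgdb reads it.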

-- 
John Baldwin