From: John Baldwin
To: freebsd-current@freebsd.org
Cc: "Bjoern A. Zeeb" , kib@freebsd.org, Navdeep Parhar , Navdeep Parhar , Larry Rosenman , Robert Watson , lstewart@freebsd.org
Date: Fri, 7 Aug 2009 08:35:19 -0400
Subject: Re: reproducible panic in netisr
Message-Id: <200908070835.20246.jhb@freebsd.org>
References: <20090804225806.GA54680@hub.freebsd.org>

On Thursday 06 August 2009 10:11:26 am Robert Watson wrote:
> On Thu, 6 Aug 2009, Larry Rosenman wrote:
>
> > On Thu, 6 Aug 2009, Robert Watson wrote:
> >
> >> On Tue, 4 Aug 2009, Navdeep Parhar wrote:
> >>
> >>>>> This occurs on today's HEAD + some unrelated patches. That makes it
> >>>>> 8.0BETA2+ code. I haven't tried older builds.
> >>>>
> >>>> We have finally been able to reproduce this ourselves yesterday and
> >>>
> >>> Well, it happens every single time on all of my amd64 machines. After I'd
> >>> already sent my email I noticed that the netisr mutex has an odd address
> >>> (pun intended :-))
> >>>
> >>> m=0xffffffff8144d867
> >>
> >> Heh, indeed. We just spotted the same result here. In this case it's
> >> causing a panic because it leads to a non-atomic read due to mtx_lock
> >> spanning a cache line boundary, followed shortly by a panic because it's
> >> not a valid thread pointer when it's dereferenced, as we get a fractional
> >> pointer.
> [snip]
> >
> > Do we have an ETA for a testable patch?
>
> RSN, I'm afraid. We can eliminate the effect by reverting the use of DPCPU in
> netisr.c (basically reverting to pre-r195019 of netisr.c). The interesting
> question is where the problem originates -- is gcc/ld/etc not laying out the
> elf section properly, or are the MD parts not providing an aligned base?
> There are also probably issues in the DPCPU handling of modules along similar
> lines, but first things first.

No, gcc/ld/etc is doing the right thing.
However, the DPCPU and VNET code implicitly assumes that the dpcpu/vnet sets start off with a specific alignment, and (as it turns out) that assumption is false.

-- 
John Baldwin