Date:      Fri, 7 Aug 2009 08:35:19 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        "Bjoern A. Zeeb" <bz@freebsd.org>, kib@freebsd.org, Navdeep Parhar <np@freebsd.org>, Navdeep Parhar <nparhar@gmail.com>, Larry Rosenman <ler@lerctr.org>, Robert Watson <rwatson@freebsd.org>, lstewart@freebsd.org
Subject:   Re: reproducible panic in netisr
Message-ID:  <200908070835.20246.jhb@freebsd.org>
In-Reply-To: <alpine.BSF.2.00.0908061508520.62916@fledge.watson.org>
References:  <20090804225806.GA54680@hub.freebsd.org> <alpine.BSF.2.00.0908060834120.21318@thebighonker.lerctr.org> <alpine.BSF.2.00.0908061508520.62916@fledge.watson.org>

On Thursday 06 August 2009 10:11:26 am Robert Watson wrote:
> On Thu, 6 Aug 2009, Larry Rosenman wrote:
> 
> > On Thu, 6 Aug 2009, Robert Watson wrote:
> >
> >> On Tue, 4 Aug 2009, Navdeep Parhar wrote:
> >> 
> >>>>> This occurs on today's HEAD + some unrelated patches.  That makes it 
> >>>>> 8.0BETA2+ code.  I haven't tried older builds.
> >>>> 
> >>>> We have finally been able to reproduce this ourselves yesterday and
> >>> 
> >>> Well, it happens every single time on all of my amd64 machines. After I'd 
> >>> already sent my email I noticed that the netisr mutex has an odd address 
> >>> (pun intended :-))
> >>> 
> >>> m=0xffffffff8144d867
> >> 
> >> Heh, indeed.  We just spotted the same result here.  In this case it's 
> >> causing a panic because it leads to a non-atomic read due to mtx_lock 
> >> spanning a cache line boundary, followed shortly by a panic because it's 
> >> not a valid thread pointer when it's dereferenced, as we get a fractional 
> >> pointer.
> > [snip]
> >
> > Do we have an ETA for a testable patch?
> 
> RSN, I'm afraid.  We can eliminate the effect by reverting the use of DPCPU in 
> netisr.c (basically reverting to pre-r195019 of netisr.c).  The interesting 
> question is where the problem originates -- is gcc/ld/etc not laying out the 
> elf section properly, or are the MD parts not providing an aligned base? 
> There are also probably issues in the DPCPU handling of modules along similar 
> lines, but first things first.

No, gcc/ld/etc. are doing the right thing.  However, the DPCPU and VNET code
implicitly assumes that the dpcpu/vnet sets start at a specific alignment,
and that assumption turns out to be false.
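
To make the alignment problem concrete, here is a minimal sketch (not the
actual kernel code) that checks the mutex address reported earlier in the
thread.  The 64-byte cache line size and the 24-byte offset of mtx_lock
within the mutex are assumptions about a typical amd64 layout, and the
helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as on typical amd64 CPUs. */
#define CACHE_LINE_SIZE	64u
/* Assumption: the lock word sits 24 bytes into the mutex on amd64. */
#define MTX_LOCK_OFFSET	24u

/* True if addr is a multiple of align (align must be a power of two). */
static bool
is_aligned(uint64_t addr, uint64_t align)
{
	return ((addr & (align - 1)) == 0);
}

/* True if the len bytes starting at addr do not all lie in one cache line. */
static bool
spans_cache_line(uint64_t addr, uint64_t len)
{
	return ((addr / CACHE_LINE_SIZE) !=
	    ((addr + len - 1) / CACHE_LINE_SIZE));
}

/* Round addr up to the next multiple of align (align a power of two). */
static uint64_t
round_up(uint64_t addr, uint64_t align)
{
	return ((addr + align - 1) & ~(align - 1));
}
```

Under those assumptions, m=0xffffffff8144d867 is not 8-byte aligned, and the
8-byte lock word at m + MTX_LOCK_OFFSET straddles a cache line boundary,
which matches the non-atomic read described above.  Rounding the base up
with round_up(m, 8) removes the straddle in this particular case.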

-- 
John Baldwin


