Date: Fri, 7 Aug 2009 08:35:19 -0400
From: John Baldwin <jhb@freebsd.org>
To: freebsd-current@freebsd.org
Cc: "Bjoern A. Zeeb" <bz@freebsd.org>, kib@freebsd.org, Navdeep Parhar <np@freebsd.org>, Navdeep Parhar <nparhar@gmail.com>, Larry Rosenman <ler@lerctr.org>, Robert Watson <rwatson@freebsd.org>, lstewart@freebsd.org
Subject: Re: reproducible panic in netisr
Message-ID: <200908070835.20246.jhb@freebsd.org>
In-Reply-To: <alpine.BSF.2.00.0908061508520.62916@fledge.watson.org>
References: <20090804225806.GA54680@hub.freebsd.org> <alpine.BSF.2.00.0908060834120.21318@thebighonker.lerctr.org> <alpine.BSF.2.00.0908061508520.62916@fledge.watson.org>
On Thursday 06 August 2009 10:11:26 am Robert Watson wrote:
> On Thu, 6 Aug 2009, Larry Rosenman wrote:
>
> > On Thu, 6 Aug 2009, Robert Watson wrote:
> >
> >> On Tue, 4 Aug 2009, Navdeep Parhar wrote:
> >>
> >>>>> This occurs on today's HEAD + some unrelated patches. That makes it
> >>>>> 8.0BETA2+ code. I haven't tried older builds.
> >>>>
> >>>> We have finally been able to reproduce this ourselves yesterday and
> >>>
> >>> Well, it happens every single time on all of my amd64 machines. After I'd
> >>> already sent my email I noticed that the netisr mutex has an odd address
> >>> (pun intended :-))
> >>>
> >>> m=0xffffffff8144d867
> >>
> >> Heh, indeed. We just spotted the same result here. In this case it's
> >> causing a panic because it leads to a non-atomic read due to mtx_lock
> >> spanning a cache line boundary, followed shortly by a panic because it's
> >> not a valid thread pointer when it's dereferenced, as we get a fractional
> >> pointer.
> > [snip]
> >
> > Do we have an ETA for a testable patch?
>
> RSN, I'm afraid. We can eliminate the effect by reverting the use of DPCPU in
> netisr.c (basically reverting to pre-r195019 of netisr.c). The interesting
> question is where the problem originates -- is gcc/ld/etc not laying out the
> elf section properly, or are the MD parts not providing an aligned base?
> There are also probably issues in the DPCPU handling of modules along similar
> lines, but first things first.

No, gcc/ld/etc is doing the right thing. However, the DPCPU and VNET code
implicitly assumes that the dpcpu/vnet sets start off with a specific
alignment and that assumption is false (as it turns out).

--
John Baldwin
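
The alignment argument in this message can be checked with plain address arithmetic. The sketch below is illustrative only and is not the kernel code: the 64-byte cache line and 8-byte lock word are assumptions about amd64, `relocate()` is a hypothetical stand-in for "take a symbol's offset from the start of the set and add it to a freshly allocated base", and every address except the reported `m` is made up. The offset of mtx_lock inside the mutex is not given in the thread, so it is not modeled.

```python
CACHE_LINE = 64  # assumed cache line size in bytes
WORD = 8         # assumed size of the pointer-sized lock word on amd64

# Mutex address reported in the thread.
M = 0xffffffff8144d867


def is_aligned(addr, align):
    """True if addr is a multiple of align."""
    return addr % align == 0


def crosses_line(addr, size=WORD, line=CACHE_LINE):
    """True if the bytes [addr, addr + size) straddle a cache line boundary."""
    return addr // line != (addr + size - 1) // line


def relocate(sym_addr, set_start, new_base):
    """Hypothetical model: keep the symbol's offset from the set start,
    but rebase it onto another (here page-aligned) base address."""
    return new_base + (sym_addr - set_start)


if __name__ == "__main__":
    # The reported address is odd, so it cannot be 8-byte aligned.
    print(hex(M), "8-byte aligned:", is_aligned(M, 8))

    # A word at offset 0x3c of a line spills into the next line;
    # one at offset 0x38 fits exactly.
    print("0x3c crosses:", crosses_line(0x3c), "0x38 crosses:", crosses_line(0x38))

    # The symbol is aligned where the linker put it (0x1010), but if the
    # set itself starts at an unaligned address (0x1009), the preserved
    # offset is 7 and the rebased copy ends up misaligned.
    for set_start in (0x1000, 0x1009):
        addr = relocate(0x1010, set_start, 0x20000)
        print(hex(set_start), "->", hex(addr), "aligned:", is_aligned(addr, 8))
```

Under these assumptions the linker can place every object correctly and the rebased copy is still misaligned whenever the start of the set is not itself aligned, which is the assumption described above as false.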