From owner-freebsd-current@FreeBSD.ORG Mon Aug 10 17:52:28 2009 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id EC2841065673 for ; Mon, 10 Aug 2009 17:52:28 +0000 (UTC) (envelope-from julian@elischer.org) Received: from outW.internet-mail-service.net (outw.internet-mail-service.net [216.240.47.246]) by mx1.freebsd.org (Postfix) with ESMTP id C81028FC33 for ; Mon, 10 Aug 2009 17:52:28 +0000 (UTC) Received: from idiom.com (mx0.idiom.com [216.240.32.160]) by out.internet-mail-service.net (Postfix) with ESMTP id 23D17B15FD; Mon, 10 Aug 2009 10:52:29 -0700 (PDT) X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e X-Client-Authorized: MaGic Cook1e Received: from julian-mac.elischer.org (home.elischer.org [216.240.48.38]) by idiom.com (Postfix) with ESMTP id DED852D6017; Mon, 10 Aug 2009 10:52:27 -0700 (PDT) Message-ID: <4A805E5B.5000103@elischer.org> Date: Mon, 10 Aug 2009 10:52:27 -0700 From: Julian Elischer User-Agent: Thunderbird 2.0.0.22 (Macintosh/20090605) MIME-Version: 1.0 To: "Bjoern A. Zeeb" References: <20090804225806.GA54680@hub.freebsd.org> <20090805054115.O93661@maildrop.int.zabbadoz.net> <20090805063417.GA10969@doormat.home> <20090810133111.C93661@maildrop.int.zabbadoz.net> In-Reply-To: <20090810133111.C93661@maildrop.int.zabbadoz.net> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: Jeff Roberson , lstewart@FreeBSD.org, freebsd-current@freebsd.org, kib@FreeBSD.org, Navdeep Parhar , Larry Rosenman , Robert Watson Subject: Re: reproducible panic in netisr X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 10 Aug 2009 17:52:29 -0000 Bjoern A. Zeeb wrote: > > Looking further at the problem you may have already noticed that > the section for the 'master copy' starts at 0xffffffff8168f900 > and that the __start_set_pcpu is outside of that section at > 0xffffffff8168f8e9. > Looking at a section dump from `readelf -S kernel` you would > notice that the __start_set_pcpu directly follows the end of the > previous section. > > The reasons for this are twofold: > (1) ld places the __start_ symbol at '.' (the location counter), > which at that point is at the end of the old section as the new > (set_pcpu) is not yet added with __start_set_pcpu = ALIGN(0). > (2) because we manually define the section, so that it is > writeable, ld at the point of writing the __start_ symbol does > not yet have any possible section alignment information. That > is the reason for the ALIGN(0) in (1). > > An expected behaviour would be for ld to put the *ABS* at the > address where the section begins, once known or fixup the address. > This could arguably be a bug in ld we should fix post-8.0. This is getting closer to the root cause, and thus closer to the "root fix". > > One possible workaround would be to force the __start_ symbol > and the section to be equally aligned and thus on the same address > using linker scripts. The drawbacks are that we need to touch > the fragile linker scripts for each of the sections we add and > for all architectures individually. As the enforcement of alignment > would be at a different place to the actual set creation, putting > the alignment in might be easily forgotten. personally I'd see if there is a way to align the section on a page boundary.. > The advantage would be that we can always be sure that __start_ > would be on the same address where the section starts. that sounds like the preferable answer to me. > > Another solution is to put minimum alignment on the objects inside the > section in a way that it is only in a single place in the source code. > The section directive in the respective header file, that will > be included by each implementation file, is the ideal place for this. > While cache line alignment seems to be the widest alignment > restriction currently in use, one drawback, like with above ldscript > solution, is that a symbol could possibly enforce a wider alignment > restriction onto the section making the __start_ symbol and the > section beginning to diverge again. Example: > 0xffffffff8168f700 __start_set_pcpu > 0xffffffff8168f800 set_pcpu > 0xffffffff8168f800 pcpu_entry_sched_switch_stats > .. > if we would put an alignment of 1024 on pcpu_entry_sched_switch_stats. > This is unlikely to happen. > > With the minimum alignment, ld, at the time of placing the > __start_ symbol, already knows about the section alignment > and will place it correctly on the section beginning doing: > __start_set_pcpu = ALIGN(CACHE_LINE_SHIFT) at ".". > > > Summary: The minimum alignment seems to be the least-intrusive > solution and is believed to work for the moment. In addition > documenting that the dpcpu and similar sections will not support > super-cache line alignment. > The long term solution would be to fix ld to DTRT. > > > ------------------------------------------------------------------------ > ! > ! Put minimum alignment on the dpcpu and vnet section so that ld > ! when adding the __start_ symbol knows the expected section alignment > ! and can place the __start_ symbol correctly. > ! > ! These sections will not support symbols with super-cache line alignment > ! requirements. > ! > ! See posting , 2009-08-10, to freebsd-current for full > ! details. > ! > ! Debugging and testing patches by: > ! Kamigishi Rei (spambox haruhiism.net), np, lstewart, jhb, > ! kib, rwatson > ! Tested by: Kamigishi Rei, lstewart > ! Reviewed by: kib > ! Approved by: ! Index: sys/sys/pcpu.h > =================================================================== > --- sys/sys/pcpu.h (revision 196086) > +++ sys/sys/pcpu.h (working copy) > @@ -56,12 +56,14 @@ > extern uintptr_t *__start_set_pcpu; > extern uintptr_t *__stop_set_pcpu; > > +__asm__( > #if defined(__arm__) > -__asm__(".section set_pcpu, \"aw\", %progbits"); > + ".section set_pcpu, \"aw\", %progbits\n" > #else > -__asm__(".section set_pcpu, \"aw\", @progbits"); > + ".section set_pcpu, \"aw\", @progbits\n" > #endif > -__asm__(".previous"); > + "\t.p2align " __XSTRING(CACHE_LINE_SHIFT) "\n" > + "\t.previous"); > > /* > * Array of dynamic pcpu base offsets. Indexed by id. > Index: sys/net/vnet.h > =================================================================== > --- sys/net/vnet.h (revision 196086) > +++ sys/net/vnet.h (working copy) > @@ -185,12 +185,14 @@ > * Virtual network stack memory allocator, which allows global > variables to > * be automatically instantiated for each network stack instance. > */ > +__asm__( > #if defined(__arm__) > -__asm__(".section " VNET_SETNAME ", \"aw\", %progbits"); > + ".section " VNET_SETNAME ", \"aw\", %progbits\n" > #else > -__asm__(".section " VNET_SETNAME ", \"aw\", @progbits"); > + ".section " VNET_SETNAME ", \"aw\", @progbits\n" > #endif I may be visually impaired but I'm not seeing a reason for the ifdef arm.. > -__asm__(".previous"); > + "\t.p2align " __XSTRING(CACHE_LINE_SHIFT) "\n" > + "\t.previous"); > > #define VNET_NAME(n) vnet_entry_##n > #define VNET > > ------------------------------------------------------------------------ >