Message-ID: <5089678A.6070609@freebsd.org>
Date: Thu, 25 Oct 2012 18:23:38 +0200
From: Andre Oppermann
To: Bruce Evans
Cc: Adrian Chadd, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, Attilio Rao, svn-src-head@FreeBSD.org, Jim Harris
Subject: Re: svn commit: r242014 - head/sys/kern
In-Reply-To: <20121025142313.S999@besplex.bde.org>

On 25.10.2012 05:49, Bruce Evans wrote:
> On Wed, 24 Oct 2012, Attilio Rao wrote:
>
>> On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann wrote:
>>> ...
>>> Let's go back and see how we can do this the sanest way. These are
>>> the options I see at the moment:
>>>
>>> 1.
>>>    sprinkle __aligned(CACHE_LINE_SIZE) all over the place
>>
>> This is wrong because it doesn't give padding.
>
> Unless it is sprinkled in struct declarations.
>
>>> 2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
>>>    the future possibly change to a different compiler dependent
>>>    align attribute
>>
>> What is this macro supposed to do? I don't understand that from your
>> description.
>>
>>> 3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
>>>    automatically gets aligned in all cases, even when dynamically
>>>    allocated.
>>
>> This works but I think it is overkill for structures including sleep
>> mutexes which are the vast majority. So I certainly wouldn't be in
>> favor of such a patch.
>
> This doesn't work either with fully dynamic (auto) allocations. Stack
> alignment is generally broken (limited, and pessimized for both space
> and time) in gcc (it works better in clang). On amd64, it is limited
> by the default of -mpreferred-stack-boundary=4. Since 2**4 is smaller
> than the cache line size and stack alignments larger than it are broken
> in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally,
> 16/CACHE_LINE_SIZE of the time). On i386, we reduce the space/time
> pessimizations a little by overriding the default to
> -mpreferred-stack-boundary=2. 2**2 is even smaller than the cache
> line size. (The pessimizations are for both space and time, since
> time and code space are wasted for the code to keep the stack aligned,
> and cache space and thus also time are wasted for padding. Most
> functions don't benefit from more than sizeof(register_t) alignment.)

I'm not aware of stack allocated mutexes anywhere in the kernel.
Even if there is a case it's very special and unique.

I've verified that __aligned(CACHE_LINE_SIZE) on the definition of
struct mtx itself (in sys/_mutex.h) correctly aligns and pads the
global .bss resident mutexes for 64B and 128B cache line sizes.

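For illustration only, here is a minimal userland sketch of the
difference between options 1 and 3. It is not the kernel code. It
assumes GCC or clang, a 64-byte cache line, and spells out the
attribute that FreeBSD's __aligned() macro expands to. The struct and
function names are hypothetical stand-ins, not the real struct mtx.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption for this sketch; the kernel takes it from machine/param.h. */
#define CACHE_LINE_SIZE 64

/* Hypothetical stand-in for a lock; NOT the real struct mtx layout. */
struct plain_lock {
	uintptr_t owner;
};

/*
 * Option 3: the attribute is on the type. The compiler raises the
 * alignment and rounds sizeof up to a multiple of it, so every object
 * of this type is both aligned and padded to a full cache line.
 */
struct padded_lock {
	uintptr_t owner;
} __attribute__((aligned(CACHE_LINE_SIZE)));

/*
 * Option 1: the attribute is on one variable. Its address is aligned,
 * but sizeof is unchanged, so whatever the linker places next may
 * still share the cache line.
 */
static struct plain_lock sprinkled __attribute__((aligned(CACHE_LINE_SIZE)));

/* Static array of the padded type; each element sits on its own line. */
static struct padded_lock embedded[2];

/* Returns nonzero if p starts on a cache line boundary. */
static int
is_line_aligned(const void *p)
{
	return (((uintptr_t)p % CACHE_LINE_SIZE) == 0);
}

/* Prints sizes and alignment results for both variants. */
static void
alignment_demo(void)
{
	printf("plain:  sizeof=%zu aligned=%d\n",
	    sizeof(sprinkled), is_line_aligned(&sprinkled));
	printf("padded: sizeof=%zu aligned=%d/%d\n",
	    sizeof(struct padded_lock),
	    is_line_aligned(&embedded[0]), is_line_aligned(&embedded[1]));
}
```

Calling alignment_demo() from a small main() shows the padded type
occupying a full line per object, while the variable-level attribute
leaves the size at that of a single pointer. This covers static
storage only; it says nothing about stack or malloc() allocations.
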
> Dynamic allocations via malloc() get whatever alignment malloc() gives.
> This is only required to be 4 or 8 or 16 or so (the maximum for a C
> object declared in conforming C (no __align())), but malloc() usually
> gives more. If it gives CACHE_LINE_SIZE, that is wasteful for most
> small allocations.

Stand-alone mutexes are normally not malloc'ed. They're always
embedded into some larger structure they protect.

> __builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at
> least on i386. In gcc-3.3.3, it assumes that the stack is the default
> 16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to
> say otherwise, and just subtracts from the stack pointer. In gcc-4.2.1,
> it does the necessary andl of the stack pointer, but only to 16-byte
> alignment.
>
> It is another bug that there are no extensions of malloc() or alloca().
> Since malloc() is in the library and may give CACHE_LINE_SIZE but
> __builtin_alloca() is in the compiler and only gives 16, these functions
> are not even as compatible as they should be.
>
> I don't know of any mutexes allocated on the stack, but there are stack
> frames with mcontexts in them that need special alignment so they cause
> problems on i386. They can't just be put on the stack due to the above
> bugs. They are laboriously allocated using malloc(). Since they are
> quite large, 1 mcontext barely fits on the kernel stack, so kib didn't
> like my alloca() method for allocating them.

You lost me here.

-- 
Andre