Date:      Thu, 25 Oct 2012 18:23:38 +0200
From:      Andre Oppermann <andre@freebsd.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Adrian Chadd <adrian@FreeBSD.org>, src-committers@FreeBSD.org, svn-src-all@FreeBSD.org, Attilio Rao <attilio@FreeBSD.org>, svn-src-head@FreeBSD.org, Jim Harris <jim.harris@gmail.com>
Subject:   Re: svn commit: r242014 - head/sys/kern
Message-ID:  <5089678A.6070609@freebsd.org>
In-Reply-To: <20121025142313.S999@besplex.bde.org>
References:  <201210241836.q9OIafqo073002@svn.freebsd.org> <CAJ-VmonpdJ445hXVaoHqFgS0v7QRwqHWodQrVHm2CN9T661www@mail.gmail.com> <CAJP=Hc9XmvfW3MrDjvK15OAx1fyfjPk%2BEhqHUOzoEpChu5imtg@mail.gmail.com> <50883EA8.1010308@freebsd.org> <CAJ-FndAJwuD=oUHyVK4xHoZNrf0r%2Bq2WxZ9rUMW%2B-zzMo_8QuA@mail.gmail.com> <20121025142313.S999@besplex.bde.org>

On 25.10.2012 05:49, Bruce Evans wrote:
> On Wed, 24 Oct 2012, Attilio Rao wrote:
>
>> On Wed, Oct 24, 2012 at 8:16 PM, Andre Oppermann <andre@freebsd.org> wrote:
>>> ...
>>> Let's go back and see how we can do this the sanest way.  These are
>>> the options I see at the moment:
>>>
>>>  1. sprinkle __aligned(CACHE_LINE_SIZE) all over the place
>>
>> This is wrong because it doesn't give padding.
>
> Unless it is sprinkled in struct declarations.
>
>>>  2. use a macro like MTX_ALIGN that can be SMP/UP aware and in
>>>     the future possibly change to a different compiler dependent
>>>     align attribute
>>
>> What is this macro supposed to do? I don't understand that from your
>> description.
>>
>>>  3. embed __aligned(CACHE_LINE_SIZE) into struct mtx itself so it
>>>     automatically gets aligned in all cases, even when dynamically
>>>     allocated.
>>
>> This works but I think it is overkill for structures including sleep
>> mutexes which are the vast majority. So I certainly wouldn't be in
>> favor of such a patch.
>
> This doesn't work either with fully dynamic (auto) allocations.  Stack
> alignment is generally broken (limited, and pessimized for both space
> and time) in gcc (it works better in clang).  On amd64, it is limited
> by the default of -mpreferred-stack-boundary=4.  Since 2**4 is smaller
> than the cache line size and stack alignments larger than it are broken
> in gcc, __aligned(CACHE_LINE_SIZE) never works (except accidentally,
> 16/CACHE_LINE_SIZE of the time).  On i386, we reduce the space/time
> pessimizations a little by overriding the default to
> -mpreferred-stack-boundary=2.  2**2 is even smaller than the cache
> line size.  (The pessimizations are for both space and time, since
> time and code space is wasted for the code to keep the stack aligned,
> and cache space and thus also time are wasted for padding.  Most
> functions don't benefit from more than sizeof(register_t) alignment.)

I'm not aware of stack allocated mutexes anywhere in the kernel.
Even if there is such a case, it is very special and unique.

I've verified that __aligned(CACHE_LINE_SIZE) on the definition of
struct mtx itself (in sys/_mutex.h) correctly aligns and pads the
global .bss resident mutexes for 64B and 128B cache line sizes.
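
As an illustration of the kind of check described above, here is a minimal
userland sketch. It is not the actual sys/_mutex.h change: the struct names
and fields are hypothetical stand-ins for struct mtx, CACHE_LINE_SIZE is
assumed to be 64, and the GCC/Clang attribute spelling is used in place of
the kernel's __aligned() macro. It contrasts the attribute on a single
variable (aligned address, no padding) with the attribute on the type
(aligned and padded, so array elements never share a line).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64	/* assumption: 64-byte cache lines */

/* Hypothetical stand-in for struct mtx, without any alignment request. */
struct demo_mtx_plain {
	uintptr_t lock_word;
	void *lock_object;
};

/* Same layout, but the alignment is part of the type itself. */
struct demo_mtx_aligned {
	uintptr_t lock_word;
	void *lock_object;
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Attribute on the variable only: the address is aligned, sizeof is not padded. */
struct demo_mtx_plain var_aligned __attribute__((aligned(CACHE_LINE_SIZE)));

/* Attribute on the type: every array element starts on its own cache line. */
struct demo_mtx_aligned type_aligned[2];

/* The type-level attribute pads sizeof up to a multiple of the alignment. */
static_assert(sizeof(struct demo_mtx_aligned) % CACHE_LINE_SIZE == 0,
    "aligned struct must be padded to a whole number of cache lines");

/* Returns nonzero when p sits on a cache-line boundary. */
static inline int
is_cache_aligned(const void *p)
{
	return ((uintptr_t)p % CACHE_LINE_SIZE) == 0;
}
```

With the attribute only on the variable, the linker is free to place other
data in the remainder of the same cache line, which is the "doesn't give
padding" objection raised earlier in the thread.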

> Dynamic allocations via malloc() get whatever alignment malloc() gives.
> This is only required to be 4 or 8 or 16 or so (the maximum for a C
> object declared in conforming C (no __align())), but malloc() usually
> gives more.  If it gives CACHE_LINE_SIZE, that is wasteful for most
> small allocations.

Stand-alone mutexes are normally not malloc'ed.  They're always
embedded into some larger structure they protect.
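
To make the embedded case concrete, here is a small userland sketch under
stated assumptions: struct demo_mtx and struct demo_queue are hypothetical,
CACHE_LINE_SIZE is assumed to be 64, and C11 aligned_alloc() stands in for
whatever allocator the containing structure really uses (the kernel's
malloc(9) is a different interface). It shows that an aligned member raises
the alignment of the containing type, and that the container must then be
allocated with at least that alignment for the member to land on a
cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64	/* assumption: 64-byte cache lines */

/* Hypothetical mutex type carrying the alignment in its definition. */
struct demo_mtx {
	uintptr_t lock_word;
} __attribute__((aligned(CACHE_LINE_SIZE)));

/* Hypothetical structure protected by an embedded mutex. */
struct demo_queue {
	int qlen;
	struct demo_mtx lock;	/* compiler pads up to the next line boundary */
	int dropped;
};

/* The member's alignment propagates to the containing type. */
static_assert(_Alignof(struct demo_queue) == CACHE_LINE_SIZE,
    "container inherits the alignment of its most-aligned member");

/*
 * Allocate a queue so that the embedded lock is really cache-line aligned.
 * sizeof(struct demo_queue) is a multiple of its alignment, as C11
 * aligned_alloc() requires. Returns NULL on failure; release with free().
 */
static inline struct demo_queue *
demo_queue_alloc(void)
{
	return aligned_alloc(_Alignof(struct demo_queue),
	    sizeof(struct demo_queue));
}
```

Note the space cost in this sketch: three small fields grow to three full
cache lines, which is the "overkill" concern quoted above for structures
that only hold ordinary sleep mutexes.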

> __builtin_alloca() is broken in gcc-3.3.3, but works in gcc-4.2.1, at
> least on i386.  In gcc-3.3.3, it assumes that the stack is the default
> 16-byte aligned even if -mpreferred-stack-boundary=2 is in CFLAGS to
> say otherwise, and just subtracts from the stack pointer.  In gcc-4.2.1,
> it does the necessary andl of the stack pointer, but only 16-byte
> alignment.
>
> It is another bug that there are no extensions of malloc() or alloca().
> Since malloc() is in the library and may give CACHE_LINE_SIZE but
> __builtin_alloca() is in the compiler and only gives 16, these functions
> are not even as compatible as they should be.
>
> I don't know of any mutexes allocated on the stack, but there are stack
> frames with mcontexts in them that need special alignment so they cause
> problems on i386.  They can't just be put on the stack due to the above
> bugs. They are laboriously allocated using malloc().  Since they are
> quite large, 1 mcontext barely fits on the kernel stack, so kib didn't
> like my alloca() method for allocating them.

You lost me here.

-- 
Andre



