Date:      Thu, 1 Nov 2012 22:47:12 -0400
From:      Eitan Adler <lists@eitanadler.com>
To:        Jim Harris <jim.harris@gmail.com>
Cc:        Attilio Rao <attilio@freebsd.org>, Andre Oppermann <andre@freebsd.org>, freebsd-arch@freebsd.org
Subject:   Re: CACHE_LINE_SIZE on x86
Message-ID:  <CAF6rxgk4oUZLyBtsTkwr36NPR9zBmmRKe59QaAfvW13KEs2CNg@mail.gmail.com>
In-Reply-To: <CAJP=Hc_mEcO6wStbcuRp_3McUGhEp06nvKXh8QO+Q0x67KrM7w@mail.gmail.com>
References:  <CAJP=Hc_F+-RdD=XZ7ikBKVKE_XW88Y35Xw0bYE6gGURLPDOAWw@mail.gmail.com> <201210250918.00602.jhb@freebsd.org> <5089690A.8070503@networx.ch> <201210251732.31631.jhb@freebsd.org> <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com> <CAJP=Hc8mVycfjWN7_V4VAAHf+0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com> <50928AE5.4010107@freebsd.org> <CAJP=Hc_mEcO6wStbcuRp_3McUGhEp06nvKXh8QO+Q0x67KrM7w@mail.gmail.com>

On 1 November 2012 14:36, Jim Harris <jim.harris@gmail.com> wrote:
> On Thu, Nov 1, 2012 at 7:44 AM, Andre Oppermann <andre@freebsd.org> wrote:
>
>> On 01.11.2012 01:50, Jim Harris wrote:
>>
>>>
>>>
>>> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>>>
>>>
>>>     On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>>>      >
>>>      > It would be good to know though if there are performance benefits
>>> from
>>>      > avoiding sharing across paired lines in this manner.  Even if it
>>> has
>>>      > its own MOESI state, there might still be negative effects from
>>> sharing
>>>      > the pair.
>>>
>>>     On 2S, I do see further benefits by using 128 byte padding instead of
>>>     64.  On 1S, I see no difference.  I've been meaning to turn off
>>>     prefetching on my system to see if it has any effect in the 2S case -
>>>     I can give that a shot tomorrow.
>>>
>>>
>>> So tomorrow turned into next week, but I have some data finally.
>>>
>>> I've updated to today's HEAD, including all of the mtx_padalign
>>> changes, and tested 64-byte vs. 128-byte alignment on a 2S amd64
>>> system (SNB Xeon).  My BIOS also has a knob to disable adjacent-line
>>> prefetching (the MLC spatial prefetcher), so I ran both the 64-byte
>>> and the 128-byte padding with that prefetcher enabled and disabled.
>>>
>>> MLC prefetcher enabled: 3-6% performance improvement and a 1-5%
>>> decrease in CPU utilization from using 128-byte padding instead of
>>> 64-byte.
>>>
>>
>> Just to be sure.  The numbers you show are just for the one location you've
>> converted to the new padded mutex and a particular test case?
>>
>
> There are two locations actually - the struct tdq lock in the ULE
> scheduler, and the callout_cpu lock in kern_timeout.c.
>
> And yes, I've only been running a custom benchmark I developed here to
> help uncover some of these areas of spinlock contention.  It was
> originally used for NVMe driver performance testing, but has been helpful
> in uncovering some other issues outside of the NVMe driver itself (such as
> these contended spinlocks).  It spawns a large number of kernel threads,
> each of which submits an I/O and then sleeps until it is woken by the
> interrupt thread when the I/O completes.  It stresses both the scheduler
> and the callout code, since I start and stop a timer for each I/O.
>
> I think the only thing this proves is that there is benefit to keeping
> x86 CACHE_LINE_SIZE set to 128.

Does this benchmark simulate reality, or does padding the locks help
only on this specific benchmark?
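
For context, a minimal sketch of the padding idea being measured here
(purely illustrative; the struct and names are hypothetical, not the
actual FreeBSD mtx_padalign code) might look like this:

    /*
     * Pad and align a hot per-CPU structure to CACHE_LINE_SIZE so that
     * neighbouring entries never share a cache line, and with 128-byte
     * padding never share an adjacent-line prefetch pair either.
     */
    #define CACHE_LINE_SIZE 128     /* the x86 value under discussion */
    #define NCPUS           64      /* hypothetical CPU count */

    struct padded_counter {
            volatile long   count;  /* hot field, written by one CPU */
            char            _pad[CACHE_LINE_SIZE - sizeof(long)];
    } __attribute__((__aligned__(CACHE_LINE_SIZE)));

    static struct padded_counter counters[NCPUS];

With only 64-byte padding, two adjacent entries still fall within the
128-byte pair pulled in by the MLC spatial prefetcher, which is what the
64-byte vs. 128-byte comparison above is probing.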



-- 
Eitan Adler


