Date:      Thu, 01 Nov 2012 15:44:53 +0100
From:      Andre Oppermann <andre@freebsd.org>
To:        Jim Harris <jim.harris@gmail.com>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: CACHE_LINE_SIZE on x86
Message-ID:  <50928AE5.4010107@freebsd.org>
In-Reply-To: <CAJP=Hc8mVycfjWN7_V4VAAHf%2B0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com>
References:  <CAJP=Hc_F%2B-RdD=XZ7ikBKVKE_XW88Y35Xw0bYE6gGURLPDOAWw@mail.gmail.com> <201210250918.00602.jhb@freebsd.org> <5089690A.8070503@networx.ch> <201210251732.31631.jhb@freebsd.org> <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com> <CAJP=Hc8mVycfjWN7_V4VAAHf%2B0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com>

On 01.11.2012 01:50, Jim Harris wrote:
>
>
> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>
>     On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>      >
>      > It would be good to know though if there are performance benefits from
>      > avoiding sharing across paired lines in this manner.  Even if it has
>      > its own MOESI state, there might still be negative effects from sharing
>      > the pair.
>
>     On 2S, I do see further benefits by using 128 byte padding instead of
>     64.  On 1S, I see no difference.  I've been meaning to turn off
>     prefetching on my system to see if it has any effect in the 2S case -
>     I can give that a shot tomorrow.
>
>
> So tomorrow turned into next week, but I have some data finally.
>
> I've updated to HEAD from today, including all of the mtx_padalign changes.  I tested 64 v. 128 byte
> alignment on 2S amd64 (SNB Xeon).  My BIOS also has a knob to disable the adjacent line prefetching
> (MLC spatial prefetcher), so I ran both 64b and 128b against this specific prefetcher both enabled
> and disabled.
>
> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in CPU utilization by using 128b
> padding instead of 64b.

Just to be sure: are the numbers you show only for the one location you've
converted to the new padded mutex, and for one particular test case?

-- 
Andre

> MLC prefetcher disabled: performance and CPU utilization differences are in the noise - anywhere
> from -0.2% to +0.5%.  The performance here matches extremely closely (within 1%) with 128b padding
> and the MLC prefetcher enabled.
>
> I think it's safe to say that the 128b pad/alignment is worth keeping for multi-socket x86, and that
> the benefit is most certainly due to the MLC spatial prefetcher.
>
> I still see no measurable differences with 64b v. 128b padding on 1S, but that's only testing with
> my benchmark.
>
> Thanks,
>
> -Jim
>
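
For anyone following along, here is a minimal sketch of what 64 v. 128 byte
padding means for two adjacent objects.  The names (LINE_SIZE, PAIR_SIZE,
counter64, counter128, shares_prefetch_pair) are made up for illustration and
are not the CACHE_LINE_SIZE or mtx_padalign definitions from the tree.  It
assumes 64 byte lines that the spatial prefetcher pairs into aligned 128 byte
blocks, as discussed above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical constants, for illustration only. */
#define LINE_SIZE	64	/* one cache line */
#define PAIR_SIZE	128	/* aligned pair of adjacent lines */

/* Padded and aligned to a single cache line. */
struct counter64 {
	alignas(LINE_SIZE) uint64_t value;
};

/* Padded and aligned to a full 128 byte pair. */
struct counter128 {
	alignas(PAIR_SIZE) uint64_t value;
};

/* The alignment requirement also rounds the size up to the same value. */
static_assert(sizeof(struct counter64) == LINE_SIZE, "64 byte padding");
static_assert(sizeof(struct counter128) == PAIR_SIZE, "128 byte padding");

/*
 * Force the 64 byte array onto a 128 byte boundary so that the layout is
 * deterministic: both elements then fall into the same aligned pair.
 */
static alignas(PAIR_SIZE) struct counter64 c64[2];
static struct counter128 c128[2];

/* Return 1 if both addresses lie in the same aligned 128 byte block. */
static int
shares_prefetch_pair(const void *a, const void *b)
{
	return ((uintptr_t)a / PAIR_SIZE == (uintptr_t)b / PAIR_SIZE);
}
```

With this layout shares_prefetch_pair(&c64[0], &c64[1]) is true, while
shares_prefetch_pair(&c128[0], &c128[1]) is false: the 64 byte elements sit on
separate lines but in the same pair, which is the case the 128 byte padding
avoids.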



