Date:      Wed, 31 Oct 2012 17:50:52 -0700
From:      Jim Harris <jim.harris@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: CACHE_LINE_SIZE on x86
Message-ID:  <CAJP=Hc8mVycfjWN7_V4VAAHf%2B0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com>
In-Reply-To: <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com>
References:  <CAJP=Hc_F%2B-RdD=XZ7ikBKVKE_XW88Y35Xw0bYE6gGURLPDOAWw@mail.gmail.com> <201210250918.00602.jhb@freebsd.org> <5089690A.8070503@networx.ch> <201210251732.31631.jhb@freebsd.org> <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com>

On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:

> On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
> >
> > It would be good to know though if there are performance benefits from
> > avoiding sharing across paired lines in this manner.  Even if it has
> > its own MOESI state, there might still be negative effects from sharing
> > the pair.
>
> On 2S, I do see further benefits by using 128 byte padding instead of
> 64.  On 1S, I see no difference.  I've been meaning to turn off
> prefetching on my system to see if it has any effect in the 2S case -
> I can give that a shot tomorrow.
>
>
So tomorrow turned into next week, but I have some data finally.

I've updated to HEAD from today, including all of the mtx_padalign
changes.  I tested 64 v. 128 byte alignment on 2S amd64 (SNB Xeon).  My
BIOS also has a knob to disable the adjacent line prefetching (MLC spatial
prefetcher), so I ran both 64b and 128b padding with this specific prefetcher
both enabled and disabled.

MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in CPU
utilization by using 128b padding instead of 64b.

MLC prefetcher disabled: performance and CPU utilization differences are in
the noise - anywhere from -0.2% to +0.5%.  The performance here closely
matches (within 1%) what I see with 128b padding and the MLC prefetcher
enabled.

I think it's safe to say that the 128b pad/alignment is worth keeping for
multi-socket x86, and that the benefit is most certainly due to the MLC
spatial prefetcher.
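
For illustration only (this is not code from the mtx_padalign changes): a
minimal C11 sketch of what 128-byte pad/alignment means for an array of
frequently written objects, assuming 64-byte cache lines that the spatial
prefetcher fetches as adjacent 128-byte pairs.  The struct, the macros and
the helper functions are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte lines, fetched by the prefetcher as 128-byte pairs. */
#define LINE_SIZE	64
#define PAIR_SIZE	(2 * LINE_SIZE)

/* Hypothetical hot counter; not a FreeBSD structure. */
struct padded_counter {
	_Alignas(PAIR_SIZE) uint64_t value;
};

/* Aligning the member also rounds the struct size up to 128 bytes. */
static_assert(sizeof(struct padded_counter) == PAIR_SIZE,
    "each counter should occupy one full line pair");

static struct padded_counter counters[4];

/* Index of the 128-byte line pair that contains address p. */
static uintptr_t
pair_index(const void *p)
{
	return ((uintptr_t)p / PAIR_SIZE);
}

/* Returns 1 if counters i and j live in different 128-byte line pairs. */
static int
counters_isolated(size_t i, size_t j)
{
	return (pair_index(&counters[i]) != pair_index(&counters[j]));
}
```

With 64-byte alignment instead, two neighbouring counters could land in the
two halves of one 128-byte pair, which is the case the numbers above suggest
is worth avoiding on 2S.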

I still see no measurable differences with 64b v. 128b padding on 1S, but
that's only testing with my benchmark.

Thanks,

-Jim


