Date: Thu, 01 Nov 2012 15:44:53 +0100
From: Andre Oppermann <andre@freebsd.org>
To: Jim Harris <jim.harris@gmail.com>
Cc: freebsd-arch@freebsd.org
Subject: Re: CACHE_LINE_SIZE on x86
Message-ID: <50928AE5.4010107@freebsd.org>
In-Reply-To: <CAJP=Hc8mVycfjWN7_V4VAAHf%2B0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com>
References: <CAJP=Hc_F%2B-RdD=XZ7ikBKVKE_XW88Y35Xw0bYE6gGURLPDOAWw@mail.gmail.com>
 <201210250918.00602.jhb@freebsd.org>
 <5089690A.8070503@networx.ch>
 <201210251732.31631.jhb@freebsd.org>
 <CAJP=Hc_98G=77gSO9hQ_knTedhNuXDErUt34=5vSPmux=tQR1g@mail.gmail.com>
 <CAJP=Hc8mVycfjWN7_V4VAAHf%2B0AiFozqcF4Shz26uh5oGiDxKQ@mail.gmail.com>

On 01.11.2012 01:50, Jim Harris wrote:
>
>
> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>
> On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
> >
> > It would be good to know though if there are performance benefits from
> > avoiding sharing across paired lines in this manner. Even if it has
> > its own MOESI state, there might still be negative effects from sharing
> > the pair.
>
> On 2S, I do see further benefits by using 128 byte padding instead of
> 64. On 1S, I see no difference. I've been meaning to turn off
> prefetching on my system to see if it has any effect in the 2S case -
> I can give that a shot tomorrow.
>
>
> So tomorrow turned into next week, but I have some data finally.
>
> I've updated to HEAD from today, including all of the mtx_padalign changes. I tested 64 v. 128 byte
> alignment on 2S amd64 (SNB Xeon). My BIOS also has a knob to disable the adjacent line prefetching
> (MLC spatial prefetcher), so I ran both 64b and 128b against this specific prefetcher both enabled
> and disabled.
>
> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in CPU utilization by using 128b
> padding instead of 64b.

Just to be sure: are the numbers you show only for the one location you've
converted to the new padded mutex, and for one particular test case?

-- 
Andre

> MLC prefetcher disabled: performance and CPU utilization differences are in the noise - anywhere
> from -0.2% to +0.5%. The performance here matches extremely closely (within 1%) with 128b padding
> and the MLC prefetcher enabled.
>
> I think it's safe to say that the 128b pad/alignment is worth keeping for multi-socket x86, and is
> most certainly due to the MLC spatial prefetcher.
>
> I still see no measurable differences with 64b v. 128b padding on 1S, but that's only testing with
> my benchmark.
>
> Thanks,
>
> -Jim
>
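
To make the 64b v. 128b comparison concrete, here is a minimal user-space sketch of padding a lock to 128 bytes so that two locks never fall in the same adjacent-line pair. It is only an illustration under stated assumptions: PAD_ALIGN, struct padded_lock, and lock_stride() are hypothetical names, and this is not the kernel's mtx_padalign code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/*
 * Hypothetical constant: two 64-byte cache lines. Aligning to 128 keeps
 * each lock in its own adjacent-line pair, which is the unit the MLC
 * spatial prefetcher discussed above operates on.
 */
#define PAD_ALIGN 128

/* Hypothetical stand-in for a lock word; NOT FreeBSD's struct mtx. */
struct padded_lock {
	alignas(PAD_ALIGN) volatile unsigned long word;
};

/* The alignment request also rounds the size up to PAD_ALIGN. */
static_assert(sizeof(struct padded_lock) == PAD_ALIGN,
    "padded_lock must occupy exactly one 128-byte block");
static_assert(alignof(struct padded_lock) == PAD_ALIGN,
    "padded_lock must start on a 128-byte boundary");

static struct padded_lock locks[2];

/* Distance in bytes between two neighbouring locks in an array. */
size_t
lock_stride(void)
{
	return (size_t)((char *)&locks[1] - (char *)&locks[0]);
}
```

With PAD_ALIGN set to 64 instead, neighbouring array elements would sit in the two halves of one 128-byte pair, which is the 64b case measured above.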