From: Andre Oppermann <andre@freebsd.org>
Date: Thu, 01 Nov 2012 15:44:53 +0100
To: Jim Harris
Cc: freebsd-arch@freebsd.org
Subject: Re: CACHE_LINE_SIZE on x86
Message-ID: <50928AE5.4010107@freebsd.org>

On 01.11.2012 01:50, Jim Harris wrote:
> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris wrote:
>> On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin wrote:
>>> It would be good to know though if there are performance benefits
>>> from avoiding sharing across paired lines in this manner.  Even if
>>> it has its own MOESI state, there might still be negative effects
>>> from sharing the pair.
>>
>> On 2S, I do see further benefits by using 128-byte padding instead
>> of 64.  On 1S, I see no difference.  I've been meaning to turn off
>> prefetching on my system to see whether it has any effect in the 2S
>> case - I can give that a shot tomorrow.
>
> So tomorrow turned into next week, but I finally have some data.
>
> I've updated to HEAD from today, including all of the mtx_padalign
> changes.  I tested 64- vs. 128-byte alignment on 2S amd64 (SNB Xeon).
> My BIOS also has a knob to disable adjacent-line prefetching (the MLC
> spatial prefetcher), so I ran both 64b and 128b with that prefetcher
> enabled and disabled.
>
> MLC prefetcher enabled: 3-6% performance improvement and a 1-5%
> decrease in CPU utilization by using 128b padding instead of 64b.

Just to be sure: the numbers you show are only for the one location
you've converted to the new padded mutex, and for one particular test
case?

--
Andre

> MLC prefetcher disabled: performance and CPU utilization differences
> are in the noise - anywhere from -0.2% to +0.5%.  The performance here
> matches extremely closely (within 1%) the results with 128b padding
> and the MLC prefetcher enabled.
>
> I think it's safe to say that the 128b pad/alignment is worth keeping
> for multi-socket x86, and that the benefit is almost certainly due to
> the MLC spatial prefetcher.
>
> I still see no measurable difference with 64b vs. 128b padding on 1S,
> but that's only testing with my benchmark.
>
> Thanks,
>
> -Jim
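
For reference, a minimal userspace sketch of the padding technique under
discussion.  This is an illustration under assumptions, not the kernel's
actual mtx_padalign definition: the struct name, the pthread mutex, and
the LOCK_PAD constant are all hypothetical.  The idea it shows is the one
Jim measured - on 2S x86 the MLC spatial prefetcher fetches 64-byte cache
lines in adjacent 128-byte pairs, so each hot lock is aligned and padded
out to a full 128 bytes to keep two locks from ever sharing a pair.

    /*
     * Illustrative sketch only -- not FreeBSD's mtx_padalign.
     * Each lock is aligned to and padded out to 128 bytes so the
     * adjacent-line (MLC spatial) prefetcher, which pulls in 64-byte
     * lines as 128-byte pairs, never drags a neighboring lock's line
     * into another core's cache along with this one.
     */
    #include <pthread.h>
    #include <stdalign.h>

    #define LOCK_PAD 128    /* two 64-byte lines; x86 assumption */

    struct padded_lock {
        alignas(LOCK_PAD) pthread_mutex_t m;
        /* sizeof(struct padded_lock) rounds up to a multiple of
         * LOCK_PAD, so array elements never share a prefetched pair. */
    };

    static struct padded_lock locks[8];  /* one 128-byte pair each */

    int
    main(void)
    {
        pthread_mutex_init(&locks[0].m, NULL);
        pthread_mutex_lock(&locks[0].m);
        pthread_mutex_unlock(&locks[0].m);
        return (0);
    }

Padding to 128 rather than 64 bytes trades memory for isolation; per the
numbers above that trade only pays off on multi-socket parts where the
spatial prefetcher is enabled, which is presumably why it was made a
separate padded-mutex type rather than the default.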