From: Andre Oppermann <andre@freebsd.org>
Date: Thu, 01 Nov 2012 15:44:53 +0100
To: Jim Harris
Cc: freebsd-arch@freebsd.org
Subject: Re: CACHE_LINE_SIZE on x86
Message-ID: <50928AE5.4010107@freebsd.org>

On 01.11.2012 01:50, Jim Harris wrote:
> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris wrote:
>> On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin wrote:
>>> It would be good to know though if there are performance benefits
>>> from avoiding sharing across paired lines in this manner.  Even if
>>> it has its own MOESI state, there might still be negative effects
>>> from sharing the pair.
>>
>> On 2S, I do see further benefits by using 128-byte padding instead
>> of 64.  On 1S, I see no difference.  I've been meaning to turn off
>> prefetching on my system to see whether it has any effect in the 2S
>> case - I can give that a shot tomorrow.
>
> So tomorrow turned into next week, but I finally have some data.
>
> I've updated to HEAD from today, including all of the mtx_padalign
> changes.  I tested 64- vs. 128-byte alignment on 2S amd64 (SNB Xeon).
> My BIOS also has a knob to disable adjacent-line prefetching (the MLC
> spatial prefetcher), so I ran both 64b and 128b with that prefetcher
> enabled and disabled.
>
> MLC prefetcher enabled: 3-6% performance improvement and a 1-5%
> decrease in CPU utilization by using 128b padding instead of 64b.

Just to be sure: the numbers you show are only for the one location
you've converted to the new padded mutex, and for one particular test
case?

--
Andre

> MLC prefetcher disabled: performance and CPU utilization differences
> are in the noise - anywhere from -0.2% to +0.5%.  The performance here
> matches extremely closely (within 1%) the results with 128b padding
> and the MLC prefetcher enabled.
>
> I think it's safe to say that the 128b pad/alignment is worth keeping
> for multi-socket x86, and that the benefit is almost certainly due to
> the MLC spatial prefetcher.
>
> I still see no measurable difference with 64b vs. 128b padding on 1S,
> but that's only testing with my benchmark.
>
> Thanks,
>
> -Jim
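
For reference, a minimal userspace sketch of the padding technique under
discussion.  This is an illustration under assumptions, not the kernel's
actual mtx_padalign definition: the struct name, the pthread mutex, and
the LOCK_PAD constant are all hypothetical.  The idea it shows is the one
Jim measured - on 2S x86 the MLC spatial prefetcher fetches 64-byte cache
lines in adjacent 128-byte pairs, so each hot lock is aligned and padded
out to a full 128 bytes to keep two locks from ever sharing a pair.

    /*
     * Illustrative sketch only -- not FreeBSD's mtx_padalign.
     * Each lock is aligned to and padded out to 128 bytes so the
     * adjacent-line (MLC spatial) prefetcher, which pulls in 64-byte
     * lines as 128-byte pairs, never drags a neighboring lock's line
     * into another core's cache along with this one.
     */
    #include <pthread.h>
    #include <stdalign.h>

    #define LOCK_PAD 128    /* two 64-byte lines; x86 assumption */

    struct padded_lock {
        alignas(LOCK_PAD) pthread_mutex_t m;
        /* sizeof(struct padded_lock) rounds up to a multiple of
         * LOCK_PAD, so array elements never share a prefetched pair. */
    };

    static struct padded_lock locks[8];  /* one 128-byte pair each */

    int
    main(void)
    {
        pthread_mutex_init(&locks[0].m, NULL);
        pthread_mutex_lock(&locks[0].m);
        pthread_mutex_unlock(&locks[0].m);
        return (0);
    }

Padding to 128 rather than 64 bytes trades memory for isolation; per the
numbers above that trade only pays off on multi-socket parts where the
spatial prefetcher is enabled, which is presumably why it was made a
separate padded-mutex type rather than the default.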