From: Jim Harris <jim.harris@gmail.com>
Date: Thu, 1 Nov 2012 11:36:13 -0700
Subject: Re: CACHE_LINE_SIZE on x86
To: Andre Oppermann
Cc: Attilio Rao, freebsd-arch@freebsd.org

On Thu, Nov 1, 2012 at 7:44 AM, Andre Oppermann wrote:
> On 01.11.2012 01:50, Jim Harris wrote:
>>
>> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>>
>>     On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>>     >
>>     > It would be good to know though if there are performance benefits
>>     > from avoiding sharing across paired lines in this manner. Even if
>>     > it has its own MOESI state, there might still be negative effects
>>     > from sharing the pair.
>>
>>     On 2S, I do see further benefits by using 128 byte padding instead
>>     of 64. On 1S, I see no difference. I've been meaning to turn off
>>     prefetching on my system to see if it has any effect in the 2S case -
>>     I can give that a shot tomorrow.
>>
>> So tomorrow turned into next week, but I have some data finally.
>>
>> I've updated to HEAD from today, including all of the mtx_padalign
>> changes. I tested 64 v. 128 byte alignment on 2S amd64 (SNB Xeon). My
>> BIOS also has a knob to disable the adjacent line prefetching (MLC
>> spatial prefetcher), so I ran both 64b and 128b against this specific
>> prefetcher both enabled and disabled.
>>
>> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in
>> CPU utilization by using 128b padding instead of 64b.
>
> Just to be sure. The numbers you show are just for the one location
> you've converted to the new padded mutex and a particular test case?
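For anyone following along, the padded mutex being measured boils down to
roughly the following. This is just a generic illustration of the padding
idea, not the literal struct mtx_padalign definition in sys/sys/mutex.h:

/*
 * Generic illustration of the padded-lock idea: align (and therefore
 * pad) each lock out to CACHE_LINE_SIZE so that two hot locks can
 * never share a cache line, or an adjacent-line prefetch pair when
 * CACHE_LINE_SIZE is set to 128 on x86.
 */
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

struct example_pad_mtx {
	struct mtx	lock;
} __aligned(CACHE_LINE_SIZE);

/*
 * With the alignment above, sizeof(struct example_pad_mtx) is rounded
 * up to a multiple of CACHE_LINE_SIZE, so adjacent array entries (for
 * example one per CPU) land on distinct 128-byte line pairs instead of
 * ping-ponging a shared line between sockets.
 */
static struct example_pad_mtx example_locks[MAXCPU];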
There are two locations actually - the struct tdq lock in the ULE
scheduler, and the callout_cpu lock in kern_timeout.c.

And yes, I've only been running a custom benchmark I developed here to
help uncover some of these areas of spinlock contention. It was
originally used for NVMe driver performance testing, but has been
helpful in uncovering some other issues outside of the NVMe driver
itself (such as these contended spinlocks). It spawns a large number of
kernel threads, each of which submits an I/O and then sleeps until it is
woken by the interrupt thread when the I/O completes. It stresses the
scheduler and also callout, since I start and stop a timer for each I/O.

I think the only thing this proves is that there is benefit to having
x86 CACHE_LINE_SIZE still set to 128.

Thanks,

-Jim
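P.S. In case the shape of the benchmark matters for interpreting the
numbers, the per-thread loop is roughly the following. This is only a
sketch of what I described above; the io_* helpers are placeholders
standing in for the real NVMe submission path, not actual driver code:

/*
 * Rough sketch of the benchmark's per-thread loop.  Setup (mtx_init,
 * callout_init, kthread_add) is omitted; the point here is only the
 * scheduler and callout traffic generated for every I/O.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/callout.h>

struct io_worker {
	struct mtx	lock;	/* protects 'done' */
	struct callout	timer;	/* armed and disarmed once per I/O */
	int		done;	/* set by the interrupt thread */
};

static void
io_timeout(void *arg)
{
	/* Placeholder I/O timeout handler; not expected to fire. */
}

static void
io_submit_one(struct io_worker *w)
{
	/* Placeholder: the real benchmark queues one NVMe I/O here. */
}

/* Called from the driver's interrupt thread when the I/O completes. */
static void
io_complete(struct io_worker *w)
{
	mtx_lock(&w->lock);
	w->done = 1;
	wakeup(w);			/* wakes the worker: hits the tdq lock */
	mtx_unlock(&w->lock);
}

static void
io_worker_thread(void *arg)
{
	struct io_worker *w = arg;

	for (;;) {
		w->done = 0;		/* no I/O outstanding yet, so no race */
		io_submit_one(w);
		callout_reset(&w->timer, 10 * hz, io_timeout, w);
		mtx_lock(&w->lock);
		while (w->done == 0)
			msleep(w, &w->lock, 0, "iowait", 0);
		mtx_unlock(&w->lock);
		callout_stop(&w->timer); /* each completion also touches callout_cpu */
	}
}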