From owner-freebsd-arch@FreeBSD.ORG  Fri Nov  2 02:47:44 2012
From: Eitan Adler <lists@eitanadler.com>
Date: Thu, 1 Nov 2012 22:47:12 -0400
Subject: Re: CACHE_LINE_SIZE on x86
To: Jim Harris <jim.harris@gmail.com>
Cc: Attilio Rao, Andre Oppermann, freebsd-arch@freebsd.org

On 1 November 2012 14:36, Jim Harris wrote:
> On Thu, Nov 1, 2012 at 7:44 AM, Andre Oppermann wrote:
>
>> On 01.11.2012 01:50, Jim Harris wrote:
>>
>>> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>>>
>>>     On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>>>     >
>>>     > It would be good to know though if there are performance benefits
>>>     > from avoiding sharing across paired lines in this manner.  Even if
>>>     > it has its own MOESI state, there might still be negative effects
>>>     > from sharing the pair.
>>>
>>>     On 2S, I do see further benefits by using 128 byte padding instead
>>>     of 64.  On 1S, I see no difference.  I've been meaning to turn off
>>>     prefetching on my system to see if it has any effect in the 2S case -
>>>     I can give that a shot tomorrow.
>>>
>>> So tomorrow turned into next week, but I have some data finally.
>>>
>>> I've updated to HEAD from today, including all of the mtx_padalign
>>> changes.
>>> I tested 64 v. 128 byte alignment on 2S amd64 (SNB Xeon).  My BIOS
>>> also has a knob to disable the adjacent line prefetching (MLC spatial
>>> prefetcher), so I ran both 64b and 128b against this specific
>>> prefetcher both enabled and disabled.
>>>
>>> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in
>>> CPU utilization by using 128b padding instead of 64b.
>>>
>>
>> Just to be sure.  The numbers you show are just for the one location
>> you've converted to the new padded mutex and a particular test case?
>>
>
> There are two locations actually - the struct tdq lock in the ULE
> scheduler, and the callout_cpu lock in kern_timeout.c.
>
> And yes, I've only been running a custom benchmark I developed here to
> help uncover some of these areas of spinlock contention.  It was
> originally used for NVMe driver performance testing, but has been
> helpful in uncovering some other issues outside of the NVMe driver
> itself (such as these contended spinlocks).  It spawns a large number
> of kernel threads, each of which submits an I/O and then sleeps until
> it is woken by the interrupt thread when the I/O completes.  It
> stresses the scheduler and also callout since I start and stop a timer
> for each I/O.
>
> I think the only thing this proves is that there is benefit to having
> x86 CACHE_LINE_SIZE still set to 128.

Does this benchmark simulate reality, or does padding the locks only help
on this specific benchmark?

--
Eitan Adler
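
As an aside, the effect being measured here can be reproduced in userland
without touching the kernel.  The sketch below is purely illustrative and
is not code from the FreeBSD tree; the file name, the PAD knob and the
counter struct are invented for this example.  Two threads spin on
adjacent counters, and the spacing/alignment of the counters is the only
variable, which is essentially the 64b vs. 128b question above.

/*
 * pad_demo.c -- illustrative only, not from the FreeBSD tree.
 * Two threads increment their own counter; PAD controls how far apart
 * (and how aligned) the two counters are.  Compare PAD=8, 64 and 128.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#ifndef PAD
#define PAD 128                        /* bytes per counter slot */
#endif

struct counter {
        volatile uint64_t val;
        char pad[PAD - sizeof(uint64_t)];   /* pad slot out to PAD bytes */
} __attribute__((aligned(PAD)));

static struct counter counters[2];

static void *
spin(void *arg)
{
        struct counter *c = arg;

        /* Each thread hammers only its own slot. */
        for (uint64_t i = 0; i < 100000000ULL; i++)
                c->val++;
        return (NULL);
}

int
main(void)
{
        pthread_t t[2];

        for (int i = 0; i < 2; i++)
                pthread_create(&t[i], NULL, spin, &counters[i]);
        for (int i = 0; i < 2; i++)
                pthread_join(t[i], NULL);
        printf("%ju %ju\n", (uintmax_t)counters[0].val,
            (uintmax_t)counters[1].val);
        return (0);
}

Building with e.g. "cc -O2 -pthread -DPAD=64 pad_demo.c" and timing the
result against -DPAD=128, with and without the BIOS adjacent-line
prefetcher knob, should show whether 64-byte spacing is enough on a given
box or whether the prefetcher makes 128-byte spacing pay off.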