From owner-freebsd-arch@FreeBSD.ORG  Fri Nov  2 02:47:44 2012
From: Eitan Adler <lists@eitanadler.com>
Date: Thu, 1 Nov 2012 22:47:12 -0400
Subject: Re: CACHE_LINE_SIZE on x86
To: Jim Harris <jim.harris@gmail.com>
Cc: Attilio Rao, Andre Oppermann, freebsd-arch@freebsd.org

On 1 November 2012 14:36, Jim Harris wrote:
> On Thu, Nov 1, 2012 at 7:44 AM, Andre Oppermann wrote:
>
>> On 01.11.2012 01:50, Jim Harris wrote:
>>
>>> On Thu, Oct 25, 2012 at 2:40 PM, Jim Harris <jim.harris@gmail.com> wrote:
>>>
>>>     On Thu, Oct 25, 2012 at 2:32 PM, John Baldwin <jhb@freebsd.org> wrote:
>>>     >
>>>     > It would be good to know though if there are performance benefits
>>>     > from avoiding sharing across paired lines in this manner.  Even if
>>>     > it has its own MOESI state, there might still be negative effects
>>>     > from sharing the pair.
>>>
>>>     On 2S, I do see further benefits by using 128 byte padding instead
>>>     of 64.  On 1S, I see no difference.  I've been meaning to turn off
>>>     prefetching on my system to see if it has any effect in the 2S case -
>>>     I can give that a shot tomorrow.
>>>
>>> So tomorrow turned into next week, but I have some data finally.
>>>
>>> I've updated to HEAD from today, including all of the mtx_padalign
>>> changes.
>>> I tested 64 v. 128 byte alignment on 2S amd64 (SNB Xeon).  My BIOS
>>> also has a knob to disable the adjacent line prefetching (MLC spatial
>>> prefetcher), so I ran both 64b and 128b against this specific
>>> prefetcher both enabled and disabled.
>>>
>>> MLC prefetcher enabled: 3-6% performance improvement, 1-5% decrease in
>>> CPU utilization by using 128b padding instead of 64b.
>>>
>>
>> Just to be sure.  The numbers you show are just for the one location
>> you've converted to the new padded mutex and a particular test case?
>>
>
> There are two locations actually - the struct tdq lock in the ULE
> scheduler, and the callout_cpu lock in kern_timeout.c.
>
> And yes, I've only been running a custom benchmark I developed here to
> help uncover some of these areas of spinlock contention.  It was
> originally used for NVMe driver performance testing, but has been
> helpful in uncovering some other issues outside of the NVMe driver
> itself (such as these contended spinlocks).  It spawns a large number
> of kernel threads, each of which submits an I/O and then sleeps until
> it is woken by the interrupt thread when the I/O completes.  It
> stresses the scheduler and also callout since I start and stop a timer
> for each I/O.
>
> I think the only thing this proves is that there is benefit to having
> x86 CACHE_LINE_SIZE still set to 128.

Does this benchmark simulate reality, or does padding the locks only help
on this specific benchmark?

--
Eitan Adler
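
As an aside, the effect being measured here can be reproduced in userland
without touching the kernel.  The sketch below is purely illustrative and
is not code from the FreeBSD tree; the file name, the PAD knob and the
counter struct are invented for this example.  Two threads spin on
adjacent counters, and the spacing/alignment of the counters is the only
variable, which is essentially the 64b vs. 128b question above.

/*
 * pad_demo.c -- illustrative only, not from the FreeBSD tree.
 * Two threads increment their own counter; PAD controls how far apart
 * (and how aligned) the two counters are.  Compare PAD=8, 64 and 128.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#ifndef PAD
#define PAD 128                        /* bytes per counter slot */
#endif

struct counter {
        volatile uint64_t val;
        char pad[PAD - sizeof(uint64_t)];   /* pad slot out to PAD bytes */
} __attribute__((aligned(PAD)));

static struct counter counters[2];

static void *
spin(void *arg)
{
        struct counter *c = arg;

        /* Each thread hammers only its own slot. */
        for (uint64_t i = 0; i < 100000000ULL; i++)
                c->val++;
        return (NULL);
}

int
main(void)
{
        pthread_t t[2];

        for (int i = 0; i < 2; i++)
                pthread_create(&t[i], NULL, spin, &counters[i]);
        for (int i = 0; i < 2; i++)
                pthread_join(t[i], NULL);
        printf("%ju %ju\n", (uintmax_t)counters[0].val,
            (uintmax_t)counters[1].val);
        return (0);
}

Building with e.g. "cc -O2 -pthread -DPAD=64 pad_demo.c" and timing the
result against -DPAD=128, with and without the BIOS adjacent-line
prefetcher knob, should show whether 64-byte spacing is enough on a given
box or whether the prefetcher makes 128-byte spacing pay off.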