From owner-svn-src-all@freebsd.org  Fri Jul  6 22:58:54 2018
Return-Path: <owner-svn-src-all@freebsd.org>
Delivered-To: svn-src-all@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 279651038A0C
 for <svn-src-all@mailman.ysv.freebsd.org>;
 Fri,  6 Jul 2018 22:58:54 +0000 (UTC)
 (envelope-from jroberson@jroberson.net)
Received: from mail-pf0-x241.google.com (mail-pf0-x241.google.com
 [IPv6:2607:f8b0:400e:c00::241])
 (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
 (Client CN "smtp.gmail.com",
 Issuer "Google Internet Authority G2" (verified OK))
 by mx1.freebsd.org (Postfix) with ESMTPS id 94CC873ABA
 for <svn-src-all@freebsd.org>; Fri,  6 Jul 2018 22:58:53 +0000 (UTC)
 (envelope-from jroberson@jroberson.net)
Received: by mail-pf0-x241.google.com with SMTP id v9-v6so9498483pff.9
 for <svn-src-all@freebsd.org>; Fri, 06 Jul 2018 15:58:53 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=jroberson-net.20150623.gappssmtp.com; s=20150623;
 h=date:from:to:cc:subject:in-reply-to:message-id:references
 :user-agent:mime-version;
 bh=IxBNOVlnfd6MyMH/P/ffqJsqFyhcKsMkadqzw7sKudY=;
 b=BUue1FGvePEiCrrPExQeF2ERcJF85p9avBb/RA0z1jLHgyjdbDHsbtRMyhYYB+hHlW
 CxckQ1fKrRpwpbAs/pWoJiuiHQHDO+wgaXlfZU2jQsNBho4ofYHMy6BO4wPtMDEUxu5Q
 6HYEs3yhxdzXdvvRCgfglTnqlaFAWzTeBrHyOcmxRMVyr+rSL3Khd0rbnmdMNr+vj5x9
 iaRhtVw9SXBJdLW+0xa3zoJlne4H/9Q9vdpXOiwXpgUm5eK++c9TQV3RjARNqwQHhO+c
 Wm6tR7ocaZG24szdYlv9Ij3dbRCox7UOLgwzLtuyybPgfiV2pJxiDKweJVgJwiAA7icF
 6dgA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:date:from:to:cc:subject:in-reply-to:message-id
 :references:user-agent:mime-version;
 bh=IxBNOVlnfd6MyMH/P/ffqJsqFyhcKsMkadqzw7sKudY=;
 b=XtQm4gEgzeditKwbfIz9h41TzsXDSedszARK5okNS9d9tqZGQ6d+Z4B/k1xQ1uovHR
 9ipvesmkZyJwdygl1PMQ7cjyVxG40RgldYpwVq3C7PqBN1FaYPFw7jtRLrcYqW3P9BpJ
 qgyvwv/cwB7dgCPdk93K8tg0iwjkBD2P2QJFuqZMvf8mrV79ChSSgdskx+gAlb0mqLo9
 QmQNfagy/tn96f75HWtYgwdPHQN7zW1MNwtcCky+BgXYDOJJEOBCTw3Jpc45vtTUXYcK
 pbIt4HPzjSTaAJd8H73n+PgyUszMQd5wwwG0tZRiGQUTNxKh5F3LcSRQJmEilAKtvdd1
 w6Ug==
X-Gm-Message-State: APt69E3wSHEdwoxCPRpiR8zBhpSaqsqEUkuoI3bD/DThIm9SzCbiu2UB
 UomAYn4Z59KcIPIitblxr6Yd2a5/kdc=
X-Google-Smtp-Source: AAOMgpeTONkeiEpLnMTkI4/ZOj8wkXeSvzcCxgifA9QwPf66Vg3HTAjbixoDVp03G21321A/7Xr3Yw==
X-Received: by 2002:a63:5c10:: with SMTP id
 q16-v6mr10858278pgb.452.1530917932605; 
 Fri, 06 Jul 2018 15:58:52 -0700 (PDT)
Received: from rrcs-66-91-135-210.west.biz.rr.com
 (rrcs-66-91-135-210.west.biz.rr.com. [66.91.135.210])
 by smtp.gmail.com with ESMTPSA id u17-v6sm41338995pfa.176.2018.07.06.15.58.50
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Fri, 06 Jul 2018 15:58:51 -0700 (PDT)
Date: Fri, 6 Jul 2018 12:56:11 -1000 (HST)
From: Jeff Roberson <jroberson@jroberson.net>
X-X-Sender: jroberson@desktop
To: "Rodney W. Grimes" <rgrimes@freebsd.org>
cc: Warner Losh <imp@bsdimp.com>, Hans Petter Selasky <hselasky@freebsd.org>, 
 src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, 
 svn-src-head@freebsd.org
Subject: Re: svn commit: r336025 - in head/sys: amd64/include i386/include
In-Reply-To: <201807061809.w66I9RVR053596@pdx.rh.CN85.dnsmgr.net>
Message-ID: <alpine.BSF.2.21.999.1807061251230.2934@desktop>
References: <201807061809.w66I9RVR053596@pdx.rh.CN85.dnsmgr.net>
User-Agent: Alpine 2.21.999 (BSF 260 2018-02-26)
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="2547152148-1705954291-1530917773=:2934"
X-BeenThere: svn-src-all@freebsd.org
X-Mailman-Version: 2.1.27
Precedence: list
List-Id: "SVN commit messages for the entire src tree \(except for &quot;
 user&quot; and &quot; projects&quot; \)" <svn-src-all.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-all/>
List-Post: <mailto:svn-src-all@freebsd.org>
List-Help: <mailto:svn-src-all-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-all>,
 <mailto:svn-src-all-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Fri, 06 Jul 2018 22:58:54 -0000

  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

--2547152148-1705954291-1530917773=:2934
Content-Type: text/plain; charset=US-ASCII; format=flowed

On Fri, 6 Jul 2018, Rodney W. Grimes wrote:

>> On Fri, Jul 6, 2018, 12:27 PM Rodney W. Grimes <
>> freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>>
>>>> On Fri, Jul 6, 2018 at 9:52 AM, Rodney W. Grimes <
>>>> freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>>>>
>>>>>> On Fri, Jul 6, 2018 at 9:32 AM, Rodney W. Grimes <
>>>>>> freebsd@pdx.rh.cn85.dnsmgr.net> wrote:
>>>>>>
>>>>>>>> Author: hselasky
>>>>>>>> Date: Fri Jul  6 10:13:42 2018
>>>>>>>> New Revision: 336025
>>>>>>>> URL: https://svnweb.freebsd.org/changeset/base/336025
>>>>>>>>
>>>>>>>> Log:
>>>>>>>>   Make sure kernel modules built by default are portable between
>>> UP
>>>>> and
>>>>>>>>   SMP systems by extending defined(SMP) to include
>>>>> defined(KLD_MODULE).
>>>>>>>>
>>>>>>>>   This is a regression issue after r335873 .
>>>>>>>>
>>>>>>>>   Discussed with:             mmacy@
>>>>>>>>   Sponsored by:               Mellanox Technologies
>>>>>>>
>>>>>>> Though this fixes the issue, it also means that now when
>>>>>>> anyone intentionally builds a UP kernel with modules
>>>>>>> they are getting SMP support in the modules and I am
>>>>>>> not sure they would want that.  I know I don't.
>>>>>>>
>>>>>>
>>>>>>
>>>>>> On UP systems, these additional opcodes are harmless. They take a few
>>>>> extra
>>>>>> cycles (since they lock an uncontested bus) and add a couple extra
>>> memory
>>>>>> barriers (which will be NOPs). On MP systems, atomics now work by
>>>>> default.
>>>>>> Had we not defaulted like this, all modules built outside of a kernel
>>>>> build
>>>>>> env would have broken atomics. Given that (a) the overwhelming
>>> majority
>>>>>> (99% or more) is SMP and (b) the MP code merely adds a few cycles to
>>>>> what's
>>>>>> already a not-too-expensive operation, this was the right choice.
>>>>>>
>>>>>> It simply doesn't matter for systems that are relevant to the project
>>>>>> today. While one could try to optimize this a little (for example, by
>>>>>> having SMP defined to be 0 or 1, say, and changing all the ifdef SMP
>>> to
>>>>> if
>>>>>> (defined(SMP) && SMP != 0)), it's likely not going to matter enough
>>> for
>>>>>> anybody to make the effort. UP on x86 is simply not relevant enough
>>> to
>>>>>> optimize for it. Even in VMs, people run SMP kernels typically even
>>> when
>>>>>> they just allocate one CPU to the VM.
>>>>>>
>>>>>> So while we still support the UP config, and we'll let people build
>>>>>> optimized kernels for x86, we've flipped the switch from pessimized
>>> for
>>>>> SMP
>>>>>> modules to pessimized for UP modules, which seems like quite the
>>>>> reasonable
>>>>>> trade-off.
>>>>>>
>>>>>> Were it practical to do so, I'd suggest de-orbiting UP on x86.
>>> However,
>>>>>> it's a lot of work for not much benefit and we'd need to invent much
>>>>> crazy
>>>>>> to get there.
>>>>>
>>>>> Trivial to fix this with
>>>>> +#if defined(SMP) || !defined(_KERNEL) || defined(KLD_MODULE) ||
>>>>> !defined(KLD_UP_MODULES)
>>>>
>>>>
>>>> Nope. Not so trivial. Who defines KLD_UP_MODULES?
>>>
>>> Call it SMP_KLD_MODULES, and it gets defined the same place SMP does.
>>>
>>
>> Not so simple. SMP is defined in the config file, and winds up in one of
> No problem, that is where I would be defining this anyway, or in the
> latest case removing it and SMP for my UP kernel build.
>
>> the option files. It will be absent for stand alone builds,
> I am ok with that.  And it would be reasonable to default to SMP.
>
>> though. These
>> change tweak the default yo be inlined and to include the sequence that
>> works everywhere.
>>
>>>
>>>> And really, it's absolutely not worth it unless someone shows up with
>>>> numbers to show the old 'function call to optimal routine' is actually
>>>> faster than the new 'inline to slightly unoptimal code'. Since I think
>>> the
>>>> function call overhead is larger than the pessmizations, I'm not sure
>>> what
>>>> the fuss is about.
>>>
>>> I have no issues with the SMP converting from function calls to
>>> inline locks, I just want to retain the exact same code I had
>>> before any of these changes, and that was A UP built system
>>> without any SMP locking.  Is it too much to ask to keep what
>>> already worked?
>>>
>>
>> This doesn't enable or disable locks in the muted sense. It just changes
>> the atomic ops for the kernel from a function call to an inlined function.
>> The inlining is more efficient than the call, even with the overhead added
>> by always inlining the same stuff. It still is faster than before.
>>
>> And userland has done this forever...
>>
>> So I honestly think even UP builds are better off, even if it's not hyper
>> optimized for UP. The lock instruction prefix is minimal overhead (a cycle
>> I think).
>
> I do not believe, and Bruce seems to have evidence, that LOCK is not
> a one cycle cost.  And in my head I know that it can not be that
> simple as it causes lots of very special things to happen in the
> pipeline to ensure you are locked.

It is definitely not one cycle.  It is a couple dozen depending on the 
machine.  You can benchmark yourself with the attached patch.

Here's two atomics in a loop with and without lock on a Zen:
desktop# ./nonatomic
0 8838: 8 ns
desktop# ./atomic
0 32887: 32 ns

This is at ~3ghz so three instructions per-ns.  Cut this in half for 
per-atomic cost. 12ns of overhead or 36 cycles.  One of them is a fetchadd 
so there is extra cost associated with that.  This is also worst case 
because the machine can't execute anything other than the atomics.  If 
there were other instructions that didn't operate on overlapping memory 
addresses they could run in parallel.

Jeff

>
>> This is different than the mutexes we optimize for the UP cases
>> (and which aren't affected by this change). It's really not a big deal.
>
> CPU's are not getting any faster, cycles are cycles, and I think we
> should at least investigate further before we just start making
> assumptions about the lock prefix being a 1 cycle cheap thing to
> do.
>
>>
>>>>>
>>>>>>>> Modified:
>>>>>>>>   head/sys/amd64/include/atomic.h
>>>>>>>>   head/sys/i386/include/atomic.h
>>>>>>>>
>>>>>>>> Modified: head/sys/amd64/include/atomic.h
>>>>>>>> ============================================================
>>>>>>> ==================
>>>>>>>> --- head/sys/amd64/include/atomic.h   Fri Jul  6 10:10:00 2018
>>>>>>> (r336024)
>>>>>>>> +++ head/sys/amd64/include/atomic.h   Fri Jul  6 10:13:42 2018
>>>>>>> (r336025)
>>>>>>>> @@ -132,7 +132,7 @@ void              atomic_store_rel_##TYPE(
>>>>> volatile
>>>>>>> u_##TYPE *p, u_
>>>>>>>>   * For userland, always use lock prefixes so that the binaries
>>> will
>>>>> run
>>>>>>>>   * on both SMP and !SMP systems.
>>>>>>>>   */
>>>>>>>> -#if defined(SMP) || !defined(_KERNEL)
>>>>>>>> +#if defined(SMP) || !defined(_KERNEL) || defined(KLD_MODULE)
>>>>>>>>  #define      MPLOCKED        "lock ; "
>>>>>>>>  #else
>>>>>>>>  #define      MPLOCKED
>>>>>>>> @@ -354,7 +354,7 @@ atomic_testandclear_long(volatile u_long *p,
>>>>> u_int
>>>>>>> v)
>>>>>>>>   */
>>>>>>>>  #define      OFFSETOF_MONITORBUF     0x100
>>>>>>>>
>>>>>>>> -#if defined(SMP)
>>>>>>>> +#if defined(SMP) || defined(KLD_MODULE)
>>>>>>>>  static __inline void
>>>>>>>>  __storeload_barrier(void)
>>>>>>>>  {
>>>>>>>>
>>>>>>>> Modified: head/sys/i386/include/atomic.h
>>>>>>>> ============================================================
>>>>>>> ==================
>>>>>>>> --- head/sys/i386/include/atomic.h    Fri Jul  6 10:10:00 2018
>>>>>>> (r336024)
>>>>>>>> +++ head/sys/i386/include/atomic.h    Fri Jul  6 10:13:42 2018
>>>>>>> (r336025)
>>>>>>>> @@ -143,7 +143,7 @@ void              atomic_subtract_64(volatile
>>>>>>> uint64_t *, uint64_t
>>>>>>>>   * For userland, always use lock prefixes so that the binaries
>>> will
>>>>> run
>>>>>>>>   * on both SMP and !SMP systems.
>>>>>>>>   */
>>>>>>>> -#if defined(SMP) || !defined(_KERNEL)
>>>>>>>> +#if defined(SMP) || !defined(_KERNEL) || defined(KLD_MODULE)
>>>>>>>>  #define      MPLOCKED        "lock ; "
>>>>>>>>  #else
>>>>>>>>  #define      MPLOCKED
>>>>>>>> @@ -302,7 +302,7 @@ atomic_testandclear_int(volatile u_int *p,
>>>>> u_int v)
>>>>>>>>   */
>>>>>>>>
>>>>>>>>  #if defined(_KERNEL)
>>>>>>>> -#if defined(SMP)
>>>>>>>> +#if defined(SMP) || defined(KLD_MODULE)
>>>>>>>>  #define      __storeload_barrier()   __mbk()
>>>>>>>>  #else /* _KERNEL && UP */
>>>>>>>>  #define      __storeload_barrier()   __compiler_membar()
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Rod Grimes
>>>>>>> rgrimes@freebsd.org
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Rod Grimes
>>>>> rgrimes@freebsd.org
>>>>>
>>>
>>> --
>>> Rod Grimes
>>> rgrimes@freebsd.org
>>>
>
> -- 
> Rod Grimes                                                 rgrimes@freebsd.org
>
--2547152148-1705954291-1530917773=:2934
Content-Type: text/plain; charset=US-ASCII; name=atomic.c
Content-Transfer-Encoding: BASE64
Content-ID: <alpine.BSF.2.21.999.1807061256110.2934@desktop>
Content-Description: 
Content-Disposition: attachment; filename=atomic.c

I2luY2x1ZGUgPHN0ZGlvLmg+DQojaW5jbHVkZSA8c3RkbGliLmg+DQojaW5j
bHVkZSA8c3lzL3RpbWUuaD4NCiNpbmNsdWRlIDxzeXMvdHlwZXMuaD4NCg0K
I2RlZmluZQlJVEVSCTEwMDAwMDANCg0Kc3RhdGljIF9faW5saW5lIHVfaW50
DQphdG9taWNfZmV0Y2hhZGRfaW50KHZvbGF0aWxlIHVfaW50ICpwLCB1X2lu
dCB2KQ0Kew0KDQogICAgICAgIF9fYXNtIF9fdm9sYXRpbGUoDQogICAgICAg
ICIgICAgICAgbG9jayB4YWRkbCAgICUwLCUxIDsgICAgICAgICAiDQogICAg
ICAgICIjIGF0b21pY19mZXRjaGFkZF9pbnQiDQogICAgICAgIDogIityIiAo
diksICAgICAgICAgICAgICAgICAgICAgLyogMCAqLw0KICAgICAgICAgICIr
bSIgKCpwKSAgICAgICAgICAgICAgICAgICAgIC8qIDEgKi8NCiAgICAgICAg
OiA6ICJjYyIpOw0KICAgICAgICByZXR1cm4gKHYpOw0KfQ0KDQpzdGF0aWMg
X19pbmxpbmUgdm9pZA0KYXRvbWljX2FkZF9pbnQodm9sYXRpbGUgdV9pbnQg
KnAsIHVfaW50IHYpDQp7DQoNCiAgICAgICAgX19hc20gX192b2xhdGlsZSgN
CiAgICAgICAgIiAgICAgICBsb2NrIGFkZGwgICAlMCwlMSA7ICAgICAgICAg
Ig0KICAgICAgICA6ICIrciIgKHYpLCAgICAgICAgICAgICAgICAgICAgIC8q
IDAgKi8NCiAgICAgICAgICAiK20iICgqcCkgICAgICAgICAgICAgICAgICAg
ICAvKiAxICovDQogICAgICAgIDogOiAiY2MiKTsNCn0NCg0KaW50DQptYWlu
KGludCBhcmdjLCBjaGFyICoqYXJndikNCnsNCglzdHJ1Y3QgdGltZXZhbCBz
dGFydCwgZW5kOw0KCXZvbGF0aWxlIGludCBmb287DQoJdWludDY0X3QgbnM7
DQoJaW50IGk7DQoNCglnZXR0aW1lb2ZkYXkoJnN0YXJ0LCBOVUxMKTsNCglm
b3IgKGkgPSAwOyBpIDwgSVRFUjsgaSsrKSB7DQoJCWF0b21pY19hZGRfaW50
KCZmb28sIDEpOw0KCQlhdG9taWNfZmV0Y2hhZGRfaW50KCZmb28sIC0xKTsN
Cgl9DQoJZ2V0dGltZW9mZGF5KCZlbmQsIE5VTEwpOw0KCWVuZC50dl9zZWMg
LT0gc3RhcnQudHZfc2VjOw0KCWVuZC50dl91c2VjIC09IHN0YXJ0LnR2X3Vz
ZWM7DQoJbnMgPSBlbmQudHZfdXNlYyAvIChJVEVSIC8gMTAwMCk7DQoJcHJp
bnRmKCIlZCAlZDogJWQgbnNcbiIsIGVuZC50dl9zZWMsIGVuZC50dl91c2Vj
LCBucyk7DQp9DQo=

--2547152148-1705954291-1530917773=:2934--