From owner-svn-src-head@freebsd.org  Fri Jul  6 17:55:48 2018
Return-Path: <owner-svn-src-head@freebsd.org>
Delivered-To: svn-src-head@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2610:1c1:1:606c::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id D8FC31040576;
 Fri,  6 Jul 2018 17:55:48 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from mail106.syd.optusnet.com.au (mail106.syd.optusnet.com.au
 [211.29.132.42])
 by mx1.freebsd.org (Postfix) with ESMTP id 53DB68E785;
 Fri,  6 Jul 2018 17:55:47 +0000 (UTC)
 (envelope-from brde@optusnet.com.au)
Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au
 [110.21.101.228])
 by mail106.syd.optusnet.com.au (Postfix) with ESMTPS id 997443CA3B1;
 Sat,  7 Jul 2018 03:55:36 +1000 (AEST)
Date: Sat, 7 Jul 2018 03:55:35 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
X-X-Sender: bde@besplex.bde.org
To: John Baldwin <jhb@freebsd.org>
cc: rgrimes@freebsd.org, Warner Losh <imp@bsdimp.com>, 
 Hans Petter Selasky <hselasky@freebsd.org>, 
 src-committers <src-committers@freebsd.org>, svn-src-all@freebsd.org, 
 svn-src-head@freebsd.org
Subject: Re: svn commit: r336025 - in head/sys: amd64/include i386/include
In-Reply-To: <1f87b7ba-3b59-e710-00b0-91a4b0e4e5b4@FreeBSD.org>
Message-ID: <20180707031245.J2611@besplex.bde.org>
References: <201807061552.w66Fq0FX052931@pdx.rh.CN85.dnsmgr.net>
 <1f87b7ba-3b59-e710-00b0-91a4b0e4e5b4@FreeBSD.org>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
X-List-Received-Date: Fri, 06 Jul 2018 17:55:49 -0000

On Fri, 6 Jul 2018, John Baldwin wrote:

> On 7/6/18 8:52 AM, Rodney W. Grimes wrote:
>> ...
>> Trivial to fix this with
>> +#if defined(SMP) || !defined(_KERNEL) || defined(KLD_MODULE) || !defined(KLD_UP_MODULES)
>
> This is not worth it.  Note that we already use LOCK always in userland
> which is probably far more prevalent than the use in modules.
>
> Previously atomics in modules were _function calls_ just to avoid the LOCK.
> Having the LOCK prefix present even on UP is probably far more efficient
> than a function call.

No, the lock prefix is less efficient.
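The build-time choice being argued about can be sketched in C. This is a hypothetical user-space illustration, not FreeBSD's <machine/atomic.h>. LOCK_PREFIX, uses_lock_prefix() and demo_atomic_incl() are made-up names, and the #if reuses the condition quoted above. The inline asm assumes x86 with GCC or Clang; other targets fall back to a GCC __atomic builtin.

```c
#include <assert.h>

/*
 * Hypothetical sketch: pick the x86 "lock" prefix at compile time with
 * the condition proposed in the thread.  SMP, _KERNEL, KLD_MODULE and
 * KLD_UP_MODULES are only tested here; an ordinary userland build does
 * not define _KERNEL, so it always gets the prefix.
 */
#if defined(SMP) || !defined(_KERNEL) || defined(KLD_MODULE) || !defined(KLD_UP_MODULES)
#define LOCK_PREFIX "lock; "
#else
#define LOCK_PREFIX ""
#endif

/* Returns 1 if this build emits the lock prefix, else 0. */
int uses_lock_prefix(void)
{
	return sizeof(LOCK_PREFIX) > 1;
}

/* Increment *p, with or without the lock prefix chosen above. */
void demo_atomic_incl(volatile unsigned int *p)
{
#if defined(__i386__) || defined(__x86_64__)
	__asm__ __volatile__(LOCK_PREFIX "incl %0" : "+m" (*p) : : "cc");
#else
	/* Non-x86 fallback so the sketch still builds; always atomic. */
	__atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
#endif
}
```

In a userland build the !defined(_KERNEL) term alone selects the prefix, which matches the point above that userland always uses LOCK.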

IIRC, on very old systems (~PPro), lock prefixes cost 20 cycles in the UP
case.  On AthlonXP, they cost about 19 cycles, but function calls (written
in C) only cost about 6 cycles.  This depends on pipelining, and my
test is perhaps too simple since it uses a loop where the pipelining
works especially well (it executes 2 or 3 function calls in parallel).

Actual timing on AthlonXP UP:
- asm loop: 2 cycles/iteration
- "incl mem" in asm loop: 5.85 cycles (but with less alignment, only 3.25
   cycles)
- "lock; incl mem" in asm loop: 18.9 cycles
- function call in C loop to C function doing "incl mem" in asm: 8.35 cycles
- function call in C loop to C function doing "lock; incl mem" in asm: 24.95
   cycles.

Newer CPUs have better pipelining.  On Haswell, this gives the strange
behaviour that, for the unlocked increment, the function call written in C
is slightly faster than inline code written in asm:

Actual timing on Haswell SMP:
- asm loop: 1.16 cycles/iteration
- "incl mem" in asm loop: 6.95 cycles
- "lock; incl mem" in asm loop: 19.00 cycles
- function call in C loop to C function doing "incl mem" in asm: 6 cycles
- function call in C loop to C function doing "lock; incl mem" in asm: 26.00
   cycles.
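Subtracting the inline "incl mem" row from the other rows isolates what each variant adds per iteration. The sketch below only does that arithmetic on the numbers listed above; the struct, variable and function names are illustrative.

```c
#include <assert.h>
#include <stdio.h>

/* Cycles per iteration, copied from the two timing lists above. */
struct timing {
	const char *cpu;
	double incl;		/* "incl mem" inline in the asm loop */
	double lock_incl;	/* "lock; incl mem" inline in the asm loop */
	double call_incl;	/* call to a C function doing "incl mem" */
	double call_lock_incl;	/* call to a C function doing "lock; incl mem" */
};

const struct timing athlonxp = { "AthlonXP UP", 5.85, 18.9, 8.35, 24.95 };
const struct timing haswell = { "Haswell SMP", 6.95, 19.00, 6.00, 26.00 };

/* Extra cycles from adding the lock prefix to the inline increment. */
double lock_cost(const struct timing *t)
{
	return t->lock_incl - t->incl;
}

/* Extra cycles from moving the unlocked increment into a function. */
double call_cost(const struct timing *t)
{
	return t->call_incl - t->incl;
}

void print_costs(const struct timing *t)
{
	printf("%s: lock prefix +%.2f cycles, function call %+.2f cycles\n",
	    t->cpu, lock_cost(t), call_cost(t));
}
```

With these numbers the lock prefix adds about 13.05 cycles on AthlonXP and 12.05 on Haswell, while moving the unlocked increment into a function adds 2.50 cycles on AthlonXP and saves 0.95 on Haswell.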

The C code with the function call executes:

loop:
 	call	incl
 	incl:
 		pushl	%ebp
 		movl	%esp,%ebp
 		[lock;] incl mem
 		leave
 		ret
 	incl	%ebx
 	cmpl	$4080000000-1,%ebx
 	jbe	loop
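The C side of that loop might look roughly like the sketch below. It is an assumption, not the original test program: incl_plain(), incl_locked(), bench() and counter are hypothetical names, the inline asm needs x86 with GCC or Clang, and __rdtsc() from <x86intrin.h> counts TSC reference ticks, which on modern CPUs can differ from core clock cycles. noinline keeps each increment an out-of-line call; calling through the fn pointer may compile to an indirect call, a small difference from the direct call in the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <x86intrin.h>	/* __rdtsc(); x86 with GCC or Clang only */

volatile unsigned int counter;	/* the "mem" operand being incremented */

/* Out-of-line increment without the lock prefix (the UP case). */
__attribute__((noinline)) void incl_plain(void)
{
	__asm__ __volatile__("incl %0" : "+m" (counter) : : "cc");
}

/* Out-of-line increment with the lock prefix. */
__attribute__((noinline)) void incl_locked(void)
{
	__asm__ __volatile__("lock; incl %0" : "+m" (counter) : : "cc");
}

/* Call fn() n times and return the average TSC ticks per iteration. */
double bench(void (*fn)(void), uint32_t n)
{
	assert(n > 0);
	uint64_t start = __rdtsc();
	for (uint32_t i = 0; i < n; i++)
		fn();
	return (double)(__rdtsc() - start) / n;
}
```

For example, bench(incl_plain, 100000000) and bench(incl_locked, 100000000) approximate the two function-call rows. The listing above is i386 code, so a -m32 build is the closer match.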

I didn't even compile with -fomit-frame-pointer or try clang, which would do
excessive unrolling.  The frame pointer takes 3 extra instructions in
incl (pushl, movl and leave), but these take no extra time.

In non-benchmark use, there would be more args for the function call
and the scheduling would be very different, so the timing might be very
different.  I expect the function call would be only insignificantly slower
except in micro-benchmarks.

Bruce