Date: Sat, 7 Jul 2018 03:55:35 +1000 (EST)
From: Bruce Evans
To: John Baldwin
cc: rgrimes@freebsd.org, Warner Losh, Hans Petter Selasky, src-committers,
    svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: Re: svn commit: r336025 - in head/sys: amd64/include i386/include
In-Reply-To: <1f87b7ba-3b59-e710-00b0-91a4b0e4e5b4@FreeBSD.org>
Message-ID: <20180707031245.J2611@besplex.bde.org>
References: <201807061552.w66Fq0FX052931@pdx.rh.CN85.dnsmgr.net>
    <1f87b7ba-3b59-e710-00b0-91a4b0e4e5b4@FreeBSD.org>

On Fri, 6 Jul 2018, John Baldwin wrote:

> On 7/6/18 8:52 AM, Rodney W. Grimes wrote:
>> ...
>> Trivial to fix this with
>> +#if defined(SMP) || !defined(_KERNEL) || defined(KLD_MODULE) || !defined(KLD_UP_MODULES)
>
> This is not worth it.  Note that we already use LOCK always in userland
> which is probably far more prevalent than the use in modules.
>
> Previously atomics in modules were _function calls_ just to avoid the LOCK.
> Having the LOCK prefix present even on UP is probably far more efficient
> than a function call.

No, the lock prefix is less efficient.  IIRC, on very old systems (~PPro),
lock prefixes cost 20 cycles in the UP case.  On AthlonXP, they cost about
19 cycles, but function calls (written in C) only cost about 6 cycles.
This depends on pipelining, and my test is perhaps too simple since it
uses a loop where the pipelining works especially well (it executes 2 or 3
function calls in parallel).

Actual timing on AthlonXP UP:
- asm loop: 2 cycles/iteration
- "incl mem" in asm loop: 5.85 cycles (but with less alignment, only 3.25
  cycles)
- "lock; incl mem" in asm loop: 18.9 cycles
- function call in C loop to C function doing "incl mem" in asm: 8.35 cycles
- function call in C loop to C function doing "lock; incl mem" in asm:
  24.95 cycles.

Newer CPUs have better pipelining.  On Haswell, this gives the strange
behaviour that the function call written in C is slightly faster than
inline code written in asm:

Actual timing on Haswell SMP:
- asm loop: 1.16 cycles/iteration
- "incl mem" in asm loop: 6.95 cycles
- "lock; incl mem" in asm loop: 19.00 cycles
- function call in C loop to C function doing "incl mem" in asm: 6 cycles
- function call in C loop to C function doing "lock; incl mem" in asm:
  26.00 cycles.

The C code with the function call executes:

loop:
 	call	incl
incl:
 	pushl	%ebp
 	movl	%esp,%ebp
 	[lock;] incl mem
 	leave
 	ret
 	incl	%ebx
 	cmpl	$4080000000-1,%ebx
 	jbe	loop

I didn't even compile with -fframe-pointer or try clang which would do
excessive unrolling.  -fframe-pointer takes 3 extra instructions in incl,
but these take no extra time.
In non-benchmark use, there would be more args for the function call and
the scheduling would be very different, so the timing might be very
different.  I expect the function call would be insignificantly slower
except in micro-benchmarks.

Bruce
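
A rough userland sketch of the four cases timed above (plain and locked
increment, each inline and behind a function call) might look like the
following.  It is not the test program used for the numbers in this
message: it reports nanoseconds per iteration rather than cycles, it uses
C11 atomic_fetch_add() instead of hand-written "lock; incl" asm (on x86,
compilers typically emit a lock-prefixed instruction for it), and every
name in it (TIME_LOOP, run_benchmark, the counters) is hypothetical.  The
noinline attribute assumes GCC or Clang.

```c
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#include <assert.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical counters; volatile keeps the plain increment in memory. */
static volatile unsigned plain_counter;
static atomic_uint locked_counter;

/* noinline forces a real call in the loop (GCC/Clang extension). */
__attribute__((noinline)) static void
incr_plain_call(void)
{
	plain_counter++;
}

__attribute__((noinline)) static void
incr_locked_call(void)
{
	atomic_fetch_add(&locked_counter, 1);
}

static double
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((double)ts.tv_sec * 1e9 + (double)ts.tv_nsec);
}

/* Run stmt n times and store the mean wall-clock ns per iteration. */
#define TIME_LOOP(stmt, n, out) do {					\
	double t0_ = now_ns();						\
	for (unsigned i_ = 0; i_ < (n); i_++) {				\
		stmt;							\
	}								\
	(out) = (now_ns() - t0_) / (double)(n);				\
} while (0)

/*
 * ns[0]: inline plain increment      ns[1]: inline atomic increment
 * ns[2]: call doing plain increment  ns[3]: call doing atomic increment
 */
static void
run_benchmark(unsigned n, double ns[4])
{
	assert(n > 0);
	plain_counter = 0;
	atomic_store(&locked_counter, 0);

	TIME_LOOP(plain_counter++, n, ns[0]);
	TIME_LOOP(atomic_fetch_add(&locked_counter, 1), n, ns[1]);
	TIME_LOOP(incr_plain_call(), n, ns[2]);
	TIME_LOOP(incr_locked_call(), n, ns[3]);
}
```

To try it, call run_benchmark() from main() with a large iteration count
and print the four ns[] values.  As noted above, a tight loop like this
pipelines unusually well, so the results say little about non-benchmark
use.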