From owner-svn-src-head@freebsd.org Sat Sep 9 09:40:58 2017 Return-Path: Delivered-To: svn-src-head@mailman.ysv.freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by mailman.ysv.freebsd.org (Postfix) with ESMTP id 1A17EE0C19B; Sat, 9 Sep 2017 09:40:58 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from mail104.syd.optusnet.com.au (mail104.syd.optusnet.com.au [211.29.132.246]) by mx1.freebsd.org (Postfix) with ESMTP id BD6AB2658; Sat, 9 Sep 2017 09:40:56 +0000 (UTC) (envelope-from brde@optusnet.com.au) Received: from [192.168.0.102] (c110-21-101-228.carlnfd1.nsw.optusnet.com.au [110.21.101.228]) by mail104.syd.optusnet.com.au (Postfix) with ESMTPS id F3CFE425D9F; Sat, 9 Sep 2017 19:40:49 +1000 (AEST) Date: Sat, 9 Sep 2017 19:40:48 +1000 (EST) From: Bruce Evans X-X-Sender: bde@besplex.bde.org To: David Chisnall cc: Mateusz Guzik , src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org Subject: Re: svn commit: r323329 - head/sys/sys In-Reply-To: <9BE7B1BD-DE1A-48FE-B247-A3CE7674648E@FreeBSD.org> Message-ID: <20170909180554.A17984@besplex.bde.org> References: <201709082009.v88K9EGW006964@repo.freebsd.org> <9BE7B1BD-DE1A-48FE-B247-A3CE7674648E@FreeBSD.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed X-Optus-CM-Score: 0 X-Optus-CM-Analysis: v=2.2 cv=KeqiiUQD c=1 sm=1 tr=0 a=PalzARQSbocsUSjMRkwAPg==:117 a=PalzARQSbocsUSjMRkwAPg==:17 a=kj9zAlcOel0A:10 a=6I5d2MoRAAAA:8 a=USMd3rqED0NyYiGRO1kA:9 a=RbNcUileWBSlwGLN:21 a=B3oMvqzVwg8tgyhn:21 a=CjuIK1q_8ugA:10 a=IjZwj45LgO3ly-622nXo:22 X-BeenThere: svn-src-head@freebsd.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: SVN commit messages for the src tree for head/-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 09 Sep 2017 09:40:58 -0000 On Sat, 9 Sep 2017, David Chisnall wrote: > On 8 Sep 2017, at 21:09, Mateusz Guzik wrote: >> >> Author: mjg >> Date: Fri Sep 8 20:09:14 2017 >> New Revision: 323329 >> URL: https://svnweb.freebsd.org/changeset/base/323329 >> >> Log: >> Allow __builtin_memset instead of bzero for small buffers of known size > > This change seems redundant, because modern compilers already do this optimisation. For example: > > #include > > char buf[42]; > > void bz(void) > { > bzero(buf, 42); > } > > With clang 4.0 on x86 compiles to: > > pushq %rbp > movq %rsp, %rbp > xorps %xmm0, %xmm0 > movups %xmm0, buf+26(%rip) > movaps %xmm0, buf+16(%rip) > movaps %xmm0, buf(%rip) > popq %rbp > retq This is only a valid optimization for POSIX code. Compiling with -std=c99 turns it off, even though the non-C99 is included. Note that -std is broken in CFLAGS for the kernel. It is c99 (actually iso9899:1999). Userland uses the correct standard gnu99. -std=c99 breaks other gnu features like unnamed struct/unions which are then unbroken bogusly using -fms-extensions. > Neither contains a call, both have inlined the zeroing. This change is strictly worse, because the compiler has some carefully tuned heuristics that are set per target for when to inline the memset / bzero and when to call the function. These are based on both the size and the alignment, including whether the target supports misaligned accesses and whether misaligned accesses are cheap. None of this is captured by this change. It is strictly better than plain bzero() in the kernel, and makes no difference in userland. > In the kernel, this optimisation is disabled by -ffreestanding, however __builtin_memset will be turned into a memset call if the size is not constant or if the memset call would be more efficient (as determined by the aforementioned heuristics). Simply using __builtin_memset in all cases should give better code, and is more likely to be forward compatible with future ISAs where the arbitrary constant picked in this patch may or may not be optimal. Compilers don't really understand this. In the kernel, extern memset() is strictly worse than extern bzero(), since memset() is just a wrapper around bzero(). Thus simply using __builtin_memset() gives worse strictly code in all cases that are not inlined. Not a large amount worse since the wrapper only does an extra branch and function call when it reduces to bzero(). memset() is just slow in other cases. The constant is not arbitrary, but is hard-coded to the x86 value which is more arbitrary. x86 compilers know that they don't really understand memory so they hard-code a conservative value of 64. It is only possible do better by compiling for a specific arch and knowing how good or bad the extern function is. 64 is about the smallest value above which the extern function could possibly do significantly better by knowing or doing more. Usually it does slightly worse near 64 and also higher since it knows less, especially if the compilation was for a specific arch (clang seems to have even more methods than my benchmarks for this -- hundreds instead of tens --, while x86 kernels have only one method optimized for 25-year old arches and now OK again on new arches with fast string instructions (above a few hundred bytes). Bruce