Date: Sun, 8 May 2011 20:14:25 +1000 (EST)
From: Bruce Evans <brde@optusnet.com.au>
To: Hans Petter Selasky <hselasky@c2i.net>
Cc: "svn-src-head@freebsd.org" <svn-src-head@FreeBSD.org>, mdf@FreeBSD.org, "svn-src-all@freebsd.org" <svn-src-all@FreeBSD.org>, "src-committers@freebsd.org" <src-committers@FreeBSD.org>
Subject: Re: svn commit: r221604 - head/usr.sbin/usbdump
Message-ID: <20110508195020.V981@besplex.bde.org>
In-Reply-To: <201105071955.35305.hselasky@c2i.net>
References: <201105071628.p47GSO16006145@svn.freebsd.org> <201105071836.00660.hselasky@c2i.net> <BANLkTimi_Em60n9MZRTcgBDvycqH-pKL5g@mail.gmail.com> <201105071955.35305.hselasky@c2i.net>
On Sat, 7 May 2011, Hans Petter Selasky wrote:

> On Saturday 07 May 2011 19:13:27 mdf@freebsd.org wrote:
>> On Sat, May 7, 2011 at 9:36 AM, Hans Petter Selasky <hselasky@c2i.net> wrote:
>>> On Saturday 07 May 2011 18:28:24 Hans Petter Selasky wrote:
>>>> - Use memcpy() instead of bcopy().
>>>
>>> - Use memset() instead of bzero().
>>
>> Why? It usually falls through to the same code in libc. Is there
>> some standardization on memfoo versus bfoo here?
>
> I thought that memset() was a compiler builtin and bzero() optimised for
> larger amounts of data?

In the kernel, compiler builtins aren't used, memset() is slightly pessimized, and bzero() is not optimized (except in old versions of FreeBSD on i386, where attempts were made to optimize bzero() for large data at a tiny cost to small data). A better implementation would use the compiler builtin for both. My version does this, but the gains (or losses) from using builtins for this and other things in the kernel are insignificant. Here it is for bzero():

    #define bzero(p, n) ({                                  \
            if (__builtin_constant_p(n) && (n) <= 32)       \
                    __builtin_memset((p), 0, (n));          \
            else                                            \
                    (bzero)((p), (n));                      \
    })

This hard-codes a limit of 32 bytes for using the builtin, since some versions of gcc use a worse limit.

In userland, on at least amd64 and i386, the extern bzero() and memset() are unoptimized, but the compiler builtin is used only for memset(). A better implementation of bzero() would use the compiler builtin for it too. The above is not good enough for libc, since it evaluates its arguments more than once and has a hard-coded gccism.

The correct optimizations for bzero() etc. are very machine-dependent and context-dependent, and are far too hard for anyone, the compiler, or the CPU to get right (but I believe newer Intel CPUs are closer to making an unoptimized rep stosb as fast as possible).
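In userland, one portable way to let bzero()-style calls benefit from the compiler's memset() builtin, as suggested above for libc, is to forward them to memset() through a static inline function. This is a minimal sketch, assuming a C99 compiler that inlines small functions and expands constant-size memset() calls when optimizing (gcc and clang typically do); the name bzero_via_memset is hypothetical, not a FreeBSD or libc interface. Being a function rather than a macro, it evaluates each argument exactly once and needs no gcc extensions, the two objections raised above to using the kernel macro in libc.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userland helper (not a FreeBSD or libc interface):
 * forward a bzero()-style call to memset().  As a static inline
 * function it evaluates each argument exactly once and needs no gcc
 * extensions.  When it is inlined with a constant n and optimization
 * is enabled, gcc and clang may expand the memset() call inline using
 * their own size heuristics; otherwise it is an ordinary libc call.
 */
static inline void
bzero_via_memset(void *p, size_t n)
{
	memset(p, 0, n);
}
```

The trade-off is that the size limit for inline expansion is left to the compiler, which is exactly what the hard-coded 32 in the kernel macro works around for gcc versions with a worse limit.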
The context-dependent parts of optimizing bzero() include whether the data should go through the cache(s): it shouldn't iff it won't be used soon and the memory system is such that bypassing the caches is either faster or saves time later.

Bruce
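Whether zeroing should go through the caches is the kind of context-dependent choice described above. As a rough illustration only, assuming x86 with SSE2 intrinsics from <emmintrin.h>, the hypothetical helper zero_nontemporal below zeroes a buffer using non-temporal (streaming) stores that bypass the caches. It is not how FreeBSD's bzero() or memset() is implemented, and whether it is faster depends on the CPU and on whether the data is reused soon; on other targets it falls back to plain memset().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not a FreeBSD or libc function): zero a buffer
 * that the caller does not expect to read again soon.  On x86 with
 * SSE2, the 16-byte-aligned middle part is written with non-temporal
 * stores, which bypass the caches; the unaligned head and the short
 * tail use memset().  Without SSE2 it is just memset().
 */
static void
zero_nontemporal(void *dst, size_t len)
{
#if defined(__SSE2__)
	unsigned char *p = dst;
	/* Bytes needed to reach the next 16-byte boundary (0 if aligned). */
	size_t head = (16 - ((uintptr_t)p & 15)) & 15;

	if (head > len)
		head = len;
	memset(p, 0, head);
	p += head;
	len -= head;

	__m128i zero = _mm_setzero_si128();
	while (len >= 16) {
		/* p is 16-byte aligned here, as _mm_stream_si128 requires. */
		_mm_stream_si128((__m128i *)(void *)p, zero);
		p += 16;
		len -= 16;
	}
	_mm_sfence();		/* order the streaming stores before later stores */
	memset(p, 0, len);	/* remaining tail, fewer than 16 bytes */
#else
	memset(dst, 0, len);
#endif
}
```

For small buffers, or data that will be read again soon, an ordinary cached memset() is usually the better choice; streaming stores only pay off when the cache pollution they avoid matters.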