Date:      Sun, 8 May 2011 20:14:25 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Hans Petter Selasky <hselasky@c2i.net>
Cc:        "svn-src-head@freebsd.org" <svn-src-head@FreeBSD.org>, mdf@FreeBSD.org, "svn-src-all@freebsd.org" <svn-src-all@FreeBSD.org>, "src-committers@freebsd.org" <src-committers@FreeBSD.org>
Subject:   Re: svn commit: r221604 - head/usr.sbin/usbdump
Message-ID:  <20110508195020.V981@besplex.bde.org>
In-Reply-To: <201105071955.35305.hselasky@c2i.net>
References:  <201105071628.p47GSO16006145@svn.freebsd.org> <201105071836.00660.hselasky@c2i.net> <BANLkTimi_Em60n9MZRTcgBDvycqH-pKL5g@mail.gmail.com> <201105071955.35305.hselasky@c2i.net>

On Sat, 7 May 2011, Hans Petter Selasky wrote:

> On Saturday 07 May 2011 19:13:27 mdf@freebsd.org wrote:
>> On Sat, May 7, 2011 at 9:36 AM, Hans Petter Selasky <hselasky@c2i.net>
> wrote:
>>> On Saturday 07 May 2011 18:28:24 Hans Petter Selasky wrote:
>>>>   - Use memcpy() instead of bcopy().
>>>
>>> - Use memset() instead of bzero().
>>
>> Why?  It usually falls through to the same code in libc.  Is there
>> some standardization on memfoo versus bfoo here?
>
> I thought that memset() was a compiler builtin and bzero() optimised for
> larger amounts of data?

In the kernel, compiler builtins aren't used, memset() is slightly
pessimized, and bzero() is not optimized (although old versions of
FreeBSD on i386 attempted to optimize bzero() for large data at a
tiny cost to small data).  A better implementation would use the
compiler builtin for both.  My version does this, but the gains (or
losses) from using builtins for this and other things in the kernel
are insignificant.  Here it is for bzero():

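/*
 * Clear small constant-size objects inline via the builtin; otherwise
 * call the extern bzero().  The parenthesized (bzero) names the function
 * rather than this macro.
 */
#define	bzero(p, n) ({						\
	if (__builtin_constant_p(n) && (n) <= 32)		\
		__builtin_memset((p), 0, (n));			\
	else							\
		(bzero)((p), (n));				\
})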

This hard-codes the limit of 32 for the builtin since some versions of
gcc use a worse limit.
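
To see the dispatch outside the kernel, here is a minimal userland sketch
of the same idea.  The names zero_demo, ZERO_INLINE_MAX, all_zero and the
two demo functions are hypothetical and exist only for illustration; the
sketch assumes a gcc-compatible compiler and a libc that declares bzero()
in <strings.h>.

```c
#include <stddef.h>
#include <string.h>	/* memset() */
#include <strings.h>	/* bzero() */

#define	ZERO_INLINE_MAX	32	/* same cutoff as the macro above */

/* Hypothetical userland stand-in for the kernel macro above. */
#define	zero_demo(p, n) do {						\
	if (__builtin_constant_p(n) && (n) <= ZERO_INLINE_MAX)		\
		__builtin_memset((p), 0, (n));				\
	else								\
		bzero((p), (n));					\
} while (0)

/* Return 1 if the first n bytes at p are all zero, else 0. */
static int
all_zero(const unsigned char *p, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i] != 0)
			return (0);
	return (1);
}

/* Constant size <= 32: eligible for the builtin path. */
static int
demo_const_small(void)
{
	unsigned char buf[16];

	memset(buf, 0xff, sizeof(buf));
	zero_demo(buf, sizeof(buf));
	return (all_zero(buf, sizeof(buf)));
}

/* Size passed in at run time: normally goes to the extern bzero(). */
static int
demo_runtime(size_t n)
{
	unsigned char buf[256];

	if (n > sizeof(buf))
		n = sizeof(buf);
	memset(buf, 0xff, sizeof(buf));
	zero_demo(buf, n);
	return (all_zero(buf, n));
}
```

Both paths give the same result; only the generated code differs, which
is why any gain has to be measured rather than assumed.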

In userland, on at least amd64 and i386, the extern bzero() and memset()
are unoptimized, but the compiler builtin is used for memset() only.  A
better implementation of bzero() would use the compiler builtin for it
too.  The above is not good enough for libc, since it evaluates args more
than once and has a hard-coded gccism.
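
One possible way to address both objections, as a sketch only and not
what libc does: put the test in an inline function, so each argument is
evaluated exactly once by construction, and guard the builtin with
__GNUC__ so other compilers fall back to the plain call.  The names
zero_once, next_len and demo_once are hypothetical.  The builtin test can
only succeed when the call is inlined with a constant size.

```c
#include <stddef.h>
#include <strings.h>	/* bzero() */

/* Hypothetical wrapper: arguments are evaluated once, at the call. */
static inline void
zero_once(void *p, size_t n)
{
#if defined(__GNUC__)
	/* Only true when the call is inlined with a constant size. */
	if (__builtin_constant_p(n) && n <= 32) {
		__builtin_memset(p, 0, n);
		return;
	}
#endif
	bzero(p, n);
}

static int len_calls;

/* Length argument with a side effect, to count evaluations. */
static size_t
next_len(void)
{
	len_calls++;
	return (8);
}

/* Return 1 if the buffer was cleared and next_len() ran exactly once. */
static int
demo_once(void)
{
	unsigned char buf[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

	len_calls = 0;
	zero_once(buf, next_len());
	return (len_calls == 1 && buf[0] == 0 && buf[7] == 0);
}
```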

The correct optimizations for bzero() etc. are very machine-dependent
and context-dependent and are far too hard for anyone or the compiler
or the CPU to get right (but I believe newer Intel CPUs are closer to
making unoptimized stosb as fast as possible).  Context-dependent parts
include whether the data should go through cache(s) (it shouldn't iff
it won't be used soon and the memory system is such that not going
through caches is either faster or saves time later).
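
As an illustration of the cache question only, here is a sketch of a
clear that uses SSE2 non-temporal stores where available and plain
memset() elsewhere.  The names zero_nocache and demo_nocache are
hypothetical, and whether this is faster than an ordinary clear depends
on the machine and on how soon the memory is used again, as described
above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical: zero n bytes, bypassing the cache where SSE2 allows. */
static void
zero_nocache(void *p, size_t n)
{
#if defined(__SSE2__)
	unsigned char *d = p;
	__m128i z = _mm_setzero_si128();

	/* Ordinary stores until d is 16-byte aligned. */
	while (n > 0 && ((uintptr_t)d & 15) != 0) {
		*d++ = 0;
		n--;
	}
	/* Non-temporal 16-byte stores for the aligned middle. */
	for (; n >= 16; d += 16, n -= 16)
		_mm_stream_si128((__m128i *)d, z);
	_mm_sfence();	/* order the streaming stores before later stores */
	/* Ordinary stores for the tail. */
	while (n > 0) {
		*d++ = 0;
		n--;
	}
#else
	memset(p, 0, n);
#endif
}

/* Return 1 if exactly bytes 3..92 of a 100-byte buffer were cleared. */
static int
demo_nocache(void)
{
	unsigned char buf[100];
	size_t i;

	memset(buf, 0xff, sizeof(buf));
	zero_nocache(buf + 3, 90);
	for (i = 3; i < 93; i++)
		if (buf[i] != 0)
			return (0);
	return (buf[2] == 0xff && buf[93] == 0xff);
}
```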

Bruce


