Date:      Sun, 24 Jun 2001 13:25:19 +1000 (EST)
From:      Bruce Evans <bde@zeta.org.au>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.org, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject:   Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
Message-ID:  <Pine.BSF.4.21.0106241223450.52918-100000@besplex.bde.org>
In-Reply-To: <200106232102.f5NL2fY73920@earth.backplane.com>

On Sat, 23 Jun 2001, Matt Dillon wrote:

>     I would propose adding a new kernel bzero() function, called bzerol(),
>     which is an inline integer-aligned implementation.

I don't think this would be very useful, but if it exists then it
should be called bzero().  We've already made the mistake of having
two functions for copying: callers are supposed to use memcpy() for
non-overlapping copies with small constant sizes and bcopy() for all
other cases, but many callers aren't disciplined enough to do this.
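
For example, the intended discipline looks something like this (the
buffer and header names here are mine):

	/* Small constant size, known non-overlapping: use memcpy(). */
	memcpy(&dsthdr, &srchdr, sizeof(dsthdr));

	/* Variable size, or possibly overlapping: use bcopy(). */
	bcopy(oldbuf, newbuf, len);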

> /*
>  * bzerol() - aligned bzero.  The buffer must be integer aligned and sized.
>  *
>  *	This routine should only be called with constant sizes, so GCC can
>  *	optimize it.  This routine typically optimizes down to just a few
>  *	instructions.
>  */
> 
> static __inline void
> bzerol(void *s, int bytes)
> {
>     assert((bytes & (sizeof(int) - 1)) == 0);
> 
>     switch(bytes) {
>     case sizeof(int) * 5:
> 	*((int *)s + 4) = 0;
> 	/* fall through */
>     case sizeof(int) * 4:
> 	*((int *)s + 3) = 0;
> 	/* fall through */
>     case sizeof(int) * 3:
> 	*((int *)s + 2) = 0;
> 	/* fall through */
>     case sizeof(int) * 2:
> 	*((int *)s + 1) = 0;
> 	/* fall through */
>     case sizeof(int) * 1:
> 	*(int *)s = 0;
> 	/* fall through */
>     case 0:
> 	return;
>     default:
> 	if (bytes >= sizeof(int) * 8) {
> 	    while (bytes >= sizeof(int) * 4) {
> 		*(int *)((char *)s + 0 * sizeof(int)) = 0;
> 		*(int *)((char *)s + 1 * sizeof(int)) = 0;
> 		*(int *)((char *)s + 2 * sizeof(int)) = 0;
> 		*(int *)((char *)s + 3 * sizeof(int)) = 0;
> 		s = (char *)s + sizeof(int) * 4;
> 		bytes -= sizeof(int) * 4;
> 	    }
> 	}
> 	while (bytes > 0) {
> 	    bytes -= sizeof(int);
> 	    *(int *)((char *)s + bytes) = 0;
> 	}
>     }
> }
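
As a usage sketch (assuming 4-byte ints; the struct is made up), a
constant-size call collapses through the fall-through cases:

	struct foo { int a, b, c; } f;		/* 12 bytes */

	bzerol(&f, sizeof(f));	/* case sizeof(int) * 3: three stores */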

I just found that gcc already has essentially this optimization, at
least on i386's, provided bzero() is spelled using memset() (I thought
that gcc only had the corresponding optimization for memcpy()).
"memset(p, 0, n)" generates plain stores of 0 for n <= 16 ("movl $0,
addr" when n is a multiple of 4).  For n >= 17, and for certain n < 16
that aren't multiples of 4, it generates not-so-optimal inline code
using stos[bwl].  This is a significant pessimization when n is very
large and the library bzero is well optimized (e.g., when the library
bzero is i586_bzero).
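
For illustration (the function names are mine), the two cases look
like:

	#include <string.h>

	void
	clear16(int *p)
	{
		memset(p, 0, 16);	/* 4 inline "movl $0" stores */
	}

	void
	clear4k(char *p)
	{
		memset(p, 0, 4096);	/* inline stos loop; can lose to an
					 * optimized library bzero */
	}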

To use the builtin memset except for parts of it that we don't like,
I suggest using code like:

#if defined(__GNUC__) && defined(_HAVE_GOOD_BUILTIN_MEMSET)
#define	bzero(p, n) do {					\
	if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE &&	\
	   !__any_other_cases_that_we_dont_like(n))		\
		__builtin_memset((p), 0, (n));			\
	else							\
		(bzero)((p), (n));				\
} while (0)
#endif

Similarly for bcopy()/memcpy() (the condition for not liking
__builtin_memcpy is currently `if (1)').
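
A hypothetical bcopy() version of the same wrapper (mirroring the
memset one above; _HAVE_GOOD_BUILTIN_MEMCPY is made up, and note that
bcopy() takes (src, dst) while memcpy() takes (dst, src)) might look
like:

#if defined(__GNUC__) && defined(_HAVE_GOOD_BUILTIN_MEMCPY)
#define	bcopy(s, d, n) do {					\
	if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE &&	\
	    !__any_other_cases_that_we_dont_like(n))		\
		__builtin_memcpy((d), (s), (n));		\
	else							\
		(bcopy)((s), (d), (n));				\
} while (0)
#endif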

Many bzero()s are now done in malloc(), so the above optimizations are
even less useful than they used to be :-).

Bruce

