Date: Sat, 23 Jun 2001 22:13:18 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
Message-ID: <200106240513.f5O5DIH75729@earth.backplane.com>
References: <Pine.BSF.4.21.0106241223450.52918-100000@besplex.bde.org>
:> I would propose adding a new kernel bzero() function, called bzerol(),
:> which is an inline integer-aligned implementation.
:
:I don't think this would be very useful, but if it exists then it
:should be called bzero(). We've already made the mistake of having 2
:functions for bcopy() (callers are supposed to use memcpy() for
:non-overlapping copies with small constant sizes and bcopy() for all
:other cases, but many callers aren't disciplined enough to do this).
:...
:
:I just found that gcc already has essentially this optimization, at
:least on i386's, provided bzero() is spelled using memset() (I thought
:that gcc only had the corresponding optimization for memcpy()).
:"memset(p, 0, n)" generates stores of 0 for n <= 16 ("movl $0, addr"
:if n is a multiple of 4). For n >= 17 and for certain n < 16, it
:generates not so optimal inline code using stos[bwl]. This is a
:significant pessimization if n is very large and the library bzero
:is significantly optimized (e.g., if the library bzero is i586_bzero).
:
:To use the builtin memset except for parts of it that we don't like,
:I suggest using code like:
:
:#if defined(__GNUC__) && defined(_HAVE_GOOD_BUILTIN_MEMSET)
:#define bzero(p, n) do { \
: if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE && \
: !__any_other_cases_that_we_dont_like(n)) \
: __builtin_memset((p), 0, (n)); \
: else \
: (bzero)((p), (n)); \
:} while (0)
:#endif
:
:Similarly for bcopy/memcpy (the condition for not liking __builtin_memcpy
:is currently `if (1)').
:
:Many bzero()s are now done in malloc(), so the above optimizations are
:even less useful than they used to be :-).
:
:Bruce
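
    A compilable sketch of the macro Bruce suggests above, with the
    placeholder predicates filled in by assumption: LARGE_MD_VALUE is
    reduced to an arbitrary 64-byte cutoff, and the
    __any_other_cases_that_we_dont_like() test is dropped entirely.
    Small constant sizes go through __builtin_memset(); everything
    else falls back to the library bzero().

	#include <strings.h>	/* library bzero() */

	/* Illustrative cutoff, NOT the real machine-dependent value. */
	#define LARGE_MD_VALUE	64

	#define bzero(p, n) do {					\
		if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE)	\
			__builtin_memset((p), 0, (n));			\
		else							\
			(bzero)((p), (n));	/* library call */	\
	} while (0)

    Parenthesizing (bzero) in the fallback keeps the preprocessor from
    treating it as another invocation of the function-like macro, so
    the real function is called.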
Hmm. I wonder if __builtin_constant_p() works in an inline....
holy cow, it does!
cc -S x.c -O
	/*
	 * x.c
	 */
	volatile int x;

	extern void xtest(int);		/* resolved at link time; -S only */

	static __inline void
	test(int n)
	{
		if (__builtin_constant_p(n)) {
			switch (n) {
			case 1:
				x = 2;
				break;
			case 2:
				x = 20;
				break;
			case 3:
				x = 200;
				break;
			default:
				x = 2000;
				break;
			}
		} else {
			xtest(n);
		}
	}

	int
	main(int ac, char **av)
	{
		test(1);
		test(2);
		test(3);
		test(x);
	}
    results in:

	movl $2,x
	movl $20,x
	movl $200,x

	movl x,%eax
	addl $-12,%esp
	pushl %eax
	call xtest
    Ok, so what if we made bzero() an inline which checked whether the
    size is a constant (and not too large) and did the memset() magic
    there, and otherwise called the real bzero() for non-constant or
    too-large n's (say, anything bigger than 64 bytes)?  I think we'd
    have to use an inline rather than a #define'd macro to make it look
    as much like a real function as possible.
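
    A minimal sketch of that inline, under stated assumptions: the names
    kbzero() and bzero_real() are illustrative (bzero_real() stands in
    for the out-of-line optimized kernel bzero, here faked with
    memset()), and 64 is the cutoff suggested above.

	#include <stddef.h>
	#include <string.h>

	/* Stand-in for the real out-of-line bzero. */
	static void
	bzero_real(void *p, size_t n)
	{
		memset(p, 0, n);
	}

	static __inline void
	kbzero(void *p, size_t n)
	{
		if (__builtin_constant_p(n) && n <= 64)
			__builtin_memset(p, 0, n); /* inline stores */
		else
			bzero_real(p, n);	/* out-of-line call */
	}

    Because it is a real (inline) function rather than a macro, it
    evaluates its arguments exactly once and can have its address
    semantics reasoned about like any other function.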
As a separate issue, I am starting to get real worried about the FP
optimization in bzero() interfering with -current... I'm thinking that
we should rip it out.
-Matt
