Date: Sat, 23 Jun 2001 22:13:18 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG,
    cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
Message-ID: <200106240513.f5O5DIH75729@earth.backplane.com>
References: <Pine.BSF.4.21.0106241223450.52918-100000@besplex.bde.org>
:> I would propose adding a new kernel bzero() function, called bzerol(),
:> which is an inline integer-aligned implementation.
:
:I don't think this would be very useful, but if it exists then it
:should be called bzero().  We've already made the mistake of having 2
:functions for bcopy() (callers are supposed to use memcpy() for
:non-overlapping copies with small constant sizes and bcopy() for all
:other cases, but many callers aren't disciplined enough to do this).
:...
:
:I just found that gcc already has essentially this optimization, at
:least on i386's, provided bzero() is spelled using memset() (I thought
:that gcc only had the corresponding optimization for memcpy()).
:"memset(p, 0, n)" generates stores of 0 for n <= 16 ("movl $0, addr"
:if n is a multiple of 4).  For n >= 17 and for certain n < 16, it
:generates not so optimal inline code using stos[bwl].  This is a
:significant pessimization if n is very large and the library bzero
:is significantly optimized (e.g., if the library bzero is i586_bzero).
:
:To use the builtin memset except for parts of it that we don't like,
:I suggest using code like:
:
:#if defined(__GNUC) && defined(_HAVE_GOOD_BUILTIN_MEMSET)
:#define bzero(p, n) do { \
:	if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE && \
:	    !__any_other_cases_that_we_dont_like(n)) \
:		__builtin_memset((p), 0, (n)); \
:	else \
:		(bzero)((p), (n)); \
:} while (0)
:#endif
:
:Similarly for bzero/memcpy (the condition for not liking __builtin_memcpy
:is currently `if (1)').
:
:Many bzero()s are now done in malloc(), so the above optimizations are
:even less useful than they used to be :-).
:
:Bruce

    Hmm.  I wonder if __builtin_constant_p() works in an inline....
    holy cow, it does!
	cc -S x.c -O

	/*
	 * x.c
	 */

	volatile int x;

	static __inline void
	test(int n)
	{
		if (__builtin_constant_p(n)) {
			switch(n) {
			case 1:
				x = 2;
				break;
			case 2:
				x = 20;
				break;
			case 3:
				x = 200;
				break;
			default:
				x = 2000;
				break;
			}
		} else {
			xtest(n);
		}
	}

	main(int ac, char **av)
	{
		test(1);
		test(2);
		test(3);
		test(x);
	}

    results in:

	movl $2,x
	movl $20,x
	movl $200,x
	movl x,%eax
	addl $-12,%esp
	pushl %eax
	call xtest

    Ok, so what if we made bzero() an inline which checked to see if
    the size was a constant (or too large) and did the memset() magic
    there, otherwise calling the real bzero for non-constant or
    too-large n's (say, anything bigger than 64 bytes).  I think we'd
    have to use an inline rather than a #define'd macro to make it
    look as much like a real function as possible.

    As a separate issue, I am starting to get really worried about the
    FP optimization in bzero() interfering with -current... I'm
    thinking that we should rip it out.

						-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message