Date:      Sat, 23 Jun 2001 22:13:18 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
Message-ID:  <200106240513.f5O5DIH75729@earth.backplane.com>
References:   <Pine.BSF.4.21.0106241223450.52918-100000@besplex.bde.org>


:>     I would propose adding a new kernel bzero() function, called bzerol(),
:>     which is an inline integer-aligned implementation.
:
:I don't think this would be very useful, but if it exists then it
:should be called bzero().  We've already made the mistake of having 2
:functions for bcopy() (callers are supposed to use memcpy() for
:non-overlapping copies with small constant sizes and bcopy() for all
:other cases, but many callers aren't disciplined enough to do this).
:...
:
:I just found that gcc already has essentially this optimization, at
:least on i386's, provided bzero() is spelled using memset() (I thought
:that gcc only had the corresponding optimization for memcpy()).
:"memset(p, 0, n)" generates stores of 0 for n <= 16 ("movl $0, addr"
:if n is a multiple of 4).  For n >= 17 and for certain n < 16, it
:generates not so optimal inline code using stos[bwl].  This is a
:significant pessimization if n is very large and the library bzero
:is significantly optimized (e.g., if the library bzero is i586_bzero).
:
:To use the builtin memset except for parts of it that we don't like,
:I suggest using code like:
:
:#if defined(__GNUC__) && defined(_HAVE_GOOD_BUILTIN_MEMSET)
:#define	bzero(p, n) do {					\
:	if (__builtin_constant_p(n) && (n) < LARGE_MD_VALUE &&	\
:	   !__any_other_cases_that_we_dont_like(n))		\
:		__builtin_memset((p), 0, (n));			\
:	else							\
:		(bzero)((p), (n));				\
:} while (0)
:#endif
:
:Similarly for bcopy/memcpy (the condition for not liking __builtin_memcpy
:is currently `if (1)').
:
:Many bzero()s are now done in malloc(), so the above optimizations are
:even less useful than they used to be :-).
:
:Bruce

    Hmm.  I wonder if __builtin_constant_p() works in an inline....
    holy cow, it does!

    cc -S x.c -O 

/*
 * x.c -- check whether __builtin_constant_p() still sees a constant
 * argument after it has been passed through an inline function.
 */

void xtest(int n);		/* out-of-line fallback, defined elsewhere */

volatile int x;

static __inline
void
test(int n)
{
    if (__builtin_constant_p(n)) {
	switch(n) {
	case 1:
	    x = 2;
	    break;
	case 2:
	    x = 20;
	    break;
	case 3:
	    x = 200;
	    break;
	default:
	    x = 2000;
	    break;
	}
    } else {
	xtest(n);
    }
}

int
main(int ac, char **av)
{
    test(1);
    test(2);
    test(3);
    test(x);		/* x is volatile, so this argument is non-constant */
    return (0);
}


results in:

        movl $2,x
        movl $20,x
        movl $200,x
        movl x,%eax
        addl $-12,%esp
        pushl %eax
        call xtest

    Ok, so what if we made bzero() an inline which checks whether the
    size is a constant (and not too large) and does the memset() magic
    there, and otherwise calls the real bzero for non-constant or
    too-large n's (say, anything bigger than 64 bytes)?  I think we'd
    have to use an inline rather than a #define'd macro to make it look
    as much like a real function as possible.
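
    Something along these lines -- just a rough sketch, not kernel-ready
    code.  The names my_bzero/real_bzero and the 64-byte cutoff are
    placeholders, and it assumes a gcc-style compiler that provides
    __builtin_constant_p() and __builtin_memset():

#include <stddef.h>
#include <string.h>

/* Stand-in for the real out-of-line library/kernel bzero. */
void
real_bzero(void *p, size_t n)
{
    memset(p, 0, n);
}

static __inline
void
my_bzero(void *p, size_t n)
{
    if (__builtin_constant_p(n) && n <= 64) {
	/* Small constant size: let the compiler expand it to stores. */
	__builtin_memset(p, 0, n);
    } else {
	/* Non-constant or large size: call the real routine. */
	real_bzero(p, n);
    }
}

int
main(void)
{
    char small[16], big[4096];

    my_bzero(small, sizeof(small));	/* constant and small: inlined */
    my_bzero(big, sizeof(big));		/* constant but large: real_bzero */
    return (0);
}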

    As a separate issue, I am starting to get really worried about the FP
    optimization in bzero() interfering with -current... I'm thinking that
    we should rip it out.

						-Matt


