Date:      Sat, 23 Jun 2001 13:34:58 -0700 (PDT)
From:      Matt Dillon <dillon@earth.backplane.com>
To:        Mikhail Teterin <mi@aldan.algebra.com>
Cc:        jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject:   Re: cvs commit: src/sys/netinet tcp_subr.c
Message-ID:  <200106232034.f5NKYwY73776@earth.backplane.com>
References:   <200106231912.f5NJCUE01011@aldan.algebra.com>

:On 23 Jun, Jonathan Lemon wrote:
:> jlemon      2001/06/23 10:44:28 PDT
:> 
:>   Modified files:
:>     sys/netinet          tcp_subr.c 
:>   Log:
:>   Replace bzero() of struct ip with explicit zeroing of structure members,
:>   which is faster.
:>   
:>   Revision  Changes    Path
:>   1.107     +7 -3      src/sys/netinet/tcp_subr.c
:
:Should people be asked to look in other parts of the kernel for places,
:where the bzero can be so replaced?
:
:	% find /sys/ -type f -name \*.c | xargs fgrep bzero | wc -l
:	1621
:
:
:	-mi

    I wouldn't get on a bzero-removal binge.  For TCP/IP it's a special
    case... the structures are pretty much set in stone.  But in many
    places a subroutine trying to initialize a structure does not necessarily
    know every last field in the structure -- and it would be bad programming
    practice to hardwire it because anyone adding new fields to the
    structure would not necessarily know about all the places where he
    has to initialize the field.
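
    A minimal sketch of that maintenance hazard follows.  The structure
    and function names are hypothetical, and memset() stands in for the
    kernel's bzero() so the fragment compiles in userland:

```c
#include <string.h>

/* Hypothetical structure; "flags" plays the role of a field added later. */
struct conn_hdr {
    int src;
    int dst;
    int len;
    int flags;      /* added after init_by_hand() was written */
};

/* Zero the whole structure, then set only what this routine knows about. */
static void
init_by_bzero(struct conn_hdr *h, int src, int dst)
{
    memset(h, 0, sizeof(*h));   /* stand-in for bzero(h, sizeof(*h)) */
    h->src = src;
    h->dst = dst;
}

/* Hand-zeroing: fine for a frozen layout, but it never touches "flags". */
static void
init_by_hand(struct conn_hdr *h, int src, int dst)
{
    h->src = src;
    h->dst = dst;
    h->len = 0;
}
```

    After init_by_hand() the new field holds whatever was in memory
    before the call, while init_by_bzero() keeps working unchanged when
    fields are added.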

    This does bring up a good issue, though, and that is that all the junk
    we've thrown into bzero() over the years has seriously degraded
    its performance for small counts.  If I write a poor-man's aligned bzero
    and test it, I get:

Test1 - bcopy 20x2 bytes        216.59 nS/loop
Test2 - manual load data         26.19 nS/loop
Test3 - man load w/ptrs          35.68 nS/loop
Test4 - mlptrs & bzero          162.22 nS/loop
Test5 - mlptrszer & call        190.65 nS/loop   <----- libc bzero
----- - mlptrszer & call        130.81 nS/loop   <----- kernel's i586_bzero()
Test6 - mlptrszerc/mybzero       71.72 nS/loop   <----- my small_bzerol()

/*
 * integer-aligned bzero for small buffers (no space bloat over making a
 * subroutine call).  s must be int-aligned and bytes a multiple of 4.
 */
static __inline void
small_bzerol(void *s, int bytes)
{
    while (bytes > 0) {
	bytes -= 4;	/* step first so offsets run bytes-4 .. 0 */
	*(int *)((char *)s + bytes) = 0;
    }
}

    Now, of course, the kernel bzero is slightly different.  If I
    take the i586_bzero code, the above test (#5/#6) takes 130.81 nS,
    which is better than libc's 190 nS but still far worse than my
    poor-man's 71 nS.

    Furthermore, the assembly generated by the above inline is (taken
    from the memtest code cc -S'd):

        .p2align 2,0x90
.L51:
        movl $0,DBuf(%eax)
        addl $-4,%eax
        testl %eax,%eax
        jg .L51

    Which is just about the same size as a subroutine call (push, push, call,
    addl), so it doesn't bloat the kernel any.
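
    As a usage sketch: the variant below decrements the count before the
    store, so the word at offset 0 is cleared and nothing past the end of
    the buffer is written.  The names small_bzerol_sketch, tiny_hdr and
    tiny_hdr_init are hypothetical, and the code assumes a 4-byte int,
    an int-aligned buffer and a length that is a multiple of 4:

```c
#include <stddef.h>

/*
 * Small-buffer zeroing loop.  The count is decremented before each
 * store, so the offsets written run from bytes-4 down to 0.
 * Assumes sizeof(int) == 4, s int-aligned, bytes a multiple of 4.
 */
static inline void
small_bzerol_sketch(void *s, int bytes)
{
    while (bytes > 0) {
        bytes -= 4;
        *(int *)((char *)s + bytes) = 0;
    }
}

/* Hypothetical fixed-layout header: five 32-bit words, 20 bytes. */
struct tiny_hdr {
    unsigned int w[5];
};

/* Zero the header inline, then fill in the one field we care about. */
static void
tiny_hdr_init(struct tiny_hdr *h, unsigned int first)
{
    small_bzerol_sketch(h, (int)sizeof(*h));
    h->w[0] = first;
}
```

    Because the size is a compile-time constant here, the compiler is
    free to unroll the loop into a handful of stores.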

						-Matt


