Date: Sat, 23 Jun 2001 13:34:58 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Mikhail Teterin <mi@aldan.algebra.com>
Cc: jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/netinet tcp_subr.c
Message-ID: <200106232034.f5NKYwY73776@earth.backplane.com>
References: <200106231912.f5NJCUE01011@aldan.algebra.com>
:On 23 Jun, Jonathan Lemon wrote:
:> jlemon 2001/06/23 10:44:28 PDT
:>
:> Modified files:
:> sys/netinet tcp_subr.c
:> Log:
:> Replace bzero() of struct ip with explicit zeroing of structure members,
:> which is faster.
:>
:> Revision Changes Path
:> 1.107 +7 -3 src/sys/netinet/tcp_subr.c
:
:Should people be asked to look in other parts of the kernel for places
:where bzero can be replaced in the same way?
:
: % find /sys/ -type f -name \*.c | xargs fgrep bzero | wc -l
: 1621
:
:
: -mi
I wouldn't get on a bzero-removal binge. For TCP/IP it's a special
case... the structures are pretty much set in stone. But in many
places a subroutine trying to initialize a structure does not necessarily
know every last field in the structure -- and it would be bad programming
practice to hardwire it because anyone adding new fields to the
structure would not necessarily know about all the places where he
has to initialize the field.
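    To make the hazard concrete, take a made-up example (the structure and
    function names are invented for illustration and have nothing to do with
    the actual tcp_subr.c change; bzero() here is the usual kernel one from
    <sys/systm.h>, or <strings.h> in userland).  The bzero() version stays
    correct when a field is added later; the hand-unrolled version silently
    leaves the new field stale unless someone remembers to update it:

	struct foo_stats {
		int	fs_packets;
		int	fs_bytes;
		int	fs_drops;	/* imagine this field was added later */
	};

	static void
	foo_stats_reset(struct foo_stats *fs)
	{
		/* Safe: zeroes every field, including ones added later. */
		bzero(fs, sizeof(*fs));
	}

	static void
	foo_stats_reset_fast(struct foo_stats *fs)
	{
		/*
		 * Faster for a tiny, frozen structure, but fs_drops is
		 * silently left uninitialized unless this function is
		 * updated whenever the structure grows.
		 */
		fs->fs_packets = 0;
		fs->fs_bytes = 0;
	}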
    This does bring up a good issue, though: all the junk we've thrown
    into bzero() over the years has seriously marginalized its performance
    for small counts.  If I write a poor-man's aligned bzero and test it,
    I get:
	Test1 - bcopy 20x2 bytes     216.59 ns/loop
	Test2 - manual load data      26.19 ns/loop
	Test3 - man load w/ptrs       35.68 ns/loop
	Test4 - mlptrs & bzero       162.22 ns/loop
	Test5 - mlptrszer & call     190.65 ns/loop	<----- libc bzero
	----- - mlptrszer & call     130.81 ns/loop	<----- kernel's i586_bzero()
	Test6 - mlptrszerc/mybzero    71.72 ns/loop	<----- my small_bzerol()
/*
 * Integer-aligned bzero for small buffers (no space bloat over making a
 * subroutine call).  Assumes the pointer is int-aligned and that 'bytes'
 * is a multiple of 4.
 */
static __inline void
small_bzerol(void *s, int bytes)
{
	while (bytes > 0) {
		bytes -= 4;
		/* zero one 32-bit word, working down from the end */
		*(int *)((char *)s + bytes) = 0;
	}
}
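    As an aside (this usage is just an illustration, not something from the
    commit), the inline would be dropped in wherever a small structure of
    known, 4-byte-multiple size gets cleared; a struct of ints is already
    int-aligned, so it satisfies the alignment assumption:

	struct pseudo_hdr {
		int	ph_a, ph_b, ph_c, ph_d, ph_e;	/* 20 bytes on i386 */
	} ph;

	small_bzerol(&ph, sizeof(ph));	/* instead of bzero(&ph, sizeof(ph)) */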
    Now, of course, the kernel bzero is slightly different.  If I
    take the i586_bzero code, the above test (#5/#6) takes 130.81 ns,
    which is better than libc's 190 ns but still far worse than my
    poor-man's 71 ns.
    Furthermore, the assembly generated for the above inline (taken from
    the memtest code compiled with cc -S) is:
	.p2align 2,0x90
.L51:
	addl $-4,%eax
	movl $0,DBuf(%eax)
	testl %eax,%eax
	jg .L51
    This is just about the same size as a subroutine call (push, push, call,
    addl), so it doesn't bloat the kernel any.
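    For reference, the numbers above come from a little user-land timing
    loop.  A stripped-down sketch of that kind of harness (not the actual
    memtest code; the names and buffer size are made up) looks something
    like this, run with optimization low enough that the call isn't thrown
    away:

	#include <stdio.h>
	#include <strings.h>
	#include <sys/time.h>

	#define LOOPS	1000000
	#define BUFSIZE	20		/* small buffer, about struct ip sized */

	static char DBuf[BUFSIZE];

	int
	main(void)
	{
		struct timeval tv1, tv2;
		double ns;
		int i;

		gettimeofday(&tv1, NULL);
		for (i = 0; i < LOOPS; ++i)
			bzero(DBuf, BUFSIZE);	/* routine under test */
		gettimeofday(&tv2, NULL);

		/* convert elapsed time to nanoseconds per iteration */
		ns = ((tv2.tv_sec - tv1.tv_sec) * 1e9 +
		    (tv2.tv_usec - tv1.tv_usec) * 1e3) / LOOPS;
		printf("bzero %d bytes: %.2f ns/loop\n", BUFSIZE, ns);
		return (0);
	}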
-Matt
