Date: Sat, 23 Jun 2001 13:34:58 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Mikhail Teterin <mi@aldan.algebra.com>
Cc: jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: cvs commit: src/sys/netinet tcp_subr.c
Message-ID: <200106232034.f5NKYwY73776@earth.backplane.com>
References: <200106231912.f5NJCUE01011@aldan.algebra.com>
:On 23 Jun, Jonathan Lemon wrote:
:> jlemon      2001/06/23 10:44:28 PDT
:>
:>   Modified files:
:>     sys/netinet          tcp_subr.c
:>   Log:
:>   Replace bzero() of struct ip with explicit zeroing of structure members,
:>   which is faster.
:>
:>   Revision  Changes    Path
:>   1.107     +7 -3      src/sys/netinet/tcp_subr.c
:
:Should people be asked to look in other parts of the kernel for places,
:where the bzero can be so replaced?
:
:	% find /sys/ -type f -name \*.c | xargs fgrep bzero | wc -l
:	    1621
:
:	-mi

I wouldn't go on a bzero-removal binge.  For TCP/IP it's a special case:
the structures are pretty much set in stone.  But in many places a
subroutine trying to initialize a structure does not necessarily know every
last field in the structure -- and it would be bad programming practice to
hardwire the field list, because anyone adding a new field to the structure
would not necessarily know about all the places where it has to be
initialized.

This does bring up a good issue, though: all the junk we've thrown into
bzero() over the years has seriously marginalized its performance for small
counts.  If I write a poor-man's aligned bzero and test it, I get:

	Test1 - bcopy 20x2 bytes	216.59 nS/loop
	Test2 - manual load data	 26.19 nS/loop
	Test3 - man load w/ptrs		 35.68 nS/loop
	Test4 - mlptrs & bzero		162.22 nS/loop
	Test5 - mlptrszer & call	190.65 nS/loop	<----- libc bzero
	----- - mlptrszer & call	130.81 nS/loop	<----- kernel's i586_bzero()
	Test6 - mlptrszerc/mybzero	 71.72 nS/loop	<----- my small_bzerol()

	/*
	 * Integer-aligned bzero for small buffers (no space bloat over
	 * making a subroutine call).  Assumes bytes is a multiple of
	 * sizeof(int); the count is decremented before the store so the
	 * loop clears exactly bytes 0..bytes-1 and never writes past the
	 * end of the buffer.
	 */
	static __inline void
	small_bzerol(void *s, int bytes)
	{
		while (bytes > 0) {
			bytes -= 4;
			*(int *)((char *)s + bytes) = 0;
		}
	}

Now, of course, the kernel bzero is slightly different.  If I drop the
i586_bzero code into the same test, it takes 130.81 nS (the unnumbered row
between Test5 and Test6 above), which is better than libc's 190 nS but
still far worse than my poor-man's 71 nS.

Furthermore, the assembly generated for the above inline (taken from the
memtest code run through cc -S) is:

	.p2align 2,0x90
.L51:
	addl $-4,%eax
	movl $0,DBuf(%eax)
	testl %eax,%eax
	jg .L51

which is just about the same size as a subroutine call (push, push, call,
addl), so it doesn't bloat the kernel any.

						-Matt
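The actual tcp_subr.c diff isn't quoted here, so as a rough illustration
only: the kind of change the log message describes looks something like the
sketch below.  The field names are taken from the standard struct ip in
<netinet/ip.h>; the kernel of that era may instead use a combined ip_vhl
member, and the function names here are invented for the example, not from
the commit.

	#include <sys/types.h>
	#include <netinet/in.h>
	#include <netinet/ip.h>
	#include <strings.h>

	/*
	 * Illustrative only: clear an IP header the generic way and the
	 * explicit way.  Because struct ip has a fixed, frozen layout, it
	 * can be cleared field by field, letting the compiler emit a few
	 * constant stores instead of a call into a general-purpose bzero().
	 */
	static void
	clear_ip_header_bzero(struct ip *ip)
	{
		bzero(ip, sizeof(*ip));		/* call overhead + generic size handling */
	}

	static void
	clear_ip_header_explicit(struct ip *ip)
	{
		/* every field named explicitly -- safe only because struct ip never changes */
		ip->ip_hl = 0;
		ip->ip_v = 0;
		ip->ip_tos = 0;
		ip->ip_len = 0;
		ip->ip_id = 0;
		ip->ip_off = 0;
		ip->ip_ttl = 0;
		ip->ip_p = 0;
		ip->ip_sum = 0;
		ip->ip_src.s_addr = 0;
		ip->ip_dst.s_addr = 0;
	}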
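The "anyone adding new fields" hazard is easiest to see with a toy example
(hypothetical structure and function names, nothing from the kernel): when
a structure grows a member, every open-coded initializer silently goes
stale, while bzero()-based callers keep clearing the whole object.

	#include <strings.h>

	/* Hypothetical structure that later grows a new member. */
	struct conn_stats {
		int	packets;
		int	bytes;
		int	drops;	/* added later; older call sites don't know about it */
	};

	/* Open-coded initializer written before 'drops' existed: now leaves garbage. */
	static void
	stats_init_handrolled(struct conn_stats *cs)
	{
		cs->packets = 0;
		cs->bytes = 0;
		/* cs->drops is never cleared -- stale memory leaks through */
	}

	/* bzero() keeps working no matter how many fields are added. */
	static void
	stats_init_bzero(struct conn_stats *cs)
	{
		bzero(cs, sizeof(*cs));
	}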
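The "memtest" harness behind the nS/loop figures isn't included in the
message.  A rough, modern userland stand-in for that kind of measurement
(clock_gettime() based, all names invented here) might look like the
following; it shows the shape of the comparison, not a reproduction of the
2001 numbers.

	#include <stdio.h>
	#include <string.h>
	#include <time.h>

	#define BUFSIZE	20		/* small fixed size, like a struct ip */
	#define LOOPS	10000000L

	/* declared as int[] so the word-sized stores are aligned */
	static int buf[BUFSIZE / sizeof(int)];

	/* same inline as in the message body */
	static inline void
	small_bzerol(void *s, int bytes)
	{
		while (bytes > 0) {
			bytes -= 4;
			*(int *)((char *)s + bytes) = 0;
		}
	}

	static void clear_memset(void) { memset(buf, 0, BUFSIZE); }
	static void clear_inline(void) { small_bzerol(buf, BUFSIZE); }

	/* time one clearing routine; returns nanoseconds per iteration */
	static double
	time_ns(void (*clearfn)(void))
	{
		struct timespec t0, t1;
		long i;

		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < LOOPS; i++)
			clearfn();
		clock_gettime(CLOCK_MONOTONIC, &t1);
		return ((t1.tv_sec - t0.tv_sec) * 1e9 +
		    (t1.tv_nsec - t0.tv_nsec)) / (double)LOOPS;
	}

	int
	main(void)
	{
		/* NB: an aggressive optimizer may elide the stores; read buf
		   afterward or lower -O if the numbers look implausibly small. */
		printf("memset/bzero : %6.2f ns/loop\n", time_ns(clear_memset));
		printf("small_bzerol : %6.2f ns/loop\n", time_ns(clear_inline));
		return (0);
	}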