Date: Sun, 24 Jun 2001 08:49:06 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinet tcp_subr.c)
Message-ID: <200106241549.f5OFn6J78347@earth.backplane.com>
References: <Pine.BSF.4.21.0106241725360.54646-100000@besplex.bde.org>

:I benchmarked the following version using lmbench2:
:
:#define bzero(p, n) ({					\
:	if (__builtin_constant_p(n) && (n) <= 16)	\
:		__builtin_memset((p), 0, (n));		\
:	else						\
:		(bzero)((p), (n));			\
:})
:
:The results were uninteresting: essentially no change.  lmbench2 is a
:micro-benchmark, so it tends to show larger improvements for
:micro-optimizations than can be expected in normal use.

    I wouldn't expect lmbench to be useful here.

:> As a separate issue, I am starting to get real worried about the FP
:> optimization in bzero() interfering with -current... I'm thinking that
:> we should rip it out.
:
:It only costs one indirection for a call through a function pointer,
:and is disabled anyway (but we still pay for the indirection).  The
:cost is mainly in source code complexity.
:
:One point that I noticed after writing my original reply: the gcc
:builtins depend on misaligned accesses not trapping.  This is reasonable
:on i386's, although it is broken if alignment checking is enabled
:(but other things are broken, e.g., copying of structs essentially
:uses the builtin memcpy and does misaligned copies for some structs
:(e.g., ones containing just a large array of shorts, inside another
:struct so that the i386 ABI forces perfect misalignment of the array).

    I added an alignment check to my bzerol() inline and it blew it up...
    it added 6ns to the loop, which is fine, but it blew up the constant
    optimization and wound up adding a switch table and a dozen
    instructions inline (hundreds of bytes!).  I added alignment checks
    to i586_bzero but it ate 20ns.

    Also, it should be noted that i586_bzero() as it currently stands
    does not do any alignment checks either - it checks only the size
    argument, it doesn't check the base pointer.

    I suppose the bzero() inline could be implemented in the
    machine-dependent code section.  It seems a shame to waste mostly
    portable code though.  I'll mess around with it a bit.

						-Matt

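For reference, the two ideas discussed above (inlining bzero() only for small compile-time-constant sizes, and testing the base pointer's alignment before using wide stores) can be sketched in userland C roughly as follows. This is an illustrative sketch, not the code from either message: small_bzero(), slow_bzero() and is_aligned() are hypothetical names, and it assumes GCC or Clang for __builtin_constant_p and __builtin_memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the out-of-line bzero() routine. */
static void slow_bzero(void *p, size_t n)
{
	memset(p, 0, n);
}

/*
 * Clear inline only when the size is a compile-time constant of at
 * most 16 bytes; otherwise call the out-of-line routine.  Note that
 * the arguments appear more than once in the expansion, so they
 * should be free of side effects.
 */
#define small_bzero(p, n) do {					\
	if (__builtin_constant_p(n) && (n) <= 16)		\
		__builtin_memset((p), 0, (n));			\
	else							\
		slow_bzero((p), (n));				\
} while (0)

/*
 * Return nonzero if p is a multiple of align.  align must be a power
 * of two.  This checks the base pointer, not the size argument.
 */
static inline int is_aligned(const void *p, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((uintptr_t)p & (align - 1)) == 0;
}
```

A caller would write small_bzero(buf, 16) for a fixed-size clear and could guard a word-at-a-time path with is_aligned(buf, sizeof(long)). Whether such a guard is worth its cost is exactly the trade-off described above.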
:However, unaligned accesses trap on some machines (including alphas
:I think), so the corresponding optimization for memset is not possible.
:Your bzerol() could do better by knowing that the pointer is aligned.
:However, I think the source code shouldn't be complicated with
:optimizations like this.  For alphas, there are the additional
:complications that 64-bit copies should be preferred, but I think
:more alignment is required for 64-bit copies, so the alignment would
:have to be part of the interface for maximal efficiency...
:
:Bruce

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message