Date: Mon, 25 Jun 2001 04:05:03 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Bruce Evans <bde@zeta.org.au>
Cc: Peter Wemm <peter@wemm.org>, Mikhail Teterin <mi@aldan.algebra.com>, jlemon@FreeBSD.ORG, cvs-committers@FreeBSD.ORG, cvs-all@FreeBSD.ORG
Subject: Re: kernel size w/ optimized bzero() & patch set (was Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinettcp_subr.c))
Message-ID: <200106251105.f5PB53004512@earth.backplane.com>
References: <Pine.BSF.4.21.0106252337370.7918-100000@besplex.bde.org>
:On Sun, 24 Jun 2001, Matt Dillon wrote:
:
:[Peter Wemm wrote]
:> :Just think.. This new ``improved'' bzero code can now fill up all 4K of L1
:> :instruction cache on most of my systems, and most of my 8K L1 instruction
:> :cache on >= coppermine cpus. I'm impressed. Those microbenchmarks had
:>
:> Huh? Peter, you obviously haven't been listening. I strongly recommend
:> that you review the last few postings I've made. The suggested bzero
:> code certainly does NOT in any way blow up the L1 cache, and I think
:> I'm pretty clear on that. I wouldn't be doing it if it did.
:
:It was an intermediate version that blew up the cache. I have been trying
:slightly different versions, and found that gcc's builtin version doesn't
:make all that much difference in the code size, either up or down. With
:the following version of bzero:
:
:#define bzero(p, n) ({ \
:        if (__builtin_constant_p(n) && (n) <= X) \
:                __builtin_memset((p), 0, (n)); \
:        else \
:                (bzero)((p), (n)); \
:})
:
:for X = 0, 4, 8, 12, 16, 20, 32 and "infinity", the kernel sizes were:
:
:   text    data     bss     dec    hex filename
:1962434  151436  349824 2463694 2597ce kernel.4
:1962442  151436  349824 2463702 2597d6 kernel.8
:1962446  151436  349824 2463706 2597da kernel.12
:1962466  151436  349824 2463726 2597ee kernel.0
:1962802  151436  349824 2464062 25993e kernel.16
:1962866  151436  349824 2464126 25997e kernel.20
:1963538  151436  349824 2464798 259c1e kernel.32
:1964098  151436  349824 2465358 259e4e kernel.infinity
:
:Summary: it's hard for the inline version to be smaller; even when it
:only needs to do one store-immediate operation, the kernel is only 32
:bytes smaller than the one using function calls which have to push
:2 args, do the call, and clean up. This is presumably due to increased
:register pressure for the inlined versions.
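For anyone who wants to poke at the quoted macro outside the kernel, here is a minimal userland sketch of the same dispatch. It is only an illustration under stated assumptions: BZERO_INLINE_MAX stands in for X, and bzero_sketch(), bzero_outofline(), bzero_calls and bzero_sketch_demo() are hypothetical names, not anything in the tree. It relies on the GCC extensions the quoted macro already uses (statement expressions and __builtin_constant_p).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff standing in for X in the quoted macro. */
#define BZERO_INLINE_MAX 8

/* Counts how many times the out-of-line path was taken. */
static unsigned long bzero_calls;

/* Out-of-line fallback; a stand-in for the kernel's real bzero(). */
static void
bzero_outofline(void *p, size_t n)
{
	bzero_calls++;
	memset(p, 0, n);
}

/*
 * Same shape as the quoted macro: small compile-time-constant lengths
 * go to the compiler builtin, everything else calls the function.
 */
#define bzero_sketch(p, n) ({						\
	if (__builtin_constant_p(n) && (n) <= BZERO_INLINE_MAX)	\
		__builtin_memset((p), 0, (n));				\
	else								\
		bzero_outofline((p), (n));				\
})

/* Clears two buffers and returns the number of out-of-line calls made. */
static unsigned long
bzero_sketch_demo(size_t runtime_len)
{
	unsigned char small[BZERO_INLINE_MAX], big[64];
	size_t i;

	assert(runtime_len <= sizeof(big));
	memset(small, 0xff, sizeof(small));
	memset(big, 0xff, sizeof(big));
	bzero_calls = 0;

	/* Constant length within the cutoff: handled by the builtin. */
	bzero_sketch(small, sizeof(small));
	/*
	 * Runtime length: takes the out-of-line path unless the optimizer
	 * can prove it is a constant within the cutoff.
	 */
	bzero_sketch(big, runtime_len);

	for (i = 0; i < sizeof(small); i++)
		assert(small[i] == 0);
	for (i = 0; i < runtime_len; i++)
		assert(big[i] == 0);
	return (bzero_calls);
}
```

Building this with different values of BZERO_INLINE_MAX and running size(1) on the result is roughly the experiment quoted above, though a toy program will not show the register-pressure effects a full kernel does.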
Very interesting! Yes, I would tend to agree... though with bzero
the register load should be minimal since all it is doing is storing
zero through a pointer. When I wrote DICE for the Amiga I had very
similar problems implementing structural copies, which required an
index and two pointers, but I did not have a problem with indirection
through non-registerized pointers (which required just one address
register), or array indexes (the 68000 didn't have scaled indexes,
though the 68020 and later did).
:OTOH, the recent uninlining of the mbuf macros somehow reduced the
:size of my standard kernel by more than 5% (more than 100K). It also
:reduced the compilation time by more than 10%. Kernel compilation
:times are still 65% larger than in RELENG_3 for kernels with essentially
:the same options (this is using -current's compiler; they are 85%
:larger using RELENG_3's compiler).
You know, I'm not surprised. The mbuf macros were a really excellent
example of things that should not be macroized. Another good example
of macros that should never have been written is the set in
sys/nfs/nfsm_subs.h in the NFS subsystem (wasn't someone working on
cleaning those up? Alfred?).
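To make the macro-versus-function point concrete in general terms (this is not the actual mbuf or nfsm code): a multi-statement macro is pasted in full at every call site, while a function body is emitted once. Every name below (struct pkt, PKT_RESET_MACRO, pkt_reset, pkt_forms_agree) is hypothetical and exists only for this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in structure; not struct mbuf. */
struct pkt {
	size_t len;
	size_t cap;
	int    flags;
};

/* Macro form: every use expands all of these stores into the caller. */
#define PKT_RESET_MACRO(p, c) do {	\
	(p)->len = 0;			\
	(p)->cap = (c);			\
	(p)->flags = 0;			\
} while (0)

/*
 * Function form: one copy of the body. Callers emit only a call,
 * unless the compiler decides on its own to inline it.
 */
static void
pkt_reset(struct pkt *p, size_t cap)
{
	p->len = 0;
	p->cap = cap;
	p->flags = 0;
}

/* Returns 1 if both forms leave the structure in the same state. */
static int
pkt_forms_agree(void)
{
	struct pkt a = { 5, 1, 7 };
	struct pkt b = { 9, 2, 3 };

	PKT_RESET_MACRO(&a, 128);
	pkt_reset(&b, 128);
	return (a.len == b.len && a.cap == b.cap && a.flags == b.flags);
}
```

The two forms do the same work; the difference is only in how many copies of the body end up in the text segment, and the real macros are a good deal longer than three stores.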
:> :better be damn good, because it may end up the only thing that the system
:> :will do well now since all this excessive inlining looks like it is blowing
:> :the L1 cache out the door.
:> :
:> :(I also apply the same complaint to the vm/* inlines).
:>
:> And you are just as wrong. The few functions inlined in vm/* are inlined
:> mainly because (A) they are called with constant arguments, which means
:
:Some seem to have rotted a bit. E.g., _vm_map_lock_upgrade() (adding
:an mtx_lock() to anything will bloat it in both space and time).
:
:Bruce
Oh god, what have they done to my VM inlines! What a holy mess!
I disclaim all responsibility... blame whoever committed that mess.
I would never do anything like that! Those macros used to be just
lockmgr() calls.
-Matt
