From: Peter Wemm
To: Matt Dillon
Cc: Bruce Evans, Mikhail Teterin, jlemon@FreeBSD.org, cvs-committers@FreeBSD.org, cvs-all@FreeBSD.org
Subject: Re: kernel size w/ optimized bzero() & patch set (was Re: Inline optimized bzero (was Re: cvs commit: src/sys/netinettcp_subr.c))
In-Reply-To: <200106241702.f5OH2oN78720@earth.backplane.com>
Date: Sun, 24 Jun 2001 23:07:43 -0700
Message-Id: <20010625060743.292F7380B@overcee.netplex.com.au>

Matt Dillon wrote:
> Ok, how about this. I replaced bzero() with the inline and placed it
> in the machine-dependent section of code. I managed to knock the inline
> code generation down to the point where it does not bloat the resulting
> kernel binary. As an example of this, the 'register int z = 0' caused
> all the assignments to 0 to use 'movl %eax,...' (3 byte instruction)
> instead of 'movl $0,...' (7 byte instruction). The kernel size is
> around 6000 bytes larger without that optimization. Sometimes GCC's
> optimizer gets in the way :-(
>
> I am amazed by the results... and I found a couple of interesting things
> out too.
> For example, tcp_input bzero's a number of 8 and 12 byte
> structures, not just the 20 byte structures we were looking at previously

Just think... This new ``improved'' bzero code can now fill up all 4K of
L1 instruction cache on most of my systems, and most of my 8K L1
instruction cache on >= coppermine cpus. I'm impressed.

Those microbenchmarks had better be damn good, because it may end up being
the only thing that the system does well now, since all this excessive
inlining looks like it is blowing the L1 cache out the door.

(I also apply the same complaint to the vm/* inlines.)

Cheers,
-Peter
--
Peter Wemm - peter@FreeBSD.org; peter@yahoo-inc.com; peter@netplex.com.au
"All of this is for nothing if we don't go to the stars" - JMS/B5
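
For readers without the patch at hand, the kind of inline being argued
about can be sketched roughly as below. This is not Matt's code: the name
sketch_bzero, the 4 to 20 byte size cases, and the memset() fallback are
all assumptions made for illustration. It also assumes the small sizes are
only used on 4-byte-aligned buffers and that storing through a uint32_t
pointer is acceptable for the objects being cleared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of an inline bzero for small objects.
 * Assumption: for the 4..20 byte cases, buf is 4-byte aligned.
 */
static inline void
sketch_bzero(void *buf, size_t len)
{
	/*
	 * Holding zero in a variable is meant to encourage register stores
	 * (movl %eax,...) over immediate stores (movl $0,...). The compiler
	 * is free to ignore the hint.
	 */
	register uint32_t z = 0;
	uint32_t *p = buf;

	switch (len) {
	case 20:
		p[4] = z;
		/* FALLTHROUGH */
	case 16:
		p[3] = z;
		/* FALLTHROUGH */
	case 12:
		p[2] = z;
		/* FALLTHROUGH */
	case 8:
		p[1] = z;
		/* FALLTHROUGH */
	case 4:
		p[0] = z;
		break;
	default:
		/* Any other size takes the out-of-line path. */
		memset(buf, 0, len);
		break;
	}
}
```

When len is a compile-time constant, each call site can collapse to a
handful of stores. That is where the speedup would come from, and it is
also why every such call site adds instructions that compete for L1
instruction cache, which is the complaint above.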