From owner-freebsd-current Thu Nov 28 21:57:28 1996
Return-Path: owner-current
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id VAA29653 for current-outgoing; Thu, 28 Nov 1996 21:57:28 -0800 (PST)
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.2.228.19]) by freefall.freebsd.org (8.7.5/8.7.3) with ESMTP id VAA29648 for ; Thu, 28 Nov 1996 21:57:24 -0800 (PST)
Received: (from bde@localhost) by godzilla.zeta.org.au (8.8.3/8.6.9) id QAA15253; Fri, 29 Nov 1996 16:53:01 +1100
Date: Fri, 29 Nov 1996 16:53:01 +1100
From: Bruce Evans
Message-Id: <199611290553.QAA15253@godzilla.zeta.org.au>
To: bde@zeta.org.au, toor@dyson.iquest.net
Subject: Re: users of "ft" tapes, please test!
Cc: current@freebsd.org, phk@critter.tfs.com
Sender: owner-current@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

>I built the system with a de-inlined splx() and found an approx 10K savings.

This saves 20K out of 1096K here (I have a lot of rarely used drivers and
file systems in my kernel for testing).

>make the changes. It would be very surprising to see that an appropriately
>coded splvm/splimp/splxxx would be much smaller than the subroutine call...

Yes, it would be surprising :-).  A function call with no args takes 5
bytes.  Just referencing 2 different memory locations (cpl and xxx_imask)
takes a minimum of 10 bytes unless pointers to the locations are kept in
registers.  A function call with args takes many more bytes, but the inline
code to handle the args is likely to take even more.

>Have you considered coding the splxxx inlines in tight asm? Would that help?

Yes.  No.  For the simplest case (splhigh()), inline asm can't possibly be
tighter than:

        movl    $_cpl,%eax              # 5 bytes
        movl    (%eax),%another_reg     # 2 bytes
        movl    $0xffffffff,(%eax)      # 6 bytes

Writing it in C allows generation of code like:

        movl    (%reg1),%reg2           # 2 bytes ($_cpl already in %reg1)
        movl    %reg3,(%reg1)           # 2 bytes ($0xffffffff already in %reg3)

gcc doesn't actually generate code like this.
There usually aren't enough registers, but gcc doesn't even generate it for:

        for (i = 0; i < 1000; ++i) {
                s = splhigh();
                foo();
                splx(s);
        }

gcc apparently thinks that loading address constants into registers is a
waste of time on x86's.  It's right on x86's with no cache :-).

Bruce
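
The trade-off discussed above can be sketched in userland C.  This is only
an illustration under stated assumptions, not the kernel's code: cpl here is
an ordinary global rather than the real interrupt mask, the names
splhigh_inline, splx_inline, splhigh_call, splx_call, run_loop and foo_calls
are hypothetical, and the sketch ignores whatever else a real splx() has to
do when it lowers the mask.  The byte counts in the enum are the ones quoted
in the message (5 for an argument-less call, 5 + 2 + 6 for the tightest
inline asm, 2 + 2 when both the address and the constant already sit in
registers).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's current priority mask. */
static volatile uint32_t cpl;

/* Per-site code size, in bytes, as quoted in the message. */
enum {
    CALL_BYTES       = 5,          /* call with no args */
    INLINE_ASM_BYTES = 5 + 2 + 6,  /* load &cpl, read old mask, store ~0 */
    CACHED_BYTES     = 2 + 2       /* &cpl and ~0 already in registers */
};

/* Inline variant: every call site gets its own copy of the body. */
static inline uint32_t splhigh_inline(void)
{
    uint32_t old = cpl;            /* remember the previous mask */
    cpl = 0xffffffffu;             /* block everything */
    return old;
}

static inline void splx_inline(uint32_t s)
{
    cpl = s;                       /* restore the saved mask */
}

/*
 * De-inlined variant: one shared body, each call site pays only for the
 * call.  In a real build these would live in a separate file so the
 * compiler cannot inline them anyway.
 */
uint32_t splhigh_call(void)
{
    uint32_t old = cpl;
    cpl = 0xffffffffu;
    return old;
}

void splx_call(uint32_t s)
{
    cpl = s;
}

static unsigned foo_calls;

/* Stand-in for the protected work; checks that the mask is fully raised. */
static void foo(void)
{
    assert(cpl == 0xffffffffu);
    ++foo_calls;
}

/* The loop from the message: raise, work, restore, n times. */
unsigned run_loop(unsigned n)
{
    foo_calls = 0;
    for (unsigned i = 0; i < n; ++i) {
        uint32_t s = splhigh_inline();
        foo();
        splx_inline(s);
    }
    return foo_calls;
}
```

Compiling this with optimization and reading the assembly for run_loop()
shows whether a given compiler keeps the address of cpl and the all-ones
constant in registers across iterations; the result depends on the compiler
version and target, so no particular output is claimed here.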