Date: Thu, 19 Aug 1999 22:28:50 +0800
From: Peter Wemm <peter@netplex.com.au>
To: Bruce Evans <bde@zeta.org.au>
Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org
Subject: Re: cvs commit: src/sys/i386/include cpufunc.h
Message-ID: <19990819142850.C78D81C9F@overcee.netplex.com.au>
In-Reply-To: Your message of "Thu, 19 Aug 1999 17:27:33 +1000." <199908190727.RAA14796@godzilla.zeta.org.au>

Bruce Evans wrote:
> >   Modified files:
> >     sys/i386/include     cpufunc.h
> >   Log:
> >   Try using the builtin ffs() for egcs, it (by random inspection)
> >   generates slightly better code and avoids the incl then subl when
> >   using ffs(foo) - 1.
>
> The inline asm version of ffs(x) should be implemented as
> (x == 0 ? 0 : bsfl(x) + 1).  The compiler can then perform all possible
> optimisations except ones that use the condition codes delivered by bsfl
> (these never seem to help).  This gives slightly better code than the
> builtin.

Except the one where I want "bsfl(x)", not "bsfl(x) + 1".  With the
cpufunc.h inline, it works out as:

	testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1: subl $1,%eax

i.e. it can't optimize out the +1 -1.

How about this instead:

static __inline int
__bsfl(int mask)
{
	int	result;

	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
	return result;
}

static __inline int
ffs(int mask)
{
	return mask == 0 ? mask : __bsfl(mask) + 1;
}

Then, with the following code:

extern int bar(int);
int
foo(int j)
{
	int i;
	if (j)
		bar (ffs(j) - 1);
}

It gets optimized much better:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L6
#APP
	bsfl %eax,%eax
#NO_APP
	pushl %eax
	call bar
	addl $4,%esp
.L6:
	ret

Versus the original inline with your ffs macro:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L37
#APP
	testl %eax,%eax
	je 1f
	bsfl %eax,%eax
	incl %eax
1:
#NO_APP
	decl %eax
	pushl %eax
	call bar
	addl $4,%esp
.L37:
	ret

The redundant incl, decl pair isn't optimized out, and the code contains
a duplicate (never taken) test.

Using this same code with builtin_ffs() results in:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L37
	bsfl %eax,%eax
	pushl %eax
	call bar
	addl $4,%esp
.L37:
	ret

However, you're right.  builtin_ffs() sucks when the argument is not
known, i.e. leave out the if (j), and it turns into:

foo:
	movl 4(%esp),%eax
	bsfl %eax,%edx
	jne .L37
	movl $-1,%edx
.L37:
	pushl %edx
	call bar
	addl $4,%esp
	ret

Which means bsfl is always executed, even for a zero arg.

Leaving out "if (j)" in my code above, with my __bsfl() version, results in:

foo:
	movl 4(%esp),%eax
	testl %eax,%eax
	je .L6
#APP
	bsfl %eax,%eax
#NO_APP
	incl %eax
.L6:
	decl %eax
	pushl %eax
	call bar
	addl $4,%esp
	ret

.. which is equivalent to your inline.

In all cases I've checked, the code is either equivalent or better
(i.e. addl, subl optimized out).

Having ffs() be base 1 is a pest.  ffs0() (base 0) would be damn convenient
at times, considering the number of places 'ffs(foo) - 1' turns up.

> Bruce

Cheers,
-Peter
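
For reference, the base-1 versus base-0 distinction discussed above can be sketched in portable C.  This is only an illustration, not the cpufunc.h code: bsf_portable(), ffs_sketch() and ffs0_sketch() are hypothetical names, the loop stands in for the bsfl instruction, and returning -1 from the base-0 variant for a zero argument is an assumption chosen to match what 'ffs(foo) - 1' yields.

```c
#include <assert.h>

/*
 * Hypothetical portable stand-in for bsfl: index (0-based) of the lowest
 * set bit.  Like the instruction, it has no meaningful result for zero,
 * so a nonzero argument is required here.
 */
static inline int
bsf_portable(unsigned int mask)
{
	int bit = 0;

	assert(mask != 0);
	while ((mask & 1u) == 0) {
		mask >>= 1;
		bit++;
	}
	return bit;
}

/* Base-1 result, 0 for a zero argument: (x == 0 ? 0 : bsf(x) + 1). */
static inline int
ffs_sketch(int mask)
{
	return mask == 0 ? 0 : bsf_portable((unsigned int)mask) + 1;
}

/*
 * Base-0 variant in the spirit of the ffs0() idea.  The -1 result for a
 * zero argument is an assumption, equal to ffs_sketch(0) - 1.
 */
static inline int
ffs0_sketch(int mask)
{
	return mask == 0 ? -1 : bsf_portable((unsigned int)mask);
}
```

With the zero check kept outside the bit scan, a caller that has already tested its argument leaves the compiler free to drop the second test, and ffs0_sketch(x) needs no +1 followed by -1 at all.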