Date: Thu, 19 Aug 1999 22:28:50 +0800
From: Peter Wemm <peter@netplex.com.au>
To: Bruce Evans <bde@zeta.org.au>
Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org
Subject: Re: cvs commit: src/sys/i386/include cpufunc.h
Message-ID: <19990819142850.C78D81C9F@overcee.netplex.com.au>
In-Reply-To: Your message of "Thu, 19 Aug 1999 17:27:33 +1000." <199908190727.RAA14796@godzilla.zeta.org.au>
Bruce Evans wrote:
> > Modified files:
> > sys/i386/include cpufunc.h
> > Log:
> > Try using the builtin ffs() for egcs, it (by random inspection)
> > generates slightly better code and avoids the incl then subl when
> > using ffs(foo) - 1.
>
> The inline asm version of ffs(x) should be implemented as
> (x == 0 ? 0 : bsfl(x) + 1). The compiler can then perform all possible
> optimisations except ones that use the condition codes delivered by bsfl
> (these never seem to help). This gives slightly better code than the
> builtin.
Except the one where I want "bsfl(x)", not "bsfl(x) + 1". With the cpufunc.h
inline, it works out as:
testl %eax,%eax; je 1f; bsfl %eax,%eax; addl $1,%eax; 1:
subl $1,%eax
i.e., the compiler can't see inside the asm, so it can't optimize out the +1 -1.
How about this instead:
static __inline int
__bsfl(int mask)
{
	int result;

	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
	return result;
}

static __inline int
ffs(int mask)
{
	return mask == 0 ? mask : __bsfl(mask) + 1;
}
Then, with the following code:
extern int bar(int);

int foo(int j)
{
	int i;

	if (j)
		bar (ffs(j) - 1);
}
It gets optimized much better:
foo:
movl 4(%esp),%eax
testl %eax,%eax
je .L6
#APP
bsfl %eax,%eax
#NO_APP
pushl %eax
call bar
addl $4,%esp
.L6:
ret
Versus the original inline with your ffs macro:
foo:
movl 4(%esp),%eax
testl %eax,%eax
je .L37
#APP
testl %eax,%eax
je 1f
bsfl %eax,%eax
incl %eax
1:
#NO_APP
decl %eax
pushl %eax
call bar
addl $4,%esp
.L37:
ret
The redundant incl/decl pair isn't optimized away, and the code contains a
duplicate (never taken) test.
Using this same code with __builtin_ffs() results in:
foo:
movl 4(%esp),%eax
testl %eax,%eax
je .L37
bsfl %eax,%eax
pushl %eax
call bar
addl $4,%esp
.L37:
ret
However, you're right: __builtin_ffs() sucks when the argument is not known
to be nonzero. I.e., leave out the if (j), and it turns into:
foo:
movl 4(%esp),%eax
bsfl %eax,%edx
jne .L37
movl $-1,%edx
.L37:
pushl %edx
call bar
addl $4,%esp
ret
Which means bsfl is always executed, even for a zero arg.
Leaving out "if (j)" from the code above with my __bsfl() version results in:
foo:
movl 4(%esp),%eax
testl %eax,%eax
je .L6
#APP
bsfl %eax,%eax
#NO_APP
incl %eax
.L6:
decl %eax
pushl %eax
call bar
addl $4,%esp
ret
.. which is equivalent to your inline. In all cases I've checked, the code
is either equivalent or better (i.e., the addl/subl pair is optimized out).
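For anyone who wants to poke at the split formulation outside the kernel tree, here is a rough portable sketch. It assumes GCC or Clang and a 32-bit int, and it substitutes __builtin_ctz() for the bsfl asm so that it also builds on non-i386 targets. The names bsf_sketch(), ffs_split() and ffs_split_matches_builtin() are made up for illustration and are not in cpufunc.h:

```c
/*
 * Hypothetical stand-in for __bsfl(): index of the lowest set bit.
 * The result is undefined for mask == 0, as with the bsfl instruction.
 */
static inline int
bsf_sketch(int mask)
{
	return __builtin_ctz((unsigned int)mask);
}

/*
 * Same shape as the proposed ffs(): the zero test and the +1 are plain C,
 * so the compiler can fold them with a caller's "if (j)" and "- 1".
 */
static inline int
ffs_split(int mask)
{
	return mask == 0 ? 0 : bsf_sketch(mask) + 1;
}

/*
 * Returns 1 if ffs_split() agrees with the compiler's __builtin_ffs() on
 * a few mixed patterns and on every single-bit value, 0 otherwise.
 * Assumes a 32-bit int.
 */
int
ffs_split_matches_builtin(void)
{
	static const int samples[] = { 0, 1, 2, 3, 8, 12, 0x7fffffff, -1 };
	unsigned int i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		if (ffs_split(samples[i]) != __builtin_ffs(samples[i]))
			return 0;
	for (i = 0; i < 32; i++) {
		int bit = (int)(1u << i);

		if (ffs_split(bit) != __builtin_ffs(bit))
			return 0;
	}
	return 1;
}
```

This only checks the return values, not the generated code; comparing the assembly still means compiling with -S and reading the output, as above.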
Having ffs() be base 1 is a pest. ffs0() (base 0) would be damn convenient
at times, considering the number of places 'ffs(foo) - 1' turns up.
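Something along these lines is what I have in mind, as a sketch only. The name ffs0() is just the suggestion above, nothing like it exists in cpufunc.h, and returning -1 for a zero argument is an assumption picked so that it matches ffs(mask) - 1. It uses the GCC/Clang __builtin_ctz() in place of the bsfl asm so that it is not tied to i386:

```c
/*
 * Hypothetical ffs0(): zero-based index of the lowest set bit.
 * Returning -1 for mask == 0 is an assumption chosen to match
 * ffs(mask) - 1; __builtin_ctz() itself is undefined for 0, hence
 * the explicit test.
 */
static inline int
ffs0(int mask)
{
	return mask == 0 ? -1 : __builtin_ctz((unsigned int)mask);
}
```

With a helper like this, a call such as bar(ffs(j) - 1) would simply become bar(ffs0(j)).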
> Bruce
Cheers,
-Peter
