Date: Fri, 20 Aug 1999 01:52:32 +1000 From: Bruce Evans <bde@zeta.org.au> To: bde@zeta.org.au, peter@netplex.com.au Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org Subject: Re: cvs commit: src/sys/i386/include cpufunc.h Message-ID: <199908191552.BAA24076@godzilla.zeta.org.au>
next in thread | raw e-mail | index | archive | help
>> > Modified files:
>> > sys/i386/include cpufunc.h
>> > Log:
>> > Try using the builtin ffs() for egcs, it (by random inspection)
>> > generates slightly better code and avoids the incl then subl when
>> > using ffs(foo) - 1.
>>
>> The inline asm version of ffs(x) should be implemented as
>> (x == 0 ? 0 : bsfl(x) + 1). The compiler can then perform all possible
>> optimisations except ones that use the condition codes delivered by bsfl
>> (these never seem to help). This gives slightly better code than the
>> builtin.
>
>except the one where I want "bsfl(x)", not "bsfl(x) + 1", With the cpufunc.h
>inline, it works out as:
>
>testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1:
>subl $1,%eax
That's not with the bsfl inline, and it is pessimised for space for some
reason (addl instead of incl). The bsfl inline expands to one instruction
(bsfl :-).
>How about this instead:
>
>static __inline int
>__bsfl(int mask)
>{
> int result;
>
> __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
> return result;
>}
That is bsfl() :-).
>static __inline int
>ffs(int mask)
>{
> return mask == 0 ? mask : __bsfl(mask) + 1;
>}
And that is my expression for ffs() turned into a macro :-). I tested
with a macro, but of course it would have to be an inline function or
a gcc expression-statement to handle multiple evaluation issues.
>...
>However, you're right. builtin_ffs() sucks when the argument is not known
>ie: leave out the if (j), and it turns into:
>foo:
> movl 4(%esp),%eax
> bsfl %eax,%edx
> jne .L37
> movl $-1,%edx
>.L37:
> pushl %edx
> call bar
> addl $4,%esp
> ret
gcc likes big instructions like the movl $-1 a bit much, but it's hard to
do better without an extra jump.
The main point of my original version was to reduce register pressure
by using only 1 register in ffs(). It's not clear whether the builtin
needs 2 registers or how the tradeoffs work in practice.
>Having ffs() be base 1 is a pest. ffs0() (base 0) would be damn convenient
>at times, considering the number of places 'ffs(foo) - 1' turns up.
It's annoying to code, but I found that `y += ffs(x);' in a loop tends be
optimised better than `y += ffs(x) - 1;'. This is because ffs(0) == 0.
gcc seems to want to optimise the x == 0 case and the code for the x != 0
case gets disturbed less for `y += ffs(x);'. It does an extra incl
instead of an extra branch.
Bruce
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199908191552.BAA24076>
