From owner-cvs-all  Thu Aug 19  8:52:57 1999
Delivered-To: cvs-all@freebsd.org
Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.26.10.9])
	by hub.freebsd.org (Postfix) with ESMTP
	id 06EF1151F8; Thu, 19 Aug 1999 08:52:48 -0700 (PDT)
	(envelope-from bde@godzilla.zeta.org.au)
Received: (from bde@localhost)
	by godzilla.zeta.org.au (8.8.7/8.8.7) id BAA24076;
	Fri, 20 Aug 1999 01:52:32 +1000
Date: Fri, 20 Aug 1999 01:52:32 +1000
From: Bruce Evans <bde@zeta.org.au>
Message-Id: <199908191552.BAA24076@godzilla.zeta.org.au>
To: bde@zeta.org.au, peter@netplex.com.au
Subject: Re: cvs commit: src/sys/i386/include cpufunc.h
Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org
Sender: owner-cvs-all@FreeBSD.ORG
Precedence: bulk

>> >  Modified files:
>> >    sys/i386/include     cpufunc.h 
>> >  Log:
>> >  Try using the builtin ffs() for egcs, it (by random inspection)
>> >  generates slightly better code and avoids the incl then subl when
>> >  using ffs(foo) - 1.
>> 
>> The inline asm version of ffs(x) should be implemented as
>> (x == 0 ? 0 : bsfl(x) + 1).  The compiler can then perform all possible
>> optimisations except ones that use the condition codes delivered by bsfl
>> (these never seem to help).  This gives slightly better code than the
>> builtin.
>
>except the one where I want "bsfl(x)", not "bsfl(x) + 1",  With the cpufunc.h
>inline, it works out as:
>
>testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1:
>subl $1,%eax

That's not with the bsfl inline, and it is pessimised for space for some
reason (addl instead of incl).  The bsfl inline expands to one instruction
(bsfl :-).

>How about this instead:
>
>static __inline int
>__bsfl(int mask)
>{
>        int result;
>
>        __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
>        return result;
>}

That is bsfl() :-).

>static __inline int
>ffs(int mask)
>{
>        return mask == 0 ? mask : __bsfl(mask) + 1;
>}

And that is my expression for ffs() turned into a macro :-).  I tested
with a macro, but of course it would have to be an inline function or
a gcc expression-statement to handle multiple evaluation issues.

>...

>However, you're right. builtin_ffs() sucks when the argument is not known
>ie: leave out the if (j), and it turns into:
>foo:
>        movl 4(%esp),%eax
>        bsfl %eax,%edx
>        jne .L37
>        movl $-1,%edx
>.L37:
>        pushl %edx
>        call bar
>        addl $4,%esp
>        ret

gcc likes big instructions like the movl $-1 a bit much, but it's hard to
do better without an extra jump.

The main point of my original version was to reduce register pressure
by using only 1 register in ffs().  It's not clear whether the builtin
needs 2 registers or how the tradeoffs work in practice.

>Having ffs() be base 1 is a pest.  ffs0() (base 0) would be damn convenient
>at times, considering the number of places 'ffs(foo) - 1' turns up.

It's annoying to code, but I found that `y += ffs(x);' in a loop tends be
optimised better than `y += ffs(x) - 1;'.   This is because ffs(0) == 0.
gcc seems to want to optimise the x == 0 case and the code for the x != 0
case gets disturbed less for `y += ffs(x);'.  It does an extra incl
instead of an extra branch.

Bruce


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message