FreeBSD Mail Archives

Date:      Fri, 20 Aug 1999 01:24:08 +0800
From:      Peter Wemm <peter@netplex.com.au>
To:        Bruce Evans <bde@zeta.org.au>
Cc:        cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org
Subject:   Re: cvs commit: src/sys/i386/include cpufunc.h 
Message-ID:  <19990819172408.020611C1F@overcee.netplex.com.au>
In-Reply-To: Your message of "Fri, 20 Aug 1999 01:52:32 %2B1000." <199908191552.BAA24076@godzilla.zeta.org.au>

Bruce Evans wrote:
> >> >  Modified files:
> >> >    sys/i386/include     cpufunc.h 
> >> >  Log:
> >> >  Try using the builtin ffs() for egcs, it (by random inspection)
> >> >  generates slightly better code and avoids the incl then subl when
> >> >  using ffs(foo) - 1.
> >> 
> >> The inline asm version of ffs(x) should be implemented as
> >> (x == 0 ? 0 : bsfl(x) + 1).  The compiler can then perform all possible
> >> optimisations except ones that use the condition codes delivered by bsfl
> >> (these never seem to help).  This gives slightly better code than the
> >> builtin.
> >
> >except the one where I want "bsfl(x)", not "bsfl(x) + 1",  With the cpufunc.
    h
> >inline, it works out as:
> >
> >testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1:
> >subl $1,%eax
> 
> That's not with the bsfl inline, and it is pessimised for space for some
> reason (addl instead of incl).  The bsfl inline expands to one instruction
> (bsfl :-).

I followed up with that before.  I learned assembler on systems (eg: m68k)
where small value immediate add/sub was quicker than the inc/dec stuff so I
tend to think of that first.

> >How about this instead:
> >
> >static __inline int
> >__bsfl(int mask)
> >{
> >        int result;
> >
> >        __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
> >        return result;
> >}
> 
> That is bsfl() :-).

And was kinda intentional.  I was hoping to use it in it's own right, but I
wasn't sure about exposing it as a full name or not, or whether to use
ffs0() or something like that instead.

> >static __inline int
> >ffs(int mask)
> >{
> >        return mask == 0 ? mask : __bsfl(mask) + 1;
> >}
> 
> And that is my expression for ffs() turned into a macro :-).  I tested
> with a macro, but of course it would have to be an inline function or
> a gcc expression-statement to handle multiple evaluation issues.

gcc makes worse code for "mask == 0 ? 0 : bsfl(mask) + 1" for some reason.
Explicitly using "mask" in the return means it uses the known-zero register
directly rather than reloading a zero again.

> >However, you're right. builtin_ffs() sucks when the argument is not known
> >ie: leave out the if (j), and it turns into:
> >foo:
> >        movl 4(%esp),%eax
> >        bsfl %eax,%edx
> >        jne .L37
> >        movl $-1,%edx
> >.L37:
> >        pushl %edx
> >        call bar
> >        addl $4,%esp
> >        ret
> 
> gcc likes big instructions like the movl $-1 a bit much, but it's hard to
> do better without an extra jump.
> 
> The main point of my original version was to reduce register pressure
> by using only 1 register in ffs().  It's not clear whether the builtin
> needs 2 registers or how the tradeoffs work in practice.

I'm curious how __builtin_ffs() goes on the alpha compared to the libkern
version.  I'll have a look shortly.

> >Having ffs() be base 1 is a pest.  ffs0() (base 0) would be damn convenient
> >at times, considering the number of places 'ffs(foo) - 1' turns up.
> 
> It's annoying to code, but I found that `y += ffs(x);' in a loop tends be
> optimised better than `y += ffs(x) - 1;'.   This is because ffs(0) == 0.
> gcc seems to want to optimise the x == 0 case and the code for the x != 0
> case gets disturbed less for `y += ffs(x);'.  It does an extra incl
> instead of an extra branch.

I was hoping that gcc would do smarter things once it could see inside the
ffs() routine, and if we could use ffs0() or bsfl() directly instead of
ffs() - 1.

Cheers,
-Peter



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990819172408.020611C1F>

Header And Logo

Peripheral Links

Site Navigation

Header And Logo

Peripheral Links

Search

Site Navigation