Date: Fri, 20 Aug 1999 01:24:08 +0800 From: Peter Wemm <peter@netplex.com.au> To: Bruce Evans <bde@zeta.org.au> Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org Subject: Re: cvs commit: src/sys/i386/include cpufunc.h Message-ID: <19990819172408.020611C1F@overcee.netplex.com.au> In-Reply-To: Your message of "Fri, 20 Aug 1999 01:52:32 %2B1000." <199908191552.BAA24076@godzilla.zeta.org.au>
next in thread | previous in thread | raw e-mail | index | archive | help
Bruce Evans wrote:
> >> > Modified files:
> >> > sys/i386/include cpufunc.h
> >> > Log:
> >> > Try using the builtin ffs() for egcs, it (by random inspection)
> >> > generates slightly better code and avoids the incl then subl when
> >> > using ffs(foo) - 1.
> >>
> >> The inline asm version of ffs(x) should be implemented as
> >> (x == 0 ? 0 : bsfl(x) + 1). The compiler can then perform all possible
> >> optimisations except ones that use the condition codes delivered by bsfl
> >> (these never seem to help). This gives slightly better code than the
> >> builtin.
> >
> >except the one where I want "bsfl(x)", not "bsfl(x) + 1", With the cpufunc.
h
> >inline, it works out as:
> >
> >testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1:
> >subl $1,%eax
>
> That's not with the bsfl inline, and it is pessimised for space for some
> reason (addl instead of incl). The bsfl inline expands to one instruction
> (bsfl :-).
I followed up with that before. I learned assembler on systems (eg: m68k)
where small value immediate add/sub was quicker than the inc/dec stuff so I
tend to think of that first.
> >How about this instead:
> >
> >static __inline int
> >__bsfl(int mask)
> >{
> > int result;
> >
> > __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
> > return result;
> >}
>
> That is bsfl() :-).
And was kinda intentional. I was hoping to use it in it's own right, but I
wasn't sure about exposing it as a full name or not, or whether to use
ffs0() or something like that instead.
> >static __inline int
> >ffs(int mask)
> >{
> > return mask == 0 ? mask : __bsfl(mask) + 1;
> >}
>
> And that is my expression for ffs() turned into a macro :-). I tested
> with a macro, but of course it would have to be an inline function or
> a gcc expression-statement to handle multiple evaluation issues.
gcc makes worse code for "mask == 0 ? 0 : bsfl(mask) + 1" for some reason.
Explicitly using "mask" in the return means it uses the known-zero register
directly rather than reloading a zero again.
> >However, you're right. builtin_ffs() sucks when the argument is not known
> >ie: leave out the if (j), and it turns into:
> >foo:
> > movl 4(%esp),%eax
> > bsfl %eax,%edx
> > jne .L37
> > movl $-1,%edx
> >.L37:
> > pushl %edx
> > call bar
> > addl $4,%esp
> > ret
>
> gcc likes big instructions like the movl $-1 a bit much, but it's hard to
> do better without an extra jump.
>
> The main point of my original version was to reduce register pressure
> by using only 1 register in ffs(). It's not clear whether the builtin
> needs 2 registers or how the tradeoffs work in practice.
I'm curious how __builtin_ffs() goes on the alpha compared to the libkern
version. I'll have a look shortly.
> >Having ffs() be base 1 is a pest. ffs0() (base 0) would be damn convenient
> >at times, considering the number of places 'ffs(foo) - 1' turns up.
>
> It's annoying to code, but I found that `y += ffs(x);' in a loop tends be
> optimised better than `y += ffs(x) - 1;'. This is because ffs(0) == 0.
> gcc seems to want to optimise the x == 0 case and the code for the x != 0
> case gets disturbed less for `y += ffs(x);'. It does an extra incl
> instead of an extra branch.
I was hoping that gcc would do smarter things once it could see inside the
ffs() routine, and if we could use ffs0() or bsfl() directly instead of
ffs() - 1.
Cheers,
-Peter
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe cvs-all" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990819172408.020611C1F>
