Date: Fri, 20 Aug 1999 01:24:08 +0800 From: Peter Wemm <peter@netplex.com.au> To: Bruce Evans <bde@zeta.org.au> Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org Subject: Re: cvs commit: src/sys/i386/include cpufunc.h Message-ID: <19990819172408.020611C1F@overcee.netplex.com.au> In-Reply-To: Your message of "Fri, 20 Aug 1999 01:52:32 %2B1000." <199908191552.BAA24076@godzilla.zeta.org.au>
next in thread | previous in thread | raw e-mail | index | archive | help
Bruce Evans wrote: > >> > Modified files: > >> > sys/i386/include cpufunc.h > >> > Log: > >> > Try using the builtin ffs() for egcs, it (by random inspection) > >> > generates slightly better code and avoids the incl then subl when > >> > using ffs(foo) - 1. > >> > >> The inline asm version of ffs(x) should be implemented as > >> (x == 0 ? 0 : bsfl(x) + 1). The compiler can then perform all possible > >> optimisations except ones that use the condition codes delivered by bsfl > >> (these never seem to help). This gives slightly better code than the > >> builtin. > > > >except the one where I want "bsfl(x)", not "bsfl(x) + 1", With the cpufunc. h > >inline, it works out as: > > > >testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1: > >subl $1,%eax > > That's not with the bsfl inline, and it is pessimised for space for some > reason (addl instead of incl). The bsfl inline expands to one instruction > (bsfl :-). I followed up with that before. I learned assembler on systems (eg: m68k) where small value immediate add/sub was quicker than the inc/dec stuff so I tend to think of that first. > >How about this instead: > > > >static __inline int > >__bsfl(int mask) > >{ > > int result; > > > > __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask)); > > return result; > >} > > That is bsfl() :-). And was kinda intentional. I was hoping to use it in it's own right, but I wasn't sure about exposing it as a full name or not, or whether to use ffs0() or something like that instead. > >static __inline int > >ffs(int mask) > >{ > > return mask == 0 ? mask : __bsfl(mask) + 1; > >} > > And that is my expression for ffs() turned into a macro :-). I tested > with a macro, but of course it would have to be an inline function or > a gcc expression-statement to handle multiple evaluation issues. gcc makes worse code for "mask == 0 ? 0 : bsfl(mask) + 1" for some reason. Explicitly using "mask" in the return means it uses the known-zero register directly rather than reloading a zero again. > >However, you're right. builtin_ffs() sucks when the argument is not known > >ie: leave out the if (j), and it turns into: > >foo: > > movl 4(%esp),%eax > > bsfl %eax,%edx > > jne .L37 > > movl $-1,%edx > >.L37: > > pushl %edx > > call bar > > addl $4,%esp > > ret > > gcc likes big instructions like the movl $-1 a bit much, but it's hard to > do better without an extra jump. > > The main point of my original version was to reduce register pressure > by using only 1 register in ffs(). It's not clear whether the builtin > needs 2 registers or how the tradeoffs work in practice. I'm curious how __builtin_ffs() goes on the alpha compared to the libkern version. I'll have a look shortly. > >Having ffs() be base 1 is a pest. ffs0() (base 0) would be damn convenient > >at times, considering the number of places 'ffs(foo) - 1' turns up. > > It's annoying to code, but I found that `y += ffs(x);' in a loop tends be > optimised better than `y += ffs(x) - 1;'. This is because ffs(0) == 0. > gcc seems to want to optimise the x == 0 case and the code for the x != 0 > case gets disturbed less for `y += ffs(x);'. It does an extra incl > instead of an extra branch. I was hoping that gcc would do smarter things once it could see inside the ffs() routine, and if we could use ffs0() or bsfl() directly instead of ffs() - 1. Cheers, -Peter To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe cvs-all" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?19990819172408.020611C1F>