Date: Fri, 20 Aug 1999 01:52:32 +1000 From: Bruce Evans <bde@zeta.org.au> To: bde@zeta.org.au, peter@netplex.com.au Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org Subject: Re: cvs commit: src/sys/i386/include cpufunc.h Message-ID: <199908191552.BAA24076@godzilla.zeta.org.au>
next in thread | raw e-mail | index | archive | help
>> > Modified files: >> > sys/i386/include cpufunc.h >> > Log: >> > Try using the builtin ffs() for egcs, it (by random inspection) >> > generates slightly better code and avoids the incl then subl when >> > using ffs(foo) - 1. >> >> The inline asm version of ffs(x) should be implemented as >> (x == 0 ? 0 : bsfl(x) + 1). The compiler can then perform all possible >> optimisations except ones that use the condition codes delivered by bsfl >> (these never seem to help). This gives slightly better code than the >> builtin. > >except the one where I want "bsfl(x)", not "bsfl(x) + 1", With the cpufunc.h >inline, it works out as: > >testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1: >subl $1,%eax That's not with the bsfl inline, and it is pessimised for space for some reason (addl instead of incl). The bsfl inline expands to one instruction (bsfl :-). >How about this instead: > >static __inline int >__bsfl(int mask) >{ > int result; > > __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask)); > return result; >} That is bsfl() :-). >static __inline int >ffs(int mask) >{ > return mask == 0 ? mask : __bsfl(mask) + 1; >} And that is my expression for ffs() turned into a macro :-). I tested with a macro, but of course it would have to be an inline function or a gcc expression-statement to handle multiple evaluation issues. >... >However, you're right. builtin_ffs() sucks when the argument is not known >ie: leave out the if (j), and it turns into: >foo: > movl 4(%esp),%eax > bsfl %eax,%edx > jne .L37 > movl $-1,%edx >.L37: > pushl %edx > call bar > addl $4,%esp > ret gcc likes big instructions like the movl $-1 a bit much, but it's hard to do better without an extra jump. The main point of my original version was to reduce register pressure by using only 1 register in ffs(). It's not clear whether the builtin needs 2 registers or how the tradeoffs work in practice. >Having ffs() be base 1 is a pest. ffs0() (base 0) would be damn convenient >at times, considering the number of places 'ffs(foo) - 1' turns up. It's annoying to code, but I found that `y += ffs(x);' in a loop tends be optimised better than `y += ffs(x) - 1;'. This is because ffs(0) == 0. gcc seems to want to optimise the x == 0 case and the code for the x != 0 case gets disturbed less for `y += ffs(x);'. It does an extra incl instead of an extra branch. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe cvs-all" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199908191552.BAA24076>