From owner-cvs-all Thu Aug 19 8:52:57 1999 Delivered-To: cvs-all@freebsd.org Received: from godzilla.zeta.org.au (godzilla.zeta.org.au [203.26.10.9]) by hub.freebsd.org (Postfix) with ESMTP id 06EF1151F8; Thu, 19 Aug 1999 08:52:48 -0700 (PDT) (envelope-from bde@godzilla.zeta.org.au) Received: (from bde@localhost) by godzilla.zeta.org.au (8.8.7/8.8.7) id BAA24076; Fri, 20 Aug 1999 01:52:32 +1000 Date: Fri, 20 Aug 1999 01:52:32 +1000 From: Bruce Evans Message-Id: <199908191552.BAA24076@godzilla.zeta.org.au> To: bde@zeta.org.au, peter@netplex.com.au Subject: Re: cvs commit: src/sys/i386/include cpufunc.h Cc: cvs-all@FreeBSD.org, cvs-committers@FreeBSD.org Sender: owner-cvs-all@FreeBSD.ORG Precedence: bulk >> > Modified files: >> > sys/i386/include cpufunc.h >> > Log: >> > Try using the builtin ffs() for egcs, it (by random inspection) >> > generates slightly better code and avoids the incl then subl when >> > using ffs(foo) - 1. >> >> The inline asm version of ffs(x) should be implemented as >> (x == 0 ? 0 : bsfl(x) + 1). The compiler can then perform all possible >> optimisations except ones that use the condition codes delivered by bsfl >> (these never seem to help). This gives slightly better code than the >> builtin. > >except the one where I want "bsfl(x)", not "bsfl(x) + 1", With the cpufunc.h >inline, it works out as: > >testl %eax,%eax; je 1f; bsfl %eax; addl $1,%eax; 1: >subl $1,%eax That's not with the bsfl inline, and it is pessimised for space for some reason (addl instead of incl). The bsfl inline expands to one instruction (bsfl :-). >How about this instead: > >static __inline int >__bsfl(int mask) >{ > int result; > > __asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask)); > return result; >} That is bsfl() :-). >static __inline int >ffs(int mask) >{ > return mask == 0 ? mask : __bsfl(mask) + 1; >} And that is my expression for ffs() turned into a macro :-). I tested with a macro, but of course it would have to be an inline function or a gcc expression-statement to handle multiple evaluation issues. >... >However, you're right. builtin_ffs() sucks when the argument is not known >ie: leave out the if (j), and it turns into: >foo: > movl 4(%esp),%eax > bsfl %eax,%edx > jne .L37 > movl $-1,%edx >.L37: > pushl %edx > call bar > addl $4,%esp > ret gcc likes big instructions like the movl $-1 a bit much, but it's hard to do better without an extra jump. The main point of my original version was to reduce register pressure by using only 1 register in ffs(). It's not clear whether the builtin needs 2 registers or how the tradeoffs work in practice. >Having ffs() be base 1 is a pest. ffs0() (base 0) would be damn convenient >at times, considering the number of places 'ffs(foo) - 1' turns up. It's annoying to code, but I found that `y += ffs(x);' in a loop tends be optimised better than `y += ffs(x) - 1;'. This is because ffs(0) == 0. gcc seems to want to optimise the x == 0 case and the code for the x != 0 case gets disturbed less for `y += ffs(x);'. It does an extra incl instead of an extra branch. Bruce To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe cvs-all" in the body of the message