From owner-cvs-src@FreeBSD.ORG Tue Aug 26 03:00:36 2003 Return-Path: Delivered-To: cvs-src@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id BBBDB16A4BF; Tue, 26 Aug 2003 03:00:36 -0700 (PDT) Received: from mailman.zeta.org.au (mailman.zeta.org.au [203.26.10.16]) by mx1.FreeBSD.org (Postfix) with ESMTP id 90B2743FDD; Tue, 26 Aug 2003 03:00:31 -0700 (PDT) (envelope-from bde@zeta.org.au) Received: from gamplex.bde.org (katana.zip.com.au [61.8.7.246]) by mailman.zeta.org.au (8.9.3p2/8.8.7) with ESMTP id UAA27944; Tue, 26 Aug 2003 20:00:22 +1000 Date: Tue, 26 Aug 2003 20:00:20 +1000 (EST) From: Bruce Evans X-X-Sender: bde@gamplex.bde.org To: Marcel Moolenaar In-Reply-To: <20030826082035.GA728@dhcp43.pn.xcllnt.net> Message-ID: <20030826192612.Q9540@gamplex.bde.org> References: <200308250139.h7P1dlwg002331@repoman.freebsd.org> <20030826082035.GA728@dhcp43.pn.xcllnt.net> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII cc: cvs-src@FreeBSD.org cc: src-committers@FreeBSD.org cc: cvs-all@FreeBSD.org Subject: Re: cvs commit: src/sys/gnu/ext2fs ext2_bitops.hext2_linux_balloc.c ext2_linux_ialloc.c ia64-bitops.h X-BeenThere: cvs-src@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: CVS commit messages for the src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 26 Aug 2003 10:00:36 -0000 On Tue, 26 Aug 2003, Marcel Moolenaar wrote: > On Tue, Aug 26, 2003 at 05:44:40PM +1000, Bruce Evans wrote: > > On Sun, 24 Aug 2003, Marcel Moolenaar wrote: > > > ... > > > Log: > > > Change of plans: Add ext2_bitops.h with generic and portable > > > implementations. Use those on platforms that don't have MD > > > ... > > > > Thnaks. I'll deventually do some benchmarks to see if we want the MD > > (inline assembler) versions at all. > > Ok. I didn't write the functions for speed so there may be some > opportunities after analyzing what the compiler made of it. > > BTW: I noticed that GCC has a builtin implementation for ffs(). > It may be beneficial. Is this more than ffs(3)? (It could be for bit strings generally.). I recently benchmarked ffs(3). Too many builtins are turned off by -fno-builtin for the kernel, but on i386's we have an inline asm version of ffs() which is still better than the gcc builtin according to my benchmarks. The main difference is that inline asm version uses a conditional branch while the builtin uses extra instructions to avoid the branch. Somehow the branch was always faster on my test systems (old Celeron, AthlonXP and freefall(?)). I tested with a moving bit to try to defeat branch target caches, but forgot to test with alternating zero and nonzero data which would probably cause more branch mispredictions. Data which is zero much more often than random data (probability 1/2^32) may be unusual anyway. IIRC, the amount by which the extra instructions were slower was quite machine-dependent, with the AthlonXP handling them almost as fast as a branch. The extra instructions look like this: movl $0, %edx bsfl x, %eax # result less 1 if x was nonzero; else garbage sete %dl # begin fixup for garbage case negl %edx orl %edx, %eax # end fixup for garbage case incl %eax # fix up semantic mismatch between bsf and ffs This is highly non-parallelizable, so it is probably as inefficient as it looks to a naive instruction counter. Of course, any optimizations and pessimizations here are only a few cycles, so they don't show up except in micro-benchmarks. Bruce