Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 15 Nov 2012 17:53:45 -0500
From:      Jung-uk Kim <jkim@FreeBSD.org>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        freebsd-arch@FreeBSD.org
Subject:   Re: [RFC] Generic population count function
Message-ID:  <50A57279.3040607@FreeBSD.org>
In-Reply-To: <20121115181009.D42162@besplex.bde.org>
References:  <50A43B52.8030102@FreeBSD.org> <20121115181009.D42162@besplex.bde.org>

next in thread | previous in thread | raw e-mail | index | archive | help
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2012-11-15 03:32:50 -0500, Bruce Evans wrote:
> The implementation of the ffs() family has similar non-problems.
> No one cares because they are rarely used, but...  cperciva@ wrote
> the lookup table part of the following set of implementations of
> ffs() and fls().  A lookup table is the best general method, though
> it can be beaten by unportable methodw in some cases.  I added more
> methods and a test framework to this (test framework not always
> included).  The methods are: - gcc builtin - cpufunc inline

Talking about cpufunc ffsl(), I found something strange about clang:

% cat ffs.c
#include <sys/types.h>
#include <machine/cpufunc.h>
int ffs_cpufunc(long x) /* copied from cpufunc.h as is */
{ return (x == 0 ? x : (int)bsfq((u_long)x) + 1); }
int ffs_builtin(long x)
{ return (__builtin_ffsl(x)); }

% clang -O2 -c ffs.c
% objdump -d ffs.o

ffs.o:     file format elf64-x86-64-freebsd

Disassembly of section .text:

0000000000000000 <ffs_cpufunc>:
   0:   48 85 ff                test   %rdi,%rdi
   3:   74 21                   je     26 <ffs_cpufunc+0x26>
   5:   48 89 7c 24 f8          mov    %rdi,-0x8(%rsp)
   a:   48 0f bc 4c 24 f8       bsf    -0x8(%rsp),%rcx
  10:   48 c1 e1 20             shl    $0x20,%rcx
  14:   48 b8 00 00 00 00 01    mov    $0x100000000,%rax
  1b:   00 00 00
  1e:   48 01 c8                add    %rcx,%rax
  21:   48 c1 e8 20             shr    $0x20,%rax
  25:   c3                      retq
  26:   31 c0                   xor    %eax,%eax
  28:   c3                      retq
  29:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000000030 <ffs_builtin>:
  30:   48 0f bc cf             bsf    %rdi,%rcx
  34:   ff c1                   inc    %ecx
  36:   31 c0                   xor    %eax,%eax
  38:   48 85 ff                test   %rdi,%rdi
  3b:   0f 45 c1                cmovne %ecx,%eax
  3e:   c3                      retq

Note clang generates very inefficient code for ffsl() from cpufunc.h.
 It is using memory, setting an unused bit, etc. :-(

Both GCC 4.2 and 4.8 generate "good code", of course:

% gcc -O2 -c ffs.c
% objdump -d ffs.o

ffs.o:     file format elf64-x86-64-freebsd

Disassembly of section .text:

0000000000000000 <ffs_cpufunc>:
   0:   31 c0                   xor    %eax,%eax
   2:   48 85 ff                test   %rdi,%rdi
   5:   74 07                   je     e <ffs_cpufunc+0xe>
   7:   48 0f bc c7             bsf    %rdi,%rax
   b:   83 c0 01                add    $0x1,%eax
   e:   f3 c3                   repz retq

0000000000000010 <ffs_builtin>:
  10:   48 0f bc ff             bsf    %rdi,%rdi
  14:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
  1b:   48 0f 44 f8             cmove  %rax,%rdi
  1f:   48 83 c7 01             add    $0x1,%rdi
  23:   89 f8                   mov    %edi,%eax
  25:   c3                      retq

% gcc48 -O2 -c ffs.c
% objdump -d ffs.o

ffs.o:     file format elf64-x86-64-freebsd

Disassembly of section .text:

0000000000000000 <ffs_cpufunc>:
   0:   31 c0                   xor    %eax,%eax
   2:   48 85 ff                test   %rdi,%rdi
   5:   75 09                   jne    10 <ffs_cpufunc+0x10>
   7:   f3 c3                   repz retq
   9:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  10:   48 0f bc c7             bsf    %rdi,%rax
  14:   83 c0 01                add    $0x1,%eax
  17:   c3                      retq
  18:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  1f:   00

0000000000000020 <ffs_builtin>:
  20:   48 0f bc c7             bsf    %rdi,%rax
  24:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  2b:   48 0f 44 c2             cmove  %rdx,%rax
  2f:   48 83 c0 01             add    $0x1,%rax
  33:   c3                      retq

FYI,

Jung-uk Kim
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://www.enigmail.net/

iEYEARECAAYFAlClcnkACgkQmlay1b9qnVPQYACgjEYpPDgLfkskpm3+2vRNjCuJ
n5AAn30JxrFS9kLoejh+4BZFD/MnfAvT
=4yS7
-----END PGP SIGNATURE-----



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?50A57279.3040607>