Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 12 Jun 2015 09:03:44 -0400
From:      Richard Yao <ryao@gentoo.org>
To:        Hans Petter Selasky <hps@selasky.org>, Ian Lepore <ian@freebsd.org>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@FreeBSD.org>,  Erich Dollansky <erichsfreebsdlist@alogt.com>
Subject:   Re: allow ffs & co. a binary search
Message-ID:  <557AD8B0.6000600@gentoo.org>
In-Reply-To: <557A3CF3.3050603@selasky.org>
References:  <20150607081315.7c0f09fb@B85M-HD3-0.alogt.com>	 <5579A016.4040800@gentoo.org> <1434034373.1415.3.camel@freebsd.org> <5579A235.80609@gentoo.org> <557A3CF3.3050603@selasky.org>

next in thread | previous in thread | raw e-mail | index | archive | help
This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--U4JivMVLK5eQHd2h7wGPxDqk94kmV7Ki8
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On 06/11/2015 09:59 PM, Hans Petter Selasky wrote:
> On 06/11/15 16:59, Richard Yao wrote:
>> On 06/11/2015 10:52 AM, Ian Lepore wrote:
>>> On Thu, 2015-06-11 at 10:49 -0400, Richard Yao wrote:
>>>> On 06/06/2015 08:13 PM, Erich Dollansky wrote:
>>> [...]
>>>>
>>>> The fastest portable way of calculating highest bit set is to use a
>>>> debrujin sequence:
>>>>
>>>> https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBru=
ijn
>>>>
>>>> The binary approach is inferior, especially on Pentium 4 processors.=

>>>
>>> And of course it's crucial that we optimize for Pentium 4 processors =
in
>>> 2015.
>>
>> The debrujin approach is fast regardless of the architecture. Anyway,
>> this is a solved problem. There is no need to reinvent the wheel when
>> the portable solutions are listed on that page at Stanford.
>>
>=20
> Hi Richard,
>=20
> ffs() can be implemented like this simply:
>=20
> int ffs_A(uint32_t v)
> {
> static const int MultiplyDeBruijnBitPosition[32] =3D
> {
>   0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
>   8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
> };
> v =3D ~(v - 1) & v;
> return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27])=
;
> }

There is a way to do that on the stanford page too:

https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLook=
up

Using it, we would have:

int ffs_A(uint32_t v)
{
unsigned int v;  // find the number of trailing zeros in 32-bit v
int r;           // result goes here
static const int MultiplyDeBruijnBitPosition[32] =3D
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
return MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U))
>> 27];
}

That accomplishes ffs() in 1 less operation.

These techniques can be extended to 64-bit too.

> --HPS



--U4JivMVLK5eQHd2h7wGPxDqk94kmV7Ki8
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQIcBAEBAgAGBQJVeti1AAoJECDuEZm+6Exks44P/RRviCqqLvsltiZYDM4rUMRH
hWj6idmPn3O2xOdng9D802QgBzmemT40tP9O2SAJYI3x49VU3bU5liX6IRBlB//J
cF/cq5WCLrPufO2zRLIRbY+/YwSXBxegbJE8EPDCIFTPYsj3XAxdWW1xgNoKx4Ve
/y7sz7erk9UhANvTjonREtLeH5gpzLIsjL/F1qAWfIXWZn0z5N2XGqAB+cY25MMl
q6jcQTwnwy/h6wjnRZQu6hMossDyH3cZdonXPybBX7xNJOImuiEaPvkFc+2N6Oir
H3x5rHFvouRT8/BlSE6bM3O3Shw8tLWBdzlQlfgzNjN3m36Kje/OqUiSXlW6Z/bY
a8a2lPaYbLhOF+fWmyGd8+zZz9Lg/DSstjFRyaTPAofnfxG0hmGC4AQDtJHXQZsA
C/z2zAvB4crHXIMnb53dAx2T5BXZGmpyk1T28O6yov3glj/DzHk06ybJvPIvBDru
vY2VuTUW/TkxqJFiCl5w04C+f708UCJaFgeNWnvYfoMolEU9I5vy83gidr3Eyulh
BlGGwi34or2yzBltfD+Q4JS4p0Hy55t6nRLzUHm/9f10LdAWdlvMZbsX4fNIV70u
65splGzN6twmJrOWaGqQE8zQKHoT7o1noCvqDJ2i3Ksdk218pN9K8R3FSoa+PU2i
aHvni+pptB/x3/upn3hh
=qSqX
-----END PGP SIGNATURE-----

--U4JivMVLK5eQHd2h7wGPxDqk94kmV7Ki8--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?557AD8B0.6000600>