Date: Sun, 14 Jun 2015 09:59:41 +0800 From: Erich Dollansky <erichsfreebsdlist@alogt.com> To: Richard Yao <ryao@gentoo.org> Cc: Hans Petter Selasky <hps@selasky.org>, Ian Lepore <ian@freebsd.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@FreeBSD.org> Subject: Re: allow ffs & co. a binary search Message-ID: <20150614095941.667035a6@B85M-HD3-0.alogt.com> In-Reply-To: <557ADF46.6030004@gentoo.org> References: <20150607081315.7c0f09fb@B85M-HD3-0.alogt.com> <5579A016.4040800@gentoo.org> <1434034373.1415.3.camel@freebsd.org> <5579A235.80609@gentoo.org> <557A3CF3.3050603@selasky.org> <557AD8B0.6000600@gentoo.org> <557ADF46.6030004@gentoo.org>
next in thread | previous in thread | raw e-mail | index | archive | help
Hi All, what should we do now? There are several approaches now on the table. I would implement this use the built-in methods of the compiler when available and use the Stanford functions in other cases and and keep the current solution as a fall back I would use macros to determine which solution should be used. What are your thoughts? Erich On Fri, 12 Jun 2015 09:31:50 -0400 Richard Yao <ryao@gentoo.org> wrote: > On 06/12/2015 09:03 AM, Richard Yao wrote: > > On 06/11/2015 09:59 PM, Hans Petter Selasky wrote: > >> On 06/11/15 16:59, Richard Yao wrote: > >>> On 06/11/2015 10:52 AM, Ian Lepore wrote: > >>>> On Thu, 2015-06-11 at 10:49 -0400, Richard Yao wrote: > >>>>> On 06/06/2015 08:13 PM, Erich Dollansky wrote: > >>>> [...] > >>>>> > >>>>> The fastest portable way of calculating highest bit set is to > >>>>> use a debrujin sequence: > >>>>> > >>>>> https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn > >>>>> > >>>>> The binary approach is inferior, especially on Pentium 4 > >>>>> processors. > >>>> > >>>> And of course it's crucial that we optimize for Pentium 4 > >>>> processors in 2015. > >>> > >>> The debrujin approach is fast regardless of the architecture. > >>> Anyway, this is a solved problem. There is no need to reinvent > >>> the wheel when the portable solutions are listed on that page at > >>> Stanford. > >>> > >> > >> Hi Richard, > >> > >> ffs() can be implemented like this simply: > >> > >> int ffs_A(uint32_t v) > >> { > >> static const int MultiplyDeBruijnBitPosition[32] = > >> { > >> 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, > >> 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 > >> }; > >> v = ~(v - 1) & v; > >> return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> > >> 27]); } > > > > There is a way to do that on the stanford page too: > > > > https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup > > My apologies. I inadvertently linked to the wrong variant of that in > my initial email. It has been a while since I visited that page and I > was juggling a few things when I wrote that email, so it was easy to > confuse the two. > > Anyway, it would be possible to do something like: > > #if defined(__GNUC__) || (defined(__clang__) && > __has_builtin(__builtin_ffs)) > # define ffs(n) __builtin_ffs(n) > #else > /* > * From Sean Eron Anderson's Bit Twiddling Hacks > * > https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup > */ > static inline int > ffs(uint32_t v) > { > static const int MultiplyDeBruijnBitPosition[32] = > { > 0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30, > 8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31 > }; > v = ~(v - 1) & v; > return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> > 27]); } > #endif > > If there were an autotools-like check for `__has_builtin()`, then an > explicit list of compilers known to support `__builtin_ffs()` would > not be necessary, although a list of compilers that lack support for > `__has_builtin()` (such as GCC) would still be necessary. Those > building with compilers that do not support `__builtin_ffs()` could > fallback to the DeBrujin approach. >
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150614095941.667035a6>