Skip site navigation (1)Skip section navigation (2)
Date:      Sun, 14 Jun 2015 09:59:41 +0800
From:      Erich Dollansky <erichsfreebsdlist@alogt.com>
To:        Richard Yao <ryao@gentoo.org>
Cc:        Hans Petter Selasky <hps@selasky.org>, Ian Lepore <ian@freebsd.org>, "freebsd-hackers@freebsd.org" <freebsd-hackers@FreeBSD.org>
Subject:   Re: allow ffs & co. a binary search
Message-ID:  <20150614095941.667035a6@B85M-HD3-0.alogt.com>
In-Reply-To: <557ADF46.6030004@gentoo.org>
References:  <20150607081315.7c0f09fb@B85M-HD3-0.alogt.com> <5579A016.4040800@gentoo.org> <1434034373.1415.3.camel@freebsd.org> <5579A235.80609@gentoo.org> <557A3CF3.3050603@selasky.org> <557AD8B0.6000600@gentoo.org> <557ADF46.6030004@gentoo.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi All,

what should we do now?

There are several approaches now on the table. I would implement this

    use the built-in methods of the compiler when available and
    use the Stanford functions in other cases and
    and keep the current solution as a fall back

I would use macros to determine which solution should be used.

What are your thoughts?

Erich

On Fri, 12 Jun 2015 09:31:50 -0400
Richard Yao <ryao@gentoo.org> wrote:

> On 06/12/2015 09:03 AM, Richard Yao wrote:
> > On 06/11/2015 09:59 PM, Hans Petter Selasky wrote:
> >> On 06/11/15 16:59, Richard Yao wrote:
> >>> On 06/11/2015 10:52 AM, Ian Lepore wrote:
> >>>> On Thu, 2015-06-11 at 10:49 -0400, Richard Yao wrote:
> >>>>> On 06/06/2015 08:13 PM, Erich Dollansky wrote:
> >>>> [...]
> >>>>>
> >>>>> The fastest portable way of calculating highest bit set is to
> >>>>> use a debrujin sequence:
> >>>>>
> >>>>> https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
> >>>>>
> >>>>> The binary approach is inferior, especially on Pentium 4
> >>>>> processors.
> >>>>
> >>>> And of course it's crucial that we optimize for Pentium 4
> >>>> processors in 2015.
> >>>
> >>> The debrujin approach is fast regardless of the architecture.
> >>> Anyway, this is a solved problem. There is no need to reinvent
> >>> the wheel when the portable solutions are listed on that page at
> >>> Stanford.
> >>>
> >>
> >> Hi Richard,
> >>
> >> ffs() can be implemented like this simply:
> >>
> >> int ffs_A(uint32_t v)
> >> {
> >> static const int MultiplyDeBruijnBitPosition[32] =
> >> {
> >>   0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
> >>   8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
> >> };
> >> v = ~(v - 1) & v;
> >> return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >>
> >> 27]); }
> > 
> > There is a way to do that on the stanford page too:
> > 
> > https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup
> 
> My apologies. I inadvertently linked to the wrong variant of that in
> my initial email. It has been a while since I visited that page and I
> was juggling a few things when I wrote that email, so it was easy to
> confuse the two.
> 
> Anyway, it would be possible to do something like:
> 
> #if defined(__GNUC__) || (defined(__clang__) &&
> __has_builtin(__builtin_ffs))
> # define ffs(n) __builtin_ffs(n)
> #else
> /*
>  * From Sean Eron Anderson's Bit Twiddling Hacks
>  *
> https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup
>  */
> static inline int
> ffs(uint32_t v)
> {
> static const int MultiplyDeBruijnBitPosition[32] =
> {
>   0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
>   8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
> };
> v = ~(v - 1) & v;
> return (MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >>
> 27]); }
> #endif
> 
> If there were an autotools-like check for `__has_builtin()`, then an
> explicit list of compilers known to support `__builtin_ffs()` would
> not be necessary, although a list of compilers that lack support for
> `__has_builtin()` (such as GCC) would still be necessary. Those
> building with compilers that do not support `__builtin_ffs()` could
> fallback to the DeBrujin approach.
> 




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20150614095941.667035a6>