Skip site navigation (1)Skip section navigation (2)
Date:      Sat, 21 Mar 2015 12:04:41 -0400
From:      John Baldwin <jhb@FreeBSD.org>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r280279 - head/sys/sys
Message-ID:  <550D9699.7070703@FreeBSD.org>
In-Reply-To: <20150320130216.GS2379@kib.kiev.ua>
References:  <201503201027.t2KAR6Ze053047@svn.freebsd.org> <20150320130216.GS2379@kib.kiev.ua>

next in thread | previous in thread | raw e-mail | index | archive | help
On 3/20/15 9:02 AM, Konstantin Belousov wrote:
> On Fri, Mar 20, 2015 at 10:27:06AM +0000, John Baldwin wrote:
>> Author: jhb
>> Date: Fri Mar 20 10:27:06 2015
>> New Revision: 280279
>> URL: https://svnweb.freebsd.org/changeset/base/280279
>>
>> Log:
>>   Expand the bitcount* API to support 64-bit integers, plain ints and longs
>>   and create a "hidden" API that can be used in other system headers without
>>   adding namespace pollution.
>>   - If the POPCNT instruction is enabled at compile time, use
>>     __builtin_popcount*() to implement __bitcount*(), otherwise fall back
>>     to software implementations.
> Are you aware of the Haswell errata HSD146 ?  I see the described behaviour
> on machines back to SandyBridge, but not on Nehalems.
> HSD146.   POPCNT Instruction May Take Longer to Execute Than Expected
> Problem: POPCNT instruction execution with a 32 or 64 bit operand may be
> delayed until previous non-dependent instructions have executed.
> 
> Jilles noted that gcc head and 4.9.2 already provides a workaround by
> xoring the dst register.  I have some patch for amd64 pmap, see the end
> of the message.

No, I was not aware, but I think it's hard to fix this anywhere but the
compiler.  I set CPUTYPE in src.conf on my Ivy Bridge desktop and clang
uses POPCOUNT for this function from ACPI-CA:

static UINT8
AcpiRsCountSetBits (
    UINT16                  BitField)
{
    UINT8                   BitsSet;


    ACPI_FUNCTION_ENTRY ();


    for (BitsSet = 0; BitField; BitsSet++)
    {
        /* Zero the least significant bit that is set */

        BitField &= (UINT16) (BitField - 1);
    }

    return (BitsSet);
}

(I ran into this accidentally because a kernel built on my system failed
to boot in older qemu because the kernel paniced with an illegal instruction
fault in this function.)

There's no way we can preemptively locate every bit of C that clang might
decide to replace with popcount and fix them to employ a workaround.  I think
the only viable solution is to use "-mno-popcount" on all compilers that aren't
known to be fixed.

In regards to whether or not this should be a dynamic test instead, it seems a
bit much to require a dynamic check for code that has to work in both userland
and the kernel, and I'm not sure if something like CPU_COUNT() would actually
benefit from that versus just always using the software solution instead.

-- 
John Baldwin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?550D9699.7070703>