Date: Tue, 10 Feb 2015 10:58:24 -0500
From: John Baldwin <jhb@freebsd.org>
To: Bruce Evans <brde@optusnet.com.au>
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Jung-uk Kim <jkim@freebsd.org>
Subject: Re: svn commit: r278474 - head/sys/sys
Message-ID: <12119175.I8M1urv6pf@ralph.baldwin.cx>
In-Reply-To: <20150211014516.N1511@besplex.bde.org>
References: <201502092103.t19L3OAn013792@svn.freebsd.org> <54D92CE8.1030803@FreeBSD.org> <20150211014516.N1511@besplex.bde.org>

On Wednesday, February 11, 2015 02:37:05 AM Bruce Evans wrote:
> On Mon, 9 Feb 2015, Jung-uk Kim wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA256
> >
> > On 02/09/2015 16:08, John Baldwin wrote:
> >> On Monday, February 09, 2015 09:03:24 PM John Baldwin wrote:
> >>> Author: jhb Date: Mon Feb 9 21:03:23 2015 New Revision: 278474
> >>> URL: https://svnweb.freebsd.org/changeset/base/278474
> >>>
> >>> Log: Use __builtin_popcnt() to implement a BIT_COUNT() operation
> >>> for bitsets and use this to implement CPU_COUNT() to count the
> >>> number of CPUs in a cpuset.
> >>>
> >>> MFC after: 2 weeks
> >>
> >> Yes, __builtin_popcnt() works with GCC 4.2. It should also allow
> >> the compiler to DTRT in userland uses of this if -msse4.2 is
> >> enabled.
> >
> > Back in 2012, when I submitted a similar patch, bde noted
> > __builtin_popcount*() cannot be used with GCC 4.2 for *kernel* because
> > it emits a library call.
>
> (*) Since generic amd64 and i386 have no popcount instruction in hardware,
> using builtin popcount rarely uses the hardware instruction (it takes
> special -march to get it, and the resulting binaries don't run on generic
> CPUs). Thus using the builtin works worse than using the old inline
> function in most cases. Except, the old inline function is only
> implemented in the kernel, and isn't implemented for 64-bit integers.
>
> gcc-4.8 generates the hardware popcount if the arch supports it. Only
> its library popcounts are slower than clang's. gcc-4.2 presumably
> doesn't generate the hardware popcount, since it doesn't have a -march
> for newer CPUs that have it.

I don't really expect CPU_COUNT() to be used in places where performance is
of the utmost importance. (For example in igb I use it in attach to
enumerate the set of CPUs to bind queues to, but nowhere else.)

I can implement a bitcount64 by using bitcount32 on both halves, unless
someone has a better suggestion. We could then use the bitcount routines
instead of __builtin_popcountl() in BIT_COUNT() for GCC, if we care that
strongly about it. Alternatively, I'm happy to implement the libcall for
GCC 4.2 in the kernel so that __builtin_popcountl() works.

-- 
John Baldwin
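
For reference, the bitcount64 idea above (count each 32-bit half and add
the results) might look roughly like the sketch below. This is only an
illustration, not the code that was committed: the sketch_* names are
hypothetical, the 32-bit routine is a generic SWAR population count standing
in for the kernel's bitcount32(), and the bitset is assumed to be a plain
array of 64-bit words rather than the real bitset layout.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Portable 32-bit population count using the usual SWAR reduction.
 * Hypothetical stand-in for the kernel's bitcount32().
 */
static inline uint32_t
sketch_bitcount32(uint32_t x)
{
	x = x - ((x >> 1) & 0x55555555u);		/* 2-bit sums */
	x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u); /* 4-bit sums */
	x = (x + (x >> 4)) & 0x0f0f0f0fu;		/* 8-bit sums */
	return ((x * 0x01010101u) >> 24);		/* add the 4 bytes */
}

/* 64-bit count built from two 32-bit counts, one per half. */
static inline uint32_t
sketch_bitcount64(uint64_t x)
{
	return (sketch_bitcount32((uint32_t)x) +
	    sketch_bitcount32((uint32_t)(x >> 32)));
}

/*
 * Count the set bits in an array of 64-bit words without relying on
 * __builtin_popcountl(). The array layout is an assumption made for
 * this sketch.
 */
static inline size_t
sketch_bitset_count(const uint64_t *words, size_t nwords)
{
	size_t count, i;

	count = 0;
	for (i = 0; i < nwords; i++)
		count += sketch_bitcount64(words[i]);
	return (count);
}
```

A caller would pass the words of a set to sketch_bitset_count() to get the
number of members. Since no builtin is involved, the compiler has no reason
to emit a popcount libcall, at the cost of never using a hardware popcount
instruction even where one exists.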