Date: Mon, 30 Mar 2015 20:24:34 +0300
From: Konstantin Belousov
To: John Baldwin
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org, Bruce Evans
Subject: Re: svn commit: r280279 - head/sys/sys
Message-ID: <20150330172434.GG2379@kib.kiev.ua>
In-Reply-To: <2526359.g5B2nXdKeQ@ralph.baldwin.cx>

On Mon, Mar 30, 2015 at 11:57:08AM -0400, John Baldwin wrote:
> On Sunday, March 22, 2015 11:32:51 AM Konstantin Belousov wrote:
> > On Sun, Mar 22, 2015 at 09:41:53AM +1100, Bruce Evans wrote:
> > > Always using new API would lose the micro-optimizations given by the runtime
> > > decision for default CFLAGS (used by distributions for portability).  To
> > > keep them, it seems best to keep the inline asm but replace
> > > popcnt_pc_map_elem(elem) by __bitcount64(elem).  -mno-popcount can then
> > > be used to work around slowness in the software (that is actually
> > > hardware) case.
> >
> > So anybody has to compile his own kernel to get popcnt optimization ?
> > We do care about trivial things that improve time.
>
> That is not what Bruce said.  He suggested using bitcount64() for the fallback
> if the cpuid check fails.  He did not say to remove the runtime check to use
> popcnt if it is available:
>
> "Always using [bitcount64] would lose the micro-optimization... [to] keep
> [it], it seems best to keep the inline asm but replace popcnt_pc_map_elem(elem)
> by [bitcount64(elem)]."
Ok, thank you for the clarification.

I updated the pmap patch, see the end of the message.

>
> > BTW, I have the following WIP change, which popcnt xorl is a piece of.
> > It emulates the ifuncs with some preprocessing mess.  It is much better
> > than runtime patching, and is a prerequisite to properly support more
> > things, like SMAP.  I did not published it earlier, since I wanted to
> > convert TLB flush code to this.
>
> This looks fine to me.
> It seems to be manually converting certain symbols
> to use a dynamic lookup that must be explicitly resolved before first
> use?
I am not sure what you mean by dynamic lookup, but possibly it was
mentioned.  I can emulate the ifuncs more faithfully, by requiring
a resolver function, which is called on the first real function
invocation.  I did not see it as very useful, but it is definitely
doable.

diff --git a/sys/amd64/amd64/pmap.c b/sys/amd64/amd64/pmap.c
index 6a4077c..fcfba56 100644
--- a/sys/amd64/amd64/pmap.c
+++ b/sys/amd64/amd64/pmap.c
@@ -412,7 +416,7 @@ static caddr_t crashdumpmap;
 static void	free_pv_chunk(struct pv_chunk *pc);
 static void	free_pv_entry(pmap_t pmap, pv_entry_t pv);
 static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp);
-static int	popcnt_pc_map_elem(uint64_t elem);
+static int	popcnt_pc_map_elem_pq(uint64_t elem);
 static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp);
 static void	reserve_pv_entries(pmap_t pmap, int needed,
 		    struct rwlock **lockp);
@@ -2980,20 +3002,27 @@ retry:
 
 /*
  * Returns the number of one bits within the given PV chunk map element.
+ *
+ * The erratas for Intel processors state that "POPCNT Instruction May
+ * Take Longer to Execute Than Expected".  It is believed that the
+ * issue is the spurious dependency on the destination register.
+ * Provide a hint to the register rename logic that the destination
+ * value is overwritten, by clearing it, as suggested in the
+ * optimization manual.  It should be cheap for unaffected processors
+ * as well.
+ *
+ * Reference numbers for erratas are
+ * 4th Gen Core: HSD146
+ * 5th Gen Core: BDM85
  */
 static int
-popcnt_pc_map_elem(uint64_t elem)
+popcnt_pc_map_elem_pq(uint64_t elem)
 {
-	int count;
+	u_long result;
 
-	/*
-	 * This simple method of counting the one bits performs well because
-	 * the given element typically contains more zero bits than one bits.
-	 */
-	count = 0;
-	for (; elem != 0; elem &= elem - 1)
-		count++;
-	return (count);
+	__asm __volatile("xorl %k0,%k0;popcntq %1,%0"
+	    : "=&r" (result) : "rm" (elem));
+	return (result);
 }
 
 /*
@@ -3025,13 +3054,13 @@ retry:
 	avail = 0;
 	TAILQ_FOREACH(pc, &pmap->pm_pvchunk, pc_list) {
 		if ((cpu_feature2 & CPUID2_POPCNT) == 0) {
-			free = popcnt_pc_map_elem(pc->pc_map[0]);
-			free += popcnt_pc_map_elem(pc->pc_map[1]);
-			free += popcnt_pc_map_elem(pc->pc_map[2]);
+			free = bitcount64(pc->pc_map[0]);
+			free += bitcount64(pc->pc_map[1]);
+			free += bitcount64(pc->pc_map[2]);
 		} else {
-			free = popcntq(pc->pc_map[0]);
-			free += popcntq(pc->pc_map[1]);
-			free += popcntq(pc->pc_map[2]);
+			free = popcnt_pc_map_elem_pq(pc->pc_map[0]);
+			free += popcnt_pc_map_elem_pq(pc->pc_map[1]);
+			free += popcnt_pc_map_elem_pq(pc->pc_map[2]);
 		}
 		if (free == 0)
 			break;
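
For illustration only, here is a userland sketch of the resolver-on-first-call variant discussed above, combined with the same popcnt/bitcount split as in the patch.  It is not the WIP change and not kernel code.  All names here (soft_bitcount64, hw_popcnt64, cpu_has_popcnt, count_bits64 and its helpers) are hypothetical, the software fallback is a generic parallel bit count standing in for bitcount64(), and the CPUID check uses the GCC/Clang <cpuid.h> helper instead of cpu_feature2.

```c
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define	HAVE_X86_POPCNT_PATH	1
#endif

/*
 * Software fallback (stand-in for bitcount64()): parallel bit count,
 * whose cost does not depend on how many bits are set.
 */
static int
soft_bitcount64(uint64_t x)
{
	x = x - ((x >> 1) & 0x5555555555555555ULL);
	x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
	x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
	return ((int)((x * 0x0101010101010101ULL) >> 56));
}

#ifdef HAVE_X86_POPCNT_PATH
/*
 * Same instruction sequence as in the patch: clear the destination
 * first so that popcnt does not wait on its previous value.
 */
static int
hw_popcnt64(uint64_t elem)
{
	uint64_t result;

	__asm__ __volatile__("xorl %k0,%k0;popcntq %1,%0"
	    : "=&r" (result) : "rm" (elem) : "cc");
	return ((int)result);
}

/* CPUID leaf 1, ECX bit 23 reports POPCNT support. */
static int
cpu_has_popcnt(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0)
		return (0);
	return ((ecx & (1u << 23)) != 0);
}
#endif

static int	count_bits64_resolve(uint64_t elem);

/* Starts at the resolver; replaced by the real implementation. */
static int	(*count_bits64_impl)(uint64_t) = count_bits64_resolve;

/*
 * Called on the first real invocation only.  The store is not
 * synchronized; concurrent first calls would all store the same value.
 */
static int
count_bits64_resolve(uint64_t elem)
{
#ifdef HAVE_X86_POPCNT_PATH
	count_bits64_impl = cpu_has_popcnt() ? hw_popcnt64 : soft_bitcount64;
#else
	count_bits64_impl = soft_bitcount64;
#endif
	return (count_bits64_impl(elem));
}

int
count_bits64(uint64_t elem)
{
	return (count_bits64_impl(elem));
}
```

A caller would just use count_bits64(pc_map_elem); after the first call the CPUID test is no longer on the path, at the price of an indirect call, which is the trade-off against the inline cpu_feature2 check in the patch.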