From: Konstantin Belousov
Date: Tue, 31 Mar 2015 01:44:08 +0000 (UTC)
To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-head@freebsd.org
Subject: svn commit: r280880 - head/sys/amd64/amd64
Message-Id: <201503310144.t2V1i8cq056729@svn.freebsd.org>

Author: kib
Date: Tue Mar 31 01:44:07 2015
New Revision: 280880
URL: https://svnweb.freebsd.org/changeset/base/280880

Log:
  Provide workaround for a performance issue with the popcnt instruction
  on Intel processors.
  Clear spurious dependency by explicitly xoring the destination
  register of popcnt.

  Use bitcount64() instead of re-implementing SWAR locally, for
  processors without popcnt instruction.

  Reviewed by:	jhb
  Discussed with:	jilles (previous version)
  Sponsored by:	The FreeBSD Foundation

Modified:
  head/sys/amd64/amd64/pmap.c

Modified: head/sys/amd64/amd64/pmap.c
==============================================================================
--- head/sys/amd64/amd64/pmap.c	Tue Mar 31 01:28:33 2015	(r280879)
+++ head/sys/amd64/amd64/pmap.c	Tue Mar 31 01:44:07 2015	(r280880)
@@ -412,7 +412,7 @@ static caddr_t crashdumpmap;
 static void free_pv_chunk(struct pv_chunk *pc);
 static void free_pv_entry(pmap_t pmap, pv_entry_t pv);
 static pv_entry_t get_pv_entry(pmap_t pmap, struct rwlock **lockp);
-static int popcnt_pc_map_elem(uint64_t elem);
+static int popcnt_pc_map_elem_pq(uint64_t elem);
 static vm_page_t reclaim_pv_chunk(pmap_t locked_pmap, struct rwlock **lockp);
 static void reserve_pv_entries(pmap_t pmap, int needed,
 		    struct rwlock **lockp);
@@ -2980,20 +2980,27 @@ retry:
 
 /*
  * Returns the number of one bits within the given PV chunk map element.
+ *
+ * The erratas for Intel processors state that "POPCNT Instruction May
+ * Take Longer to Execute Than Expected".  It is believed that the
+ * issue is the spurious dependency on the destination register.
+ * Provide a hint to the register rename logic that the destination
+ * value is overwritten, by clearing it, as suggested in the
+ * optimization manual.  It should be cheap for unaffected processors
+ * as well.
+ *
+ * Reference numbers for erratas are
+ * 4th Gen Core: HSD146
+ * 5th Gen Core: BDM85
  */
 static int
-popcnt_pc_map_elem(uint64_t elem)
+popcnt_pc_map_elem_pq(uint64_t elem)
 {
-	int count;
+	u_long result;
 
-	/*
-	 * This simple method of counting the one bits performs well because
-	 * the given element typically contains more zero bits than one bits.
-	 */
-	count = 0;
-	for (; elem != 0; elem &= elem - 1)
-		count++;
-	return (count);
+	__asm __volatile("xorl %k0,%k0;popcntq %1,%0"
+	    : "=&r" (result) : "rm" (elem));
+	return (result);
 }
 
 /*
@@ -3025,13 +3032,13 @@ retry:
 	avail = 0;
 	TAILQ_FOREACH(pc, &pmap->pm_pvchunk, pc_list) {
 		if ((cpu_feature2 & CPUID2_POPCNT) == 0) {
-			free = popcnt_pc_map_elem(pc->pc_map[0]);
-			free += popcnt_pc_map_elem(pc->pc_map[1]);
-			free += popcnt_pc_map_elem(pc->pc_map[2]);
+			free = bitcount64(pc->pc_map[0]);
+			free += bitcount64(pc->pc_map[1]);
+			free += bitcount64(pc->pc_map[2]);
 		} else {
-			free = popcntq(pc->pc_map[0]);
-			free += popcntq(pc->pc_map[1]);
-			free += popcntq(pc->pc_map[2]);
+			free = popcnt_pc_map_elem_pq(pc->pc_map[0]);
+			free += popcnt_pc_map_elem_pq(pc->pc_map[1]);
+			free += popcnt_pc_map_elem_pq(pc->pc_map[2]);
 		}
 		if (free == 0)
 			break;
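For processors without popcnt, the commit switches the fallback path to bitcount64(). As an illustration only, the sketch below contrasts the removed per-bit loop (each `x &= x - 1` clears the lowest set bit, so the loop runs once per one bit) with the textbook branch-free SWAR (SIMD within a register) population count. It is an assumption, not something this diff confirms, that SWAR is representative of what bitcount64() does; the names popcount_loop, popcount_swar and popcount_crosscheck are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical name. Same algorithm as the removed popcnt_pc_map_elem():
 * x &= x - 1 clears the lowest set bit, so the cost grows with the
 * number of one bits.
 */
int
popcount_loop(uint64_t x)
{
	int count;

	count = 0;
	for (; x != 0; x &= x - 1)
		count++;
	return (count);
}

/*
 * Hypothetical name. Textbook SWAR population count: sum bits in
 * progressively wider fields, then add the eight byte counts with a
 * single multiply. The cost does not depend on how many bits are set.
 */
int
popcount_swar(uint64_t x)
{
	/* Each 2-bit field now holds the count of its two bits (0..2). */
	x = x - ((x >> 1) & 0x5555555555555555ULL);
	/* Each 4-bit field holds the sum of two 2-bit counts (0..4). */
	x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
	/* Each byte holds the sum of its two nibble counts (0..8). */
	x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
	/* The multiply accumulates all eight byte counts into the top byte. */
	return ((int)((x * 0x0101010101010101ULL) >> 56));
}

/* Hypothetical helper: both methods must agree on every input. */
void
popcount_crosscheck(uint64_t x)
{
	assert(popcount_loop(x) == popcount_swar(x));
}
```

The removed loop was justified by its own comment: PV chunk map elements usually have more zero bits than one bits, so few iterations run. SWAR gives up that data-dependent shortcut in exchange for a fixed, small number of operations.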
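The new popcnt_pc_map_elem_pq() zeroes the destination register with `xorl` before `popcntq`, which, as the commit's comment describes, tells the register rename logic that the old value is overwritten. The sketch below reproduces that inline-assembly idiom in user space under stated assumptions: GCC or Clang, with `__builtin_cpu_supports("popcnt")` standing in for the kernel's `cpu_feature2 & CPUID2_POPCNT` test and `__builtin_popcountll()` standing in for bitcount64() as the software fallback. The names popcount_xor_popcnt, popcount_fast and popcount_selfcheck are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC or Clang; the asm path is only built on x86-64. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#define HAVE_X86_64_ASM 1
#else
#define HAVE_X86_64_ASM 0
#endif

#if HAVE_X86_64_ASM
/*
 * Hypothetical user-space copy of popcnt_pc_map_elem_pq(). xorl on the
 * 32-bit view (%k0) also zeroes the upper 32 bits, so popcntq's result
 * register no longer depends on its previous contents.
 * "=&r" (early clobber) is required: the output is written by xorl
 * before the input is read, so the two must not share a register.
 * Only call this after confirming the CPU supports POPCNT.
 */
static int
popcount_xor_popcnt(uint64_t elem)
{
	uint64_t result;

	__asm__ __volatile__("xorl %k0,%k0; popcntq %1,%0"
	    : "=&r" (result) : "rm" (elem));
	return ((int)result);
}
#endif

/*
 * Hypothetical dispatcher mirroring the TAILQ_FOREACH body in pmap.c:
 * use the instruction when the CPU reports it, else count in software.
 */
int
popcount_fast(uint64_t x)
{
#if HAVE_X86_64_ASM
	if (__builtin_cpu_supports("popcnt"))
		return (popcount_xor_popcnt(x));
#endif
	/* Assumption: stands in for bitcount64() on CPUs without POPCNT. */
	return (__builtin_popcountll(x));
}

/* Hypothetical helper: the dispatched path must match the builtin. */
void
popcount_selfcheck(uint64_t x)
{
	assert(popcount_fast(x) == __builtin_popcountll(x));
}
```

Per the commit's comment, clearing the register should be cheap even on processors without the erratum, which is presumably why the kernel applies it on every POPCNT-capable CPU rather than only on the 4th and 5th generation Core parts named in HSD146 and BDM85.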