From: Bruce Evans
Date: Mon, 1 Aug 2016 00:30:14 +1000 (EST)
To: Slawa Olhovchenkov
cc: Bruce Evans, Mateusz Guzik, svn-src-head@freebsd.org,
    svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r303583 - head/sys/amd64/amd64
In-Reply-To: <20160731135129.GA22212@zxy.spb.ru>
Message-ID: <20160801000046.X3364@besplex.bde.org>
References: <201607311134.u6VBY81j031059@repo.freebsd.org>
    <20160731220407.Q3033@besplex.bde.org>
    <20160731135129.GA22212@zxy.spb.ru>
List-Id: SVN commit messages for the src tree for head/-current

On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote:

> On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:
>
>> Misalignment of this loop made it almost twice as slow on old Turion2 with
>> slow DDR2 memory. It made no difference on Haswell. I added an extra
>> movnti, but that makes little or no differences. 2 more movnti's wouldn't
>> fit in a 16-byte cache line so are slower unless even more care is taken
>> with alignment (or with less care, 4 with misalignment are not less than
>> twice as slow as 1 with alignment).
>>
>> I thought that alignment and unrolling didn't matter here, because movnti
>> has to wait for memory and almost any loop runs fast enough to keep up.
>> The timing on my old system is something like: CPUs at 2 GHz; main memory
>> at 4 GB/sec; movnti is only 4 bytes wide on i386 (so this problem
>> only affects i386, at least with slow memory). So sustaining 4 GB/sec
>> requires 1 G movnti's/sec, so the loop needs to run at 2 cycles/iteration
>> to keep up. But when it is misaligned, it runs at 3-4 cycles/iteration.
>> Alignment makes it take about 2, and the extra movnti is for safety and
>> to work with faster memory.
>>
>> On Haswell with CPUs at 4 GHz, 2 cycles/iteration gives 8 GB/sec on
>> i386 and 16 GB/sec on amd64 with wider movnti. IIRC, 16 GB/sec is about
>> the main memory speed so nothing better is possible but just 1 extra
>> movnti gives more with faster memory. This is just worse than bzero()
>
> What about modern system with 120 GB/sec main memory speed?

Is there such a system? It would have main memory almost twice as fast as
Haswell L2 and almost half as fast as Haswell L1. My fastest memory
actually does 20001 MB/s according to old memtest and that is about right
according to other tests.

Bruce