Date: Sun, 31 Jul 2016 18:48:24 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mateusz Guzik <mjg@freebsd.org>, svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r303583 - head/sys/amd64/amd64
Message-ID: <20160731154824.GB8192@zxy.spb.ru>
In-Reply-To: <20160731152629.GA8192@zxy.spb.ru>
References: <201607311134.u6VBY81j031059@repo.freebsd.org> <20160731220407.Q3033@besplex.bde.org> <20160731135129.GA22212@zxy.spb.ru> <20160801000046.X3364@besplex.bde.org> <20160731152629.GA8192@zxy.spb.ru>

On Sun, Jul 31, 2016 at 06:26:29PM +0300, Slawa Olhovchenkov wrote:

> On Mon, Aug 01, 2016 at 12:30:14AM +1000, Bruce Evans wrote:
>
> > On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote:
> >
> > > On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:
> > >
> > >> Misalignment of this loop made it almost twice as slow on old Turion2 with
> > >> slow DDR2 memory.  It made no difference on Haswell.  I added an extra
> > >> movnti, but that makes little or no difference.  2 more movnti's wouldn't
> > >> fit in a 16-byte cache line so are slower unless even more care is taken
> > >> with alignment (or with less care, 4 with misalignment are not less than
> > >> twice as slow as 1 with alignment).
> > >>
> > >> I thought that alignment and unrolling didn't matter here, because movnti
> > >> has to wait for memory and almost any loop runs fast enough to keep up.
> > >> The timing on my old system is something like: CPUs at 2 GHz; main memory
> > >> at 4 GB/sec; movnti is only 4 bytes wide on i386 (so this problem
> > >> only affects i386, at least with slow memory).  So sustaining 4 GB/sec
> > >> requires 1 G movnti's/sec, so the loop needs to run at 2 cycles/iteration
> > >> to keep up.  But when it is misaligned, it runs at 3-4 cycles/iteration.
> > >> Alignment makes it take about 2, and the extra movnti is for safety and
> > >> to work with faster memory.
> > >>
> > >> On Haswell with CPUs at 4 GHz, 2 cycles/iteration gives 8 GB/sec on
> > >> i386 and 16 GB/sec on amd64 with wider movnti.  IIRC, 16 GB/sec is about
> > >> the main memory speed so nothing better is possible but just 1 extra
> > >> movnti gives more with faster memory.  This is just worse than bzero()
> > >
> > > What about a modern system with 120 GB/sec main memory speed?
> >
> > Is there such a system?  It would have main memory almost twice as fast
> > as Haswell L2 and almost half as fast as Haswell L1.
>
> http://ark.intel.com/products/family/93797/Intel-Xeon-Processor-E7-v4-Family#@Server
>
> 102 GB/s (sorry, 120 was a misprint)
>
> > My fastest memory actually does 20001 MB/s according to old memtest
> > and that is about right according to other tests.
>
> For a short time I have a spare 1650v4
> http://ark.intel.com/products/92994/Intel-Xeon-Processor-E5-1650-v4-15M-Cache-3_60-GHz
> with up to 76.8 GB/s (by datasheet, at DDR4-2400).
> With the installed DDR4-2133 -- up to 68.2 GB/s (theoretical).
> After a short time the system goes into production.
>
> I am unable to boot UEFI Memtest86 7.0; the old version (4.3.7) shows 15 GB/s.

Here http://wccftech.com/intel-broadwell-ep-xeon-e5-2698-v4-processor/
a benchmark shows 110 GB/s write speed.