Date:      Sun, 31 Jul 2016 18:26:29 +0300
From:      Slawa Olhovchenkov <slw@zxy.spb.ru>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        Mateusz Guzik <mjg@freebsd.org>, svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject:   Re: svn commit: r303583 - head/sys/amd64/amd64
Message-ID:  <20160731152629.GA8192@zxy.spb.ru>
In-Reply-To: <20160801000046.X3364@besplex.bde.org>
References:  <201607311134.u6VBY81j031059@repo.freebsd.org> <20160731220407.Q3033@besplex.bde.org> <20160731135129.GA22212@zxy.spb.ru> <20160801000046.X3364@besplex.bde.org>

On Mon, Aug 01, 2016 at 12:30:14AM +1000, Bruce Evans wrote:

> On Sun, 31 Jul 2016, Slawa Olhovchenkov wrote:
> 
> > On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:
> >
> >> Misalignment of this loop made it almost twice as slow on old Turion2 with
> >> slow DDR2 memory.  It made no difference on Haswell.  I added an extra
> >> movnti, but that makes little or no differences.  2 more movnti's wouldn't
> >> fit in a 16-byte cache line so are slower unless even more care is taken
> >> with alignment (or with less care, 4 with misalignment are not less than
> >> twice as slow as 1 with alignment).
> >>
> >> I thought that alignment and unrolling didn't matter here, because movnti
> >> has to wait for memory and almost any loop runs fast enough to keep up.
> >> The timing on my old system is something like: CPUs at 2 GHz; main memory
> >> at 4 GB/sec; movnti is only 4 bytes wide on i386 (so this problem
> >> only affects i386, at least with slow memory).  So sustaining 4 GB/sec
> >> requires 1 G movnti's/sec, so the loop needs to run at 2 cycles/iteration
> >> to keep up.  But when it is misaligned, it runs at 3-4 cycles/iteration.
> >> Alignment makes it take about 2, and the extra movnti is for safety and
> >> to work with faster memory.
> >>
> >> On Haswell with CPUs at 4 GHz, 2 cycles/iteration gives 8 GB/sec on
> >> i386 and 16 GB/sec on amd64 with wider movnti.  IIRC, 16 GB/sec is about
> >> the main memory speed so nothing better is possible but just 1 extra
> >> movnti gives more with faster memory.  This is just worse than bzero()
> >
> > What about modern system with 120 GB/sec main memory speed?
> 
> Is there such a system?  It would have main memory almost twice as fast
> as Haswell L2 and almost half as fast as Haswell L1.

http://ark.intel.com/products/family/93797/Intel-Xeon-Processor-E7-v4-Family#@Server

102 GB/s (sorry, 120 was a typo).
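
To put that figure next to the cycle-budget arithmetic quoted above, here is a rough sketch. It assumes one movnti per loop iteration and a loop that only has to keep up with memory; the function name and the 4 GHz single-core case against 102 GB/s are my own illustration, not measurements.

```python
def cycles_per_store(cpu_hz, mem_bytes_per_s, store_bytes):
    """Cycles one loop iteration may take if each iteration issues a single
    non-temporal store and the loop has to keep up with memory."""
    stores_per_s = mem_bytes_per_s / store_bytes
    return cpu_hz / stores_per_s

# Figures quoted earlier in the thread.
turion_i386 = cycles_per_store(2e9, 4e9, 4)      # 2 GHz, 4 GB/s, 4-byte movnti
haswell_amd64 = cycles_per_store(4e9, 16e9, 8)   # 4 GHz, 16 GB/s, 8-byte movnti

# Hypothetical case: one 4 GHz core against 102 GB/s with 8-byte movnti.
fast_mem = cycles_per_store(4e9, 102e9, 8)

print(f"Turion2/i386 budget:   {turion_i386:.2f} cycles/iteration")
print(f"Haswell/amd64 budget:  {haswell_amd64:.2f} cycles/iteration")
print(f"102 GB/s budget:       {fast_mem:.2f} cycles/iteration")
```

The 102 GB/s is an aggregate figure across all memory channels, so a single core probably never has to reach it; the point is only that the budget per store drops well below one cycle.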

> My fastest memory actually does 20001 MB/s according to old memtest
> and that is about right according to other tests.

For a short time I have a spare E5-1650 v4
http://ark.intel.com/products/92994/Intel-Xeon-Processor-E5-1650-v4-15M-Cache-3_60-GHz
with up to 76.8 GB/s (per the datasheet, at DDR4-2400).
With the installed DDR4-2133 it is up to 68.2 GB/s (theoretical).
The system goes into production shortly.
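
The theoretical numbers can be reproduced roughly as follows. This is a sketch that assumes 4 memory channels and a 64-bit (8-byte) data bus per channel, with GB meaning 10^9 bytes; the function name is made up for illustration.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, channels=4, bus_bytes=8):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes/s).

    channels and bus_bytes are assumptions: 4 channels, 64-bit bus each.
    """
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9

ddr4_2400 = peak_bandwidth_gbs(2400)   # datasheet maximum
ddr4_2133 = peak_bandwidth_gbs(2133)   # installed modules

print(f"DDR4-2400 x4: {ddr4_2400:.2f} GB/s")
print(f"DDR4-2133 x4: {ddr4_2133:.2f} GB/s")
```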

I am unable to boot UEFI Memtest86 7.0; the old version (4.3.7) shows 15 GB/s.

# ramspeed -b 18 -p 4
RAMspeed/SMP (FreeBSD) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 4 processes

SSE (nt)  Copy:      54176.91 MB/s  [NTA prefetch]
SSE (nt)  Scale:     54241.98 MB/s  [NTA prefetch]
SSE (nt)  Add:       48945.60 MB/s  [T0 prefetch]
SSE (nt)  Triad:     50102.80 MB/s  [T0 prefetch]
---
SSE (nt)  AVERAGE:   51866.82 MB/s

# ramspeed -b 16 -p 4
RAMspeed/SMP (FreeBSD) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 4 processes

SSE & WRITING (nt)        1 Kb block: 55913.18 MB/s
SSE & WRITING (nt)        2 Kb block: 60819.02 MB/s
SSE & WRITING (nt)        4 Kb block: 58662.37 MB/s
SSE & WRITING (nt)        8 Kb block: 57165.14 MB/s
SSE & WRITING (nt)       16 Kb block: 56310.22 MB/s
SSE & WRITING (nt)       32 Kb block: 56407.22 MB/s
SSE & WRITING (nt)       64 Kb block: 58200.44 MB/s
SSE & WRITING (nt)      128 Kb block: 59213.49 MB/s
SSE & WRITING (nt)      256 Kb block: 59047.57 MB/s
SSE & WRITING (nt)      512 Kb block: 59158.01 MB/s
SSE & WRITING (nt)     1024 Kb block: 59140.03 MB/s
SSE & WRITING (nt)     2048 Kb block: 59165.49 MB/s
SSE & WRITING (nt)     4096 Kb block: 59714.68 MB/s
SSE & WRITING (nt)     8192 Kb block: 59926.68 MB/s
SSE & WRITING (nt)    16384 Kb block: 59100.03 MB/s
SSE & WRITING (nt)    32768 Kb block: 58268.52 MB/s

# ramspeed -b 16 -p 2
RAMspeed/SMP (FreeBSD) v3.5.0 by Rhett M. Hollander and Paul V. Bolotoff, 2002-09

8Gb per pass mode, 2 processes

SSE & WRITING (nt)        1 Kb block: 32131.03 MB/s
SSE & WRITING (nt)        2 Kb block: 41851.23 MB/s
SSE & WRITING (nt)        4 Kb block: 41848.02 MB/s
SSE & WRITING (nt)        8 Kb block: 41640.80 MB/s
SSE & WRITING (nt)       16 Kb block: 41640.60 MB/s
SSE & WRITING (nt)       32 Kb block: 41639.89 MB/s
SSE & WRITING (nt)       64 Kb block: 41849.65 MB/s
SSE & WRITING (nt)      128 Kb block: 41848.74 MB/s
SSE & WRITING (nt)      256 Kb block: 41847.87 MB/s
SSE & WRITING (nt)      512 Kb block: 41846.14 MB/s
SSE & WRITING (nt)     1024 Kb block: 41835.69 MB/s
SSE & WRITING (nt)     2048 Kb block: 41815.94 MB/s
SSE & WRITING (nt)     4096 Kb block: 41717.39 MB/s
SSE & WRITING (nt)     8192 Kb block: 41575.85 MB/s
SSE & WRITING (nt)    16384 Kb block: 41295.03 MB/s
SSE & WRITING (nt)    32768 Kb block: 40735.83 MB/s
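
For comparison with the theoretical peak, a rough calculation of how much of it the best write results reach. It assumes ramspeed's MB/s means 10^6 bytes/s (if it means 2^20 bytes/s, the fractions are about 5% higher) and uses the 4-channel DDR4-2133 peak of 2133 * 8 * 4 MB/s.

```python
# Assumed theoretical peak: DDR4-2133, 8-byte bus, 4 channels, in MB/s (10^6 bytes).
PEAK_MBS = 2133 * 8 * 4

# Best "SSE & WRITING (nt)" results from the runs above (2 Kb block in both cases).
best_4_procs = 60819.02
best_2_procs = 41851.23

frac_4 = best_4_procs / PEAK_MBS
frac_2 = best_2_procs / PEAK_MBS

print(f"4 processes: {frac_4:.1%} of theoretical peak")
print(f"2 processes: {frac_2:.1%} of theoretical peak")
```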


