From owner-svn-src-head@freebsd.org  Sun Jul 31 13:51:38 2016
Return-Path: <owner-svn-src-head@freebsd.org>
Delivered-To: svn-src-head@mailman.ysv.freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org
 [IPv6:2001:1900:2254:206a::19:1])
 by mailman.ysv.freebsd.org (Postfix) with ESMTP id 8D2B8BA92C1;
 Sun, 31 Jul 2016 13:51:38 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from zxy.spb.ru (zxy.spb.ru [195.70.199.98])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (Client did not present a certificate)
 by mx1.freebsd.org (Postfix) with ESMTPS id 526371CE5;
 Sun, 31 Jul 2016 13:51:38 +0000 (UTC) (envelope-from slw@zxy.spb.ru)
Received: from slw by zxy.spb.ru with local (Exim 4.86 (FreeBSD))
 (envelope-from <slw@zxy.spb.ru>)
 id 1bTr9N-000FUI-A0; Sun, 31 Jul 2016 16:51:29 +0300
Date: Sun, 31 Jul 2016 16:51:29 +0300
From: Slawa Olhovchenkov <slw@zxy.spb.ru>
To: Bruce Evans <brde@optusnet.com.au>
Cc: Mateusz Guzik <mjg@freebsd.org>, svn-src-head@freebsd.org,
 svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r303583 - head/sys/amd64/amd64
Message-ID: <20160731135129.GA22212@zxy.spb.ru>
References: <201607311134.u6VBY81j031059@repo.freebsd.org>
 <20160731220407.Q3033@besplex.bde.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160731220407.Q3033@besplex.bde.org>
User-Agent: Mutt/1.5.24 (2015-08-30)
X-SA-Exim-Connect-IP: <locally generated>
X-SA-Exim-Mail-From: slw@zxy.spb.ru
X-SA-Exim-Scanned: No (on zxy.spb.ru); SAEximRunCond expanded to false
X-BeenThere: svn-src-head@freebsd.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: SVN commit messages for the src tree for head/-current
 <svn-src-head.freebsd.org>
List-Unsubscribe: <https://lists.freebsd.org/mailman/options/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/svn-src-head/>
List-Post: <mailto:svn-src-head@freebsd.org>
List-Help: <mailto:svn-src-head-request@freebsd.org?subject=help>
List-Subscribe: <https://lists.freebsd.org/mailman/listinfo/svn-src-head>,
 <mailto:svn-src-head-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Sun, 31 Jul 2016 13:51:38 -0000

On Sun, Jul 31, 2016 at 11:11:25PM +1000, Bruce Evans wrote:

> Misalignment of this loop made it almost twice as slow on old Turion2 with
> slow DDR2 memory.  It made no difference on Haswell.  I added an extra
> movnti, but that makes little or no difference.  2 more movnti's wouldn't
> fit in a 16-byte cache line so are slower unless even more care is taken
> with alignment (or with less care, 4 with misalignment are not less than
> twice as slow as 1 with alignment).
> 
> I thought that alignment and unrolling didn't matter here, because movnti
> has to wait for memory and almost any loop runs fast enough to keep up.
> The timing on my old system is something like: CPUs at 2 GHz; main memory
> at 4 GB/sec; movnti is only 4 bytes wide on i386 (so this problem
> only affects i386, at least with slow memory).  So sustaining 4 GB/sec
> requires 1 G movnti's/sec, so the loop needs to run at 2 cycles/iteration
> to keep up.  But when it is misaligned, it runs at 3-4 cycles/iteration.
> Alignment makes it take about 2, and the extra movnti is for safety and
> to work with faster memory.
> 
> On Haswell with CPUs at 4 GHz, 2 cycles/iteration gives 8 GB/sec on
> i386 and 16 GB/sec on amd64 with wider movnti.  IIRC, 16 GB/sec is about
> the main memory speed so nothing better is possible but just 1 extra
> movnti gives more with faster memory.  This is just worse than bzero()

What about a modern system with 120 GB/sec main memory speed?
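
For scale, the cycle-budget arithmetic quoted above can be redone for other
memory speeds. The following is a minimal Python sketch, not a measurement.
It assumes one core has to sustain the whole bandwidth with one movnti per
loop iteration. The function names are made up for illustration, and the
4 GHz clock and 8-byte store in the 120 GB/sec case are assumptions.

```python
def cycles_per_store(cpu_hz, mem_bytes_per_sec, store_bytes):
    """Cycles available per store if the loop is to keep up with memory."""
    stores_per_sec = mem_bytes_per_sec / store_bytes
    return cpu_hz / stores_per_sec


def bandwidth(cpu_hz, cycles_per_iter, store_bytes, stores_per_iter=1):
    """Bytes/sec written by a loop that takes cycles_per_iter per iteration."""
    return cpu_hz / cycles_per_iter * store_bytes * stores_per_iter


GB = 1e9  # decimal gigabyte, as used in the quoted figures

if __name__ == "__main__":
    # Figures from the quoted text: 2 GHz CPU, 4 GB/sec memory, 4-byte movnti.
    print("old system budget:", cycles_per_store(2e9, 4 * GB, 4), "cycles/store")

    # Quoted Haswell case: 4 GHz CPU, 2 cycles/iteration, 4- and 8-byte stores.
    print("Haswell i386 :", bandwidth(4e9, 2, 4) / GB, "GB/sec")
    print("Haswell amd64:", bandwidth(4e9, 2, 8) / GB, "GB/sec")

    # Hypothetical case from the question: 120 GB/sec memory.
    # ASSUMPTION: 4 GHz core, 8-byte movnti, a single core doing all the work.
    print("120 GB/sec budget:", cycles_per_store(4e9, 120 * GB, 8), "cycles/store")
```

Under those assumptions the budget falls well below one cycle per 8-byte
store, so a loop with one movnti per iteration could not keep up on its own.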