From: Bruce Evans
Date: Sun, 5 Apr 2015 16:55:23 +1000 (EST)
To: Eitan Adler
Cc: svn-src-head@freebsd.org, svn-src-all@freebsd.org, src-committers@freebsd.org
Subject: Re: svn commit: r281103 - head/sys/amd64/amd64
Message-ID: <20150405163305.A2515@besplex.bde.org>
In-Reply-To: <201504050518.t355IFVJ001786@svn.freebsd.org>

On Sun, 5 Apr 2015, Eitan Adler wrote:

> Log:
>   adrian asked me to revert and get more testing
>
> Modified:
>   head/sys/amd64/amd64/support.S
>
> Modified: head/sys/amd64/amd64/support.S
> ==============================================================================
> --- head/sys/amd64/amd64/support.S  Sun Apr  5 05:14:20 2015  (r281102)
> +++ head/sys/amd64/amd64/support.S  Sun Apr  5 05:18:14 2015  (r281103)
> @@ -73,11 +73,7 @@ ENTRY(pagezero)
>   movnti %rax,8(%rdi,%rdx)
>   movnti %rax,16(%rdi,%rdx)
>   movnti %rax,24(%rdi,%rdx)
> - movnti %rax,32(%rdi,%rdx)
> - movnti %rax,40(%rdi,%rdx)
> - movnti %rax,48(%rdi,%rdx)
> - movnti %rax,56(%rdi,%rdx)
> - addq $64,%rdx
> + addq $32,%rdx
>   jne 1b
>   sfence
>   POP_FRAME_POINTER

My tests show that such changes make no difference unless you throttle
the CPU to make the memory fast in comparison.  Counting cycles shows
that no unrolling is useful on a 4GHz CPU with memory slower than
16GB/sec.  But there may be complications for write buffering.  8 bytes
at a time is too small, so although movnti bypasses the caches it must
go through write buffers to combine writes.  Then writing in groups of
the same size as the write buffer may be best.

Tests on ref11-amd64 now show no significant difference between 4-way,
1-way, 8-way and 2-way unrolling.  4-way seems to be insignificantly
slowest and 2-way and 8-way insignificantly equally fastest.
"Insignificantly" means less than 2% on a micro-benchmark, but there is
some variance which I wasn't careful to determine.

Old tests show that if you improve the speed of pagecopy and pagezero by
a lot more than 1%, like I do for i386-with-no-SSE2 using movntps, then
you get insignificant speedups for makeworld.  (-current uses movnti for
both pagecopy and pagezero on amd64, but on i386 it only uses movnti for
sse2_pagezero.)

movnti is used to bypass the cache.  It is not clear that this is best.
Bypassing the cache for other things seemed to give just large
complexity for a small loss.  I got best results for makeworld from an
old version of FreeBSD that did page zeroing in idle context.  That is
unsupported/broken/done differently now.
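
For readers who do not read amd64 assembly, the loop in the quoted diff can be sketched roughly in C. This is only an illustration under stated assumptions, not the kernel's code: `pagezero_sketch`, `page_is_zero`, and `PAGE_BYTES` are hypothetical names, a 4096-byte page is assumed, and the `_mm_stream_si64` intrinsic stands in for `movnti` where the compiler targets x86-64 with SSE2 (plain stores are used elsewhere). Unlike the assembly, it indexes upward from zero instead of counting a negative offset up to zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__SSE2__)
#include <emmintrin.h>	/* _mm_stream_si64(), _mm_sfence() */
#define HAVE_NT_STORES 1
#else
#define HAVE_NT_STORES 0	/* fall back to ordinary cached stores */
#endif

#define PAGE_BYTES 4096		/* assumed page size for this sketch */

/*
 * Hypothetical helper: zero one page with 8-byte stores, 4 stores
 * (32 bytes) per iteration, like the post-revert loop in the diff.
 */
void
pagezero_sketch(void *page)
{
	long long *p = page;
	size_t n = PAGE_BYTES / sizeof(*p);
	size_t i;

	assert((uintptr_t)page % sizeof(*p) == 0);
	for (i = 0; i < n; i += 4) {
#if HAVE_NT_STORES
		/* Non-temporal stores: write around the cache. */
		_mm_stream_si64(&p[i + 0], 0);
		_mm_stream_si64(&p[i + 1], 0);
		_mm_stream_si64(&p[i + 2], 0);
		_mm_stream_si64(&p[i + 3], 0);
#else
		p[i + 0] = 0;
		p[i + 1] = 0;
		p[i + 2] = 0;
		p[i + 3] = 0;
#endif
	}
#if HAVE_NT_STORES
	/* Order the non-temporal stores before later stores, as sfence does. */
	_mm_sfence();
#endif
}

/* Hypothetical checker: returns 1 if every byte of the page is zero. */
int
page_is_zero(const void *page)
{
	const unsigned char *b = page;
	size_t i;

	for (i = 0; i < PAGE_BYTES; i++)
		if (b[i] != 0)
			return (0);
	return (1);
}
```

Changing the unroll factor in this sketch only changes how many stores share one loop-counter update and branch, which is the overhead the unrolling tests above were measuring.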
With page zeroing in idle context, bypassing the cache is clearly right,
and the speed of pagezero doesn't matter much iff it is executed in idle
context, and it should run slower if necessary to bypass the cache.

It is probably wrong to bypass the cache for zeroing on demand.  Then at
least the bytes that caused the page to be demanded are sure to be used
soon.

Bruce
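
The cycle-counting remark earlier in this message (no unrolling is useful on a 4GHz CPU with memory slower than 16GB/sec) can be checked with a little arithmetic. The sketch below is one reading of that claim, not a derivation given in the message: it assumes 1GB = 10^9 bytes, 8-byte stores as in the movnti loop, and memory bandwidth as the only limit. The function names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: CPU cycles available per store when memory
 * bandwidth is the bottleneck.  All parameters are assumptions fed in
 * by the caller, not measurements.
 */
double
cycles_per_store(double cpu_hz, double mem_bytes_per_sec,
    double bytes_per_store)
{
	assert(mem_bytes_per_sec > 0 && bytes_per_store > 0);
	/* Stores per second the memory can absorb. */
	double stores_per_sec = mem_bytes_per_sec / bytes_per_store;
	return (cpu_hz / stores_per_sec);
}

/* Hypothetical helper: cycle budget for one iteration of an N-way loop. */
double
cycles_per_iteration(double cpu_hz, double mem_bytes_per_sec,
    double bytes_per_store, unsigned unroll)
{
	return (unroll *
	    cycles_per_store(cpu_hz, mem_bytes_per_sec, bytes_per_store));
}
```

With 4e9 Hz, 16e9 bytes/sec and 8-byte stores this gives 2 cycles per store, so even a 1-way loop has 2 cycles in which to issue one store, one add and one branch. On that reading, slower memory only enlarges the budget, and the loop overhead that unrolling removes is already hidden behind the memory.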