From: Bruce Evans
Date: Mon, 12 Nov 2012 22:04:58 +1100 (EST)
To: Bruce Evans
Cc: svn-src-head@FreeBSD.org, svn-src-all@FreeBSD.org,
    src-committers@FreeBSD.org, Dimitry Andric, Nathan Whitehorn
Subject: Re: svn commit: r242835 - head/contrib/llvm/lib/Target/X86

On Mon, 12 Nov 2012, Bruce Evans wrote:

> On Sun, 11 Nov 2012, Dimitry Andric wrote:
>> It works just fine now with clang.  For the first example, I get:
>>
>>         pushl   %ebp
>>         movl    %esp, %ebp
>>         andl    $-32, %esp
>>
>> as prolog, and for the second:
>>
>>         pushl   %ebp
>>         movl    %esp, %ebp
>>         andl    $-16, %esp
>
> Good.
>
> The andl executes very fast.  Perhaps not as fast as subl on %esp,
> because subl is normal so more likely to be optimized (they nominally
> have the same speeds, but %esp is magic).  Unfortunately, it seems to
> be impossible to both align the stack and reserve some space on it in
> 1 instruction -- the andl might not reserve any.

I lost kib's reply to this.  He said something agreeing about %esp
being magic on Intel CPUs starting with PentiumPro.  The following quick
test shows no problems on Xeon 5650 (freefall) or Athlon64:

@ asm("                               \n\
@ .globl  main                        \n\
@ main:                               \n\
@         movl    $266681734,%eax     \n\
@ #       movl    $201017002,%eax     \n\
@ 1:                                  \n\
@         call    foo1                \n\
@         decl    %eax                \n\
@         jne     1b                  \n\
@         ret                         \n\
@                                     \n\
@ foo1:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo2                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo2:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo3                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo3:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo4                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo4:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo5                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo5:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo6                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo6:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo7                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo7:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@         call    foo8                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@                                     \n\
@ foo8:                               \n\
@         pushl   %ebp                \n\
@         movl    %esp,%ebp           \n\
@         andl    $-16,%esp           \n\
@ #       call    foo9                \n\
@         movl    %ebp,%esp           \n\
@         popl    %ebp                \n\
@         ret                         \n\
@ ");

Build this on an i386 system so that it runs in 32-bit mode.

This takes 56-57 cycles/iteration on Athlon64 and 50-51 cycles/iteration
on Xeon 5650.  Changing the andls to subls of 16 doesn't change this.
Removing all the andls and subls doesn't change this on Athlon64, but on
Xeon 5650 it is 4-5 cycles faster.  This shows that the gcc pessimization
is largest on Xeon 5650 :-).  Adding "pushl %eax; popl %eax" before the
calls to foo[2-8] adds 35-36 cycles/iteration on Athlon64 but only 6-7
on Xeon 5650.  I know some Athlons don't optimize pushl/popl well (maybe
when they are close together or near a stack pointer change as here).
Apparently Athlon64 is one such.

Bruce