Date: Wed, 25 Feb 2004 13:59:53 +1100
From: Peter Jeremy
To: Charles Swiger
Cc: freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org
Subject: Re: Bad performance on alpha? (make buildworld)
Message-ID: <20040225025953.GH10121@gsmx07.alcatel.com.au>
In-Reply-To: <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com>
References: <20040223192103.59ad7b69.lehmann@ans-netz.de> <20040224202652.GA13675@diogenis.ceid.upatras.gr> <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com>

On 2004-Feb-24 20:17:07 -0500, Charles Swiger wrote:
>On Feb 24, 2004, at 3:26 PM, Nikos Ntarmos wrote:
>>IIRC the 600MHz EV56's performance wrt integer operations (such as
>>compiling) is somewhere in the vicinity of a 400MHz P-II, so the
>>difference you see in turn-around times when buildworld'ing isn't
>>quite that big. If the operations were identical, you should see
>>better times when building on the alpha. However, also take into
>>account that compiling (and optimizing) for a RISC CPU, apart from
>>generating larger binaries, is AFAIK supposedly more difficult than
>>compiling (and optimizing) for a CISC CPU.
>
>I'm afraid you've got this backwards. :-)

Maybe in theory, but not necessarily in practice.

>The primary attributes of RISC architectures, namely lots of registers,
>a relatively simple but orthogonal instruction set, and a relatively
>fast clock rate / CPI ~= 1.0 / a short pipeline make it far easier for
>the compiler to generate and optimize code.

Alpha pipelines are only short in a relative sense - the EV5 pipeline
is 7 (integer) or 9 (FP) stages and I suspect the EV56 pipeline is the
same. In theory, it is 4-way superscalar, but the different execution
units aren't equivalent and the compiler has to understand which
instructions will be allocated to which execution units in order to
minimise stalls.

>CISC architectures make the compiler's job much harder because they
>tend to require lots of register spills, they tend to have very long
>pipelines which involve hazards and require a lot of instruction
>reordering to avoid stalling the pipeline too often.
>The amount of CPU clocks it takes per instruction (CPI) often varies
>on CISC and is generally much larger than ~1.0, and sometimes varies
>from CPU model to CPU model, making it far more difficult to determine
>the "fastest" instruction sequence.

Recent iA32 implementations (basically anything more recent than a
PII) are RISC cores which directly execute a subset of the iA32
instruction set, with the remainder handled by microcode. You get
quite respectable results by treating it as a load/store RISC
architecture and relying on the L1 cache to handle the register spills
in a timely fashion.

The pipelines and superscalar execution abilities are all handled in
hardware. Register scoreboarding allows the implementation to have more
physical registers than the programmer's view supports, allowing
multiple instructions to simultaneously see different values in the
same visible register.

The compiler has to expend a lot of effort on instruction scheduling
to get decent performance out of a typical RISC architecture. Much of
this is handled automatically by the hardware on an iA32, and you can
get equivalent results with a much simpler compiler.

Peter
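The hardware mechanism described above, where the core has more physical registers than the programmer-visible set so that different instructions can see different values in the "same" register, is usually called register renaming. The Python sketch below is a minimal illustration under stated assumptions: the one-destination instruction format, the register names (r1..r6, p0..), the number of physical registers and the lowest-free allocation policy are all hypothetical. It never returns physical registers to the free list, so it does not model retirement or any particular iA32 core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction: one destination, any number of sources."""
    dest: str
    srcs: tuple


def rename(program, arch_regs, num_physical):
    """Rewrite architectural register names into physical register names.

    Assumption: architectural register i starts out in physical register
    p<i>; every write takes the lowest-numbered free physical register.
    Physical registers are never returned to the free list (no retirement).
    """
    if num_physical < len(arch_regs):
        raise ValueError("need at least one physical register per visible register")
    table = {r: f"p{i}" for i, r in enumerate(arch_regs)}  # current mapping
    free = [f"p{i}" for i in range(len(arch_regs), num_physical)]
    renamed = []
    for ins in program:
        # Read sources through the mapping *before* renaming the destination,
        # so an instruction like "r1 = r1 + 1" reads the old r1.
        srcs = tuple(table[s] for s in ins.srcs)
        if not free:
            # A real core would stall here until an older instruction retires.
            raise RuntimeError("out of physical registers")
        phys = free.pop(0)
        table[ins.dest] = phys  # later readers of ins.dest now see phys
        renamed.append(Instr(phys, srcs))
    return renamed


if __name__ == "__main__":
    regs = ["r1", "r2", "r3", "r4", "r5", "r6"]
    prog = [
        Instr("r1", ("r2", "r3")),  # r1 = r2 + r3
        Instr("r4", ("r1",)),       # r4 = r1 * 2   (reads the first r1)
        Instr("r1", ("r5",)),       # r1 = r5 + 1   (reuses visible r1)
        Instr("r6", ("r1", "r4")),  # r6 = r1 - r4  (reads the second r1)
    ]
    for before, after in zip(prog, rename(prog, regs, 12)):
        print(before, "->", after)
```

In this example the two writes to r1 are given different physical registers (p6 and p8), so the third instruction can execute without waiting for the second one to read the old value of r1.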