From owner-freebsd-performance@FreeBSD.ORG Wed Feb 25 01:25:53 2004
Message-ID: <403C6A24.80804@he.iki.fi>
Date: Wed, 25 Feb 2004 11:25:56 +0200
From: Petri Helenius
To: Peter Jeremy
In-Reply-To: <20040225025953.GH10121@gsmx07.alcatel.com.au>
cc: freebsd-performance@freebsd.org
cc: Charles Swiger
cc: freebsd-alpha@freebsd.org
Subject: Re: Bad performance on alpha? (make buildworld)

Peter Jeremy wrote:

>Recent iA32 implementations (basically anything more recent than a
>PII) are RISC cores which directly execute a subset of the iA32
>instruction set with the remainder handled by microcode.  You get
>quite respectable results by treating it as a load/store RISC
>architecture and relying on the L1 cache to handle the register spills
>

This probably invites the question: for people like me, who are
interested in getting the maximum performance out of whatever hardware
our code runs on (maybe with the exception of the low-MHz embedded
stuff :-), are there any good tutorials or books on what kinds of
things to avoid when looking for optimal performance?

The tightest loops mostly do counter rolling, comparisons and pattern
matching, and we have had good mileage getting performance gains by
minimizing writes to memory when there are other options, like doing
arithmetic on the fly.

One specific question that also comes to mind: in more modern,
SSE-enabled code, is there any benefit to exercising floating point in
balance with 64-bit long long integers, or does that gain performance
only if the code is compiled without SSE?

Pete
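
To make the point about minimizing memory writes concrete, here is a minimal sketch in C. It is an illustration only, not taken from any real code: the struct and function names are hypothetical, and whether the second version is actually faster depends on the compiler and CPU, so it needs measuring. The first loop updates a counter in memory behind a branch; the second adds the 0/1 result of the comparison to a local variable and writes to memory once at the end.

```c
#include <stddef.h>

/* Hypothetical statistics record, used only for this illustration. */
struct stats {
    unsigned long in_range;
};

/* Version 1: the counter lives in memory and is updated on every match.
 * Because buf is a character pointer, the compiler generally has to
 * assume it may alias *st, so it may not be able to keep the counter
 * in a register across iterations. */
void count_store_each(const unsigned char *buf, size_t len,
                      unsigned char lo, unsigned char hi,
                      struct stats *st)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] >= lo && buf[i] <= hi)
            st->in_range++;     /* read-modify-write inside the loop */
    }
}

/* Version 2: "arithmetic on the fly". Each comparison yields 0 or 1,
 * which is added to a local accumulator; memory is written once. */
void count_store_once(const unsigned char *buf, size_t len,
                      unsigned char lo, unsigned char hi,
                      struct stats *st)
{
    unsigned long n = 0;

    for (size_t i = 0; i < len; i++)
        n += (unsigned long)((buf[i] >= lo) & (buf[i] <= hi));

    st->in_range += n;          /* single store after the loop */
}
```

Both functions produce the same count for the same input; the only difference is how often the loop touches memory outside the buffer it is scanning.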