Date: Thu, 26 Feb 2004 06:43:24 +1100
From: Peter Jeremy <peter.jeremy@alcatel.com.au>
To: Petri Helenius <pete@he.iki.fi>
Cc: freebsd-alpha@freebsd.org
Subject: Re: Bad performance on alpha? (make buildworld)
Message-ID: <20040225194324.GI10121@gsmx07.alcatel.com.au>
In-Reply-To: <403C6A24.80804@he.iki.fi>
References: <20040223192103.59ad7b69.lehmann@ans-netz.de>
 <20040224202652.GA13675@diogenis.ceid.upatras.gr>
 <5410C982-6730-11D8-8D4C-003065ABFD92@mac.com>
 <20040225025953.GH10121@gsmx07.alcatel.com.au>
 <403C6A24.80804@he.iki.fi>

On 2004-Feb-25 11:25:56 +0200, Petri Helenius <pete@he.iki.fi> wrote:
>This probably invites the question, what, if anything people like me who
>are interested in getting the maximum performance out of any hardware
>our things run on (maybe with the exception of the low-MHz embedded
>stuff :-), is there any good tutorials/books on the subject what kind of
>things to avoid when looking for optimal performance. The tightest loops
>mostly do counter rolling, comparisons and pattern matching and we have
>good mileage on getting performance gains by minimizing writing to
>memory when there are other options like arithmetic on the fly.

Keep in mind several over-riding rules:
1) Make sure the code is correct before worrying about performance
2) Measure the performance and only worry about the slow bits
3) A better algorithm will virtually always give the biggest performance gain

I can't suggest any general books off-hand (I'm sure someone else in
-performance will know). You will need the data sheet or programmers
manual for the specific CPU you are aiming for, as well as the relevant
architecture manual (Intel publish a 3-volume iA32 architecture manual
that you can download from the web, the Alpha AXP architecture manual
is also available online from the HP website). The AXP manual includes
two chapters describing general techniques for AXP coding.

The individual CPU datasheets describe the number and capabilities of
execution units and how the instruction scheduling works, as well as a
matrix of instruction timings (how many clocks you need to leave between
a producer and a consumer instruction to avoid a bubble). These numbers
and definitions need to be mapped into the scheduling tables for your
compiler.

Keep in mind that both the iA32 and AXP CPUs have embedded performance
counters. These will be very useful to monitor low-level details like
pipeline stalls, branch mis-predictions, cache misses etc.

-- 
Peter Jeremy
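
The first two rules above, together with the quoted point about replacing memory writes with arithmetic on the fly, can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the function names (count_matches_mem, count_matches_reg, check_variants_agree, time_call_ns) are hypothetical and do not come from the thread, and the volatile counter is just a stand-in for a counter that has to live in memory. The timing uses the POSIX monotonic clock, which is much coarser than the CPU performance counters mentioned above.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical hot loop, variant A: the counter lives in memory.
 * volatile forces a load and a store on every match. */
static uint64_t count_matches_mem(const unsigned char *buf, size_t n,
                                  unsigned char pattern)
{
    volatile uint64_t counter = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == pattern)
            counter = counter + 1;
    }
    return counter;
}

/* Variant B: arithmetic on a local that the compiler is free to keep
 * in a register; the comparison result (0 or 1) is added directly. */
static uint64_t count_matches_reg(const unsigned char *buf, size_t n,
                                  unsigned char pattern)
{
    uint64_t counter = 0;
    for (size_t i = 0; i < n; i++)
        counter += (buf[i] == pattern);
    return counter;
}

/* Rule 1: confirm both variants agree before comparing their speed. */
static void check_variants_agree(const unsigned char *buf, size_t n,
                                 unsigned char pattern)
{
    assert(count_matches_mem(buf, n, pattern) ==
           count_matches_reg(buf, n, pattern));
}

typedef uint64_t (*count_fn)(const unsigned char *, size_t, unsigned char);

/* Rule 2: measure. Times a single call and returns the elapsed
 * nanoseconds; the count itself is stored through *result. */
static uint64_t time_call_ns(count_fn fn, const unsigned char *buf, size_t n,
                             unsigned char pattern, uint64_t *result)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *result = fn(buf, n, pattern);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (int64_t)(t1.tv_nsec - t0.tv_nsec);
    return (uint64_t)ns;
}
```

A caller would run check_variants_agree() on representative data first, then call time_call_ns() on each variant several times over a buffer large enough to dominate the clock overhead, and compare the best times. Whether variant B actually wins depends on the CPU and the compiler, which is the reason for measuring rather than assuming.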