From: Chuck Swiger
Organization: The Courts of Chaos
Date: Wed, 25 Feb 2004 15:35:56 -0500
To: freebsd-performance@freebsd.org, freebsd-alpha@freebsd.org
Cc: obrien@FreeBSD.org
Subject: Re: Bad performance on alpha? (make buildworld)

David O'Brien wrote:
> On Wed, Feb 25, 2004 at 12:19:15AM -0500, Chuck Swiger wrote:
>>>Maybe in theory, but not necessarily in practice.
>>
>>It's been a few years since I'd written a compiler, but my viewpoint isn't
>>based entirely on theory. [ ... ]
>> Your technical description is accurate, but the points you are making here
>> seem to support my argument, rather than contradict what I said. :-)
>
> You're assuming you're writing a compiler targeting _1_ specific
> architecture.

No, sir, I certainly do not make such an assumption.

Most optimization techniques are architecture-independent: liveness
analysis, CSE, dead code elimination, moving invariants out of loops,
branch threading, algebraic identities and strength reduction. These
optimizations are most commonly done on the 3-argument intermediate code
that portable compilers (PCC, GCC) typically use before target platform
code generation is actually performed.

There are a few additional optimizations which are architecture-specific,
such as instruction scheduling and peephole/template optimizations, but
these generally make much less difference to performance than the
architecture-independent optimizations mentioned above. On some platforms,
though, they can make enough difference that a second pass at CSE or
instruction rescheduling against the target assembly code is worth doing.

> It doesn't matter what is possible, what matters is what
> GCC does. Please go analysis GCC and report the deficiencies. I
> personally would love to know what they are, and how to make GCC do
> better on non-x86 platforms.

I agree that what GCC does matters, not theories.
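
[ To make the architecture-independent optimizations listed earlier a
little more concrete, here is a minimal sketch in C. It is not taken from
GCC or PCC, and the function names sum_naive and sum_optimized are made up
for illustration. The second function is what a compiler could plausibly
derive from the first by applying CSE, loop-invariant code motion and
strength reduction, none of which depend on the target CPU. ]

```c
#include <assert.h>
#include <stddef.h>

/* Naive form, roughly as a programmer might write it. */
long sum_naive(const long *a, size_t n, long x, long y)
{
    assert(a != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* (x * y) appears twice and never changes inside the loop;
           (i * 8) is a multiply that only ever grows by 8 per iteration. */
        total += a[i] * (x * y) + (x * y) + (long)i * 8;
    }
    return total;
}

/* Hand-applied version of the same transformations (illustrative only). */
long sum_optimized(const long *a, size_t n, long x, long y)
{
    assert(a != NULL || n == 0);
    const long xy = x * y;  /* CSE: computed once; hoisted out of the loop */
    long i8 = 0;            /* strength reduction: running value of i * 8 */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += a[i] * xy + xy + i8;
        i8 += 8;            /* an add replaces the per-iteration multiply */
    }
    return total;
}
```

Both functions return the same value for the same inputs; only the amount
of work done per iteration differs.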

I don't have access to Alpha hardware, which is a barrier although not an
insuperable one. I'd do better considering SPARC or PPC hardware, which I
actually have available to me. Still, I won't use this as an excuse:

A quick look suggests that Alpha code generation is deficient when dealing
with unsigned integers, because the architecture uses a "sign extended"
format to store and convert 32-bit unsigned ints (aka "long words") into
the (64-bit, aka "quad-word") registers. Dealing with unsigned ints
smaller than 32 bits is very probably also slow, because the Alpha requires
operand-size byte-alignment for all memory access.

[ "The Alpha does not directly support byte-level operations such as
transferring single bytes between memory and registers. In principle, we
could use the instructions already presented to realize byte-level
manipulations, but a large amount of shifting and masking would be
required. For example, consider the C operation *dest = *src, where both
dest and src are of type (char *). This operation must read the single
byte pointed to by src and update the single byte pointed to by dest.
Without special byte manipulation instructions, this simple operation
requires 17 Alpha instructions!" ]

Supposedly, the ldq_u and stq_u instructions are the right way to handle
byte-level memory access, and it would be worth looking at how well GCC
uses these opcodes when dealing with chars and shorts.

Some of these issues cannot be addressed by changes to the compiler: I
suspect that FreeBSD's derivation from and focus on the x86 architecture
means it uses a lot of int8 or int16 values, which are fast on Intel
hardware, whereas using int32 or int64 representations would actually
prove much faster on the Alpha than using smaller-sized quantities.

--
-Chuck
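
[ As a rough illustration of the byte-access cost described above, the
following C sketch updates a single byte using only whole 64-bit loads and
stores plus shifting and masking. It is only loosely analogous to the
ldq_u / mskbl / insbl / stq_u style of sequence; it is not what GCC emits,
and the helper names store_byte_via_quadword and load_byte_via_quadword are
hypothetical. Byte lanes are numbered from the low-order end of each word,
which is an assumption of the sketch. ]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write one byte into a buffer of aligned 64-bit words without ever
   issuing a byte-sized store: load the word, clear the lane, insert
   the new byte, store the word back. */
void store_byte_via_quadword(uint64_t *words, size_t idx, uint8_t value)
{
    assert(words != NULL);
    uint64_t *q = &words[idx / 8];            /* containing quadword */
    unsigned shift = (unsigned)(idx % 8) * 8; /* bit offset of the lane */
    uint64_t w = *q;                          /* full-word load */
    w &= ~((uint64_t)0xFF << shift);          /* mask out the old byte */
    w |= (uint64_t)value << shift;            /* insert the new byte */
    *q = w;                                   /* full-word store */
}

/* Read one byte back: full-word load, then shift and truncate. */
uint8_t load_byte_via_quadword(const uint64_t *words, size_t idx)
{
    assert(words != NULL);
    return (uint8_t)(words[idx / 8] >> ((idx % 8) * 8));
}
```

A byte copy in the style of *dest = *src then becomes a call to
load_byte_via_quadword followed by store_byte_via_quadword, which is one
load-shift on the read side and a full read-modify-write on the write side.
That extra work per byte is why wider int32 or int64 quantities can be
cheaper on such a machine.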