From: Chuck Swiger
Date: Thu, 14 Sep 2006 09:45:42 -0700
To: Gary Kline
Cc: freebsd-stable
Subject: Re: optimization levels for 6-STABLE build{kernel,world}
Message-Id: <549CD0AC-2CED-4112-B708-5F4FB1DA69D2@mac.com>
In-Reply-To: <20060914044241.GA92358@thought.org>

On Sep 13, 2006, at 9:42 PM, Gary Kline wrote:
>> -funroll-loops is as likely to decrease performance for a particular
>> program as it is to help.
>
> Isn't the compiler intelligent enough to have a reasonable
> limit, N, of the loops it will unroll to ensure a faster runtime?
> Something much less than 1000, say; possibly less than 100.

Of course; in fact, N is probably closer to 4 or 8 than it is to 100.

> At least, if the initialization and end-loop code *plus* the
> loop code itself were too large for the cache, my thought is that
> gcc would back out.

Unless you've indicated that the compiler should target a specific CPU
architecture, there is no way for it to know whether the size of the L1
cache on the machine doing the compile is the same as, or even similar
to, the size of the L1 cache on the system where the code will run.

> I may be giving RMS too much credit; but
> if memory serves, the compiler was GNU's first project. And
> Stallman was into GOFAI, &c, for better/worse.[1] Anyway, for now
> I'll comment out the unroll-loops arg.

cd /usr/src/contrib/gcc && grep Stallman ChangeLog

...returns no results. A tool I wrote suggests:

% histogram.py -F' ' -f 2,3 -p @ -c 10 ChangeLog
  61 Kazu Hirata
  51 Eric Botcazou
  48 Jan Hubicka
  39 Richard Sandiford
  37 Alan Modra
  30 Richard Henderson
  29 Joseph S. Myers
  27 Jakub Jelinek
  25 Zack Weinberg
  22 Mark Mitchell
  20 John David Anglin
  20 Ulrich Weigand
  17 Rainer Orth
  16 Kelley Cook
  16 Roger Sayle
  13 David Edelsohn
  12 Aldy Hernandez
  11 Stephane Carrez
  11 Ian Lance Taylor
  10 Andrew Pinski
  10 Kaz Kojima
  10 James E Wilson

>> A safe optimizer must assume that an arbitrary assignment via a
>> pointer dereference can change any value in memory, which means that
>> you have to spill and reload any data being cached in CPU registers
>> around the use of the pointer, except for const's, variables declared
>> as "register", and possibly function arguments being passed via
>> registers and not on the stack (cf "register windows" on the SPARC
>> hardware, or HP/PA's calling conventions).
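
To make the quoted point about spilling and reloading concrete, here is a toy model rather than anything gcc actually does. It is written in Python standing in for C: a one-element list plays the role of a memory location reached through a pointer, and a local variable plays the role of a CPU register. The function names are made up for this sketch.

```python
# Toy model of the aliasing hazard. Assumptions: a one-element list stands
# in for a memory cell behind a pointer, a local variable stands in for a
# register. The function names are hypothetical, chosen for illustration.

def read_after_store_cached(src, dst):
    """Keeps src[0] in a 'register' across a store through dst (unsafe)."""
    cached = src[0]       # load once into a "register"
    dst[0] = cached + 1   # store through a reference that may alias src
    return cached         # stale if dst and src are the same cell


def read_after_store_reloaded(src, dst):
    """Reloads src[0] after the store, as a safe optimizer must."""
    cached = src[0]
    dst[0] = cached + 1
    return src[0]         # reload from "memory" after the store


if __name__ == "__main__":
    # Distinct cells: caching and reloading agree.
    print(read_after_store_cached([10], [20]),
          read_after_store_reloaded([10], [20]))

    # Same cell passed twice: the cached copy no longer matches memory.
    cell = [10]
    stale = read_after_store_cached(cell, cell)
    cell = [10]
    fresh = read_after_store_reloaded(cell, cell)
    print(stale, fresh)
```

When the two arguments are different cells, both versions return the same value, so keeping the value in a register would have been harmless. When the same cell is passed twice, only the reloading version sees the store. The compiler usually cannot prove which case it is in, which is why it has to reload.
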
>
> Well, I'd added the no-strict-aliasing flag to make.conf!
> Pointers give me indigestion ... even after all these years.
> Thanks for your insights. And the URL.

You're welcome.

> gary
>
> [1]. Seems to me that "good old-fashioned AI" techniques would work in
> something like a compiler where you probably have a good idea of
> most heuristics. -gk

Of course. The compiler enables those optimizations with -O or -O2 which
are almost certain to result in beneficial improvements to performance
and code size, most of the time. Potential optimizations which are not
helpful on average are not enabled by default, until the situations
where they are known to be useful can be identified by the compiler at
compile-time.

Using non-default optimization options isn't like discovering buried
treasure that nobody else was aware of; the options aren't enabled by
default for good reason(s), usually because the tradeoffs they make
aren't helpful in general (yet), or because their usage has known bugs
which result in faulty executables being produced.

-- 
-Chuck