From owner-freebsd-stable Fri Oct 27 14:23:51 2000
Delivered-To: freebsd-stable@freebsd.org
Received: from mail.wgate.com (unknown [38.219.83.4])
	by hub.freebsd.org (Postfix) with ESMTP id 41A3A37B4CF
	for ; Fri, 27 Oct 2000 14:23:47 -0700 (PDT)
Received: from jesup.eng.tvol.net ([10.32.2.26]) by mail.wgate.com
	with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2650.21)
	id VT2X0GV6; Fri, 27 Oct 2000 17:23:51 -0400
Reply-To: Randell Jesup
To: Michel Talon
Cc: "freebsd-stable@FreeBSD.ORG"
Subject: Re: "Malloc type lacks magic" show-stopper solved
References: <20001026231134.D9391@dragon.nuxi.com> <20001027092841.B394@lpthe.jussieu.fr>
From: Randell Jesup
Date: 27 Oct 2000 17:27:06 -0400
In-Reply-To: Michel Talon's message of "Fri, 27 Oct 2000 09:28:41 +0200"
Message-ID:
User-Agent: Gnus/5.0807 (Gnus v5.8.7) Emacs/20.7
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Michel Talon writes:
>> > WHY!?!?!? Just what the heck do you think you're achieving with -O3 plus
>> > all those things? Have you *ever* profiled anything you're compiling
>> > with these options? Note that -O3 is not necessarily faster code than -O.
>> >
>> > This seems Yet Another "I'm macho" compiler flags instance.
>> > Please correct me if I'm wrong.

>Kernel code is simple with essentially no computations (except of course
>special domains like crypto in kernel). So there is no much room for
>optimizations.

Sure there is - just not much for things like loop unrolling, etc.
Admittedly this isn't as large on an x86 as on processors with more
registers, but it's still true, especially for instruction scheduling for
today's superscalar CPU cores (in some ways, it actually matters more for
things like PII's than for Athlon/Duron). Removing frame pointers for
example can save a lot of memory traffic, as can letting the compiler
optimize away or merge locals.
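To make the point about locals concrete, here is a small sketch (not from any kernel source; the function name `sum_sq_diff` and its deliberately verbose temporaries are made up for illustration). Without optimization, gcc typically gives each of `x`, `y`, `diff` and `sq` its own stack slot and stores and reloads them on every iteration; with -O it is free to keep them in registers or fold them away. Comparing the output of `gcc -S` with and without `-O -fomit-frame-pointer` should show the difference, though exactly what you get depends on the gcc version and target.

```c
#include <stddef.h>

/*
 * Sum of squared differences between two int arrays.
 * The temporaries are spelled out on purpose: they are the kind of
 * locals an optimizer can merge or eliminate instead of spilling
 * each one to the stack.
 */
long sum_sq_diff(const int *a, const int *b, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        int x = a[i];                 /* load from first array */
        int y = b[i];                 /* load from second array */
        int diff = x - y;             /* candidate for folding */
        long sq = (long)diff * diff;  /* widen before multiplying */
        total += sq;
    }
    return total;
}
```

The result is the same at every optimization level; only the amount of stack traffic changes.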
_Measuring_ the speed of a kernel is tougher, since many operations
are I/O-bound. Also, high call overheads can swamp apparent differences.

> Recently i have timed a scientific program to see the
>performances of my brand new PC. Here is what i found:
>Without any optimization the program runs 2 times slower. With
>-O -O2 -O3 -Os the times are similar, the fastest was -O the slowest
>was -Os. Since my PC is Duron based i have tried -march things, and have
>compared on a pentium machine. Result, almost nothing, except -march pentium
>was slower than -march k6 on the Duron as could be expected. All differences
>are small, no more than 2s on a 30s computation. As you can see nothing that
>counterbalances the risk of bugs.

This depends a lot on the program. Many programs will show improvement
from -O2/-O3, but not all. Adding some -fxxxx options can get more. I've
seen >10% improvements.

Bugs with optimizers are by far most common in code that's banging
HW registers (i.e. drivers and some kernel code). I've rarely seen userland
programs harmed by aggressive optimization levels. (They can make source
debugging hard, though.)

>To illustrate this, i have some years ago tested a scientific program on an
>alpha machine running linux. Compiled with gcc and the best optimizations it
>runned 7 times slower than compiled with Digital compiler. Conclude by
>yourself.

GCC isn't optimized for numeric codes. DEC's compiler most certainly was.

-- 
Randell Jesup, Worldgate Communications, ex-Scala, ex-Amiga OS team ('88-94)
rjesup@wgate.com
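On the point above about optimizer trouble clustering in code that bangs hardware registers: a common cause is a register access that is not marked volatile, so the compiler may legally read it once and reuse the value. The sketch below is only an illustration under stated assumptions; `fake_status_reg`, `STATUS_READY` and `wait_ready` are hypothetical names, and an ordinary variable stands in for what would be a fixed device address in a real driver.

```c
#include <stdint.h>

#define STATUS_READY 0x01u   /* hypothetical "device ready" bit */

/* Stand-in for a memory-mapped status register (assumption: a real
 * driver would point at a device address instead of a variable). */
volatile uint32_t fake_status_reg;

/*
 * Poll the register until the ready bit is set or max_spins polls
 * have been made. Returns the number of unsuccessful polls before
 * success, or -1 on timeout.
 *
 * The volatile qualifier forces a fresh load on every iteration.
 * Without it, an optimizing compiler may hoist the load out of the
 * loop and spin on a stale value, which is the sort of bug that only
 * appears once optimization is turned on.
 */
int wait_ready(volatile uint32_t *reg, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (*reg & STATUS_READY)
            return i;
    }
    return -1;
}
```

Calling `wait_ready(&fake_status_reg, 1000)` with the bit clear times out and returns -1; with the bit already set it returns 0 on the first poll.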