Date: Sun, 09 Feb 2003 18:12:13 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Marcin Dalecki <mdcki@gmx.net>
Cc: David Schultz <dschultz@uclink.berkeley.edu>, Adrian Chadd <adrian@freebsd.org>, Ray Kohler <ataraxia@cox.net>, freebsd-current@freebsd.org
Subject: Re: Compiling with high optimization?
Message-ID: <3E470A7D.D7D1EAC3@mindspring.com>
References: <20030208173756.GA56030@arkadia.nv.cox.net> <20030208232724.GA20435@HAL9000.homeunix.com> <3E459BF3.BB3FC381@mindspring.com> <20030209002542.GA20812@HAL9000.homeunix.com> <20030209141006.GB33928@skywalker.creative.net.au> <20030209150120.GA2263@HAL9000.homeunix.com> <3E4671E6.8090000@gmx.net>
Marcin Dalecki wrote:
> David Schultz wrote:
> > Strangely, gcc in FreeBSD 5.0 actually generates *slower* code
> > when compiling for more recent architectures than when compiling
> > for a 386. I don't know whether that is a bug in gcc or whether
> > gcc is using some fancy feature like SSE that the kernel handles
> > poorly on context switches. I think there was some discussion on
> > the lists about it earlier.
>
> The reason is that the optimizations done by GCC are ill balanced.
> All the scheduling of instructions and what not, which would be
> fine on a micro scope level, is causing so much higher pressure
> on the CPU's caches that the code is actually losing.

That's not actually it, though there *are* instruction scheduling issues that will impact Pentium 4 code generation, and other Intel processor-specific code generation. Mostly, L1 caches have been getting much, much larger relative to the size of main memory.

Intel has written an article on "How to generate optimized code for Pentium 4 processors". It has been posted to these lists a couple of times already, and you can search it out on Intel's site, if you care to. For the Pentium 4, the article identifies a shopping list of things that you are "not supposed to do", which GCC does. Actually, cache pressure is the least of them.

If FreeBSD were to cache-line-align its locks and mutexes, and not put them in the same cache lines (very hard to do for some structures), most of the so-called "cache pressure" could be made to "go away". IBM recently posted an article comparing performance numbers for Linux with and without this change.

Realize, though, that FreeBSD and Linux have somewhat different philosophies when it comes to SMP, even if that's hard to tell from the lack of detailed implementation plans being published by either camp.
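To make the cache-line alignment point concrete, here is a minimal C11 sketch. It is not FreeBSD kernel code: the 64-byte `CACHE_LINE_SIZE` is an assumption, and `struct spinlock`, `struct padded_counter`, `padded_counter_init()` and `count_hit()` are hypothetical stand-ins for kernel mutexes and the data they protect. In the "packed" layout two locks sit next to each other and will normally share one cache line, so CPUs contending for different locks still bounce that line back and forth (false sharing); in the aligned layout each lock and its counter get a line of their own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines. Hypothetical constant, not taken
 * from any FreeBSD header. */
#define CACHE_LINE_SIZE 64

/* Hypothetical minimal spinlock on a C11 atomic_flag, standing in for
 * a kernel mutex. */
struct spinlock {
    atomic_flag held;
};

static void spin_lock(struct spinlock *l) {
    /* Busy-wait until we are the one who set the flag. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void spin_unlock(struct spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Packed layout: lock_a and lock_b are adjacent, so they normally land
 * in the same cache line; taking lock_b on one CPU invalidates the line
 * another CPU is using for lock_a (false sharing). */
struct stats_packed {
    struct spinlock lock_a;
    long hits_a;
    struct spinlock lock_b;
    long hits_b;
};

/* Aligned layout: alignas on the first member gives the whole struct
 * cache-line alignment, and sizeof is rounded up to a multiple of that
 * alignment, so each lock plus its counter fills exactly one line. */
struct padded_counter {
    alignas(CACHE_LINE_SIZE) struct spinlock lock;
    long hits;
};

struct stats_aligned {
    struct padded_counter a;
    struct padded_counter b;
};

/* Compile-time checks of the layout claims above. */
static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "each counter occupies exactly one cache line");
static_assert(offsetof(struct stats_aligned, b) % CACHE_LINE_SIZE == 0,
              "second lock starts on a cache-line boundary");

static void padded_counter_init(struct padded_counter *c) {
    atomic_flag_clear(&c->lock.held); /* start unlocked */
    c->hits = 0;
}

static void count_hit(struct padded_counter *c) {
    spin_lock(&c->lock);
    c->hits++;
    spin_unlock(&c->lock);
}
```

The price is memory: each `padded_counter` spends a full line on a flag and a `long`, which is one reason this is hard to retrofit onto large, densely packed structures.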
If the ability to optimize code for the Pentium 4 concerns you, then you should become a contributor to the GCC project. That means you need to execute a notarized assignment of rights statement with the FSF before they will accept patches from you. Once that's done, you can start going down Intel's optimization laundry list, sending patches to the GCC folks.

-- Terry