Date: Thu, 27 Feb 2003 23:58:32 +0000
From: Bruce Cran <bruce@cran.org.uk>
To: Nuno Teixeira <nunotex@aeiou.pt>
Cc: current@freebsd.org
Subject: Re: -O2 considered harmful
Message-ID: <20030227235832.GA99310@fourtytwo.brucec.backnet>
In-Reply-To: <20030227214913.GA3517@gw.tex.bogus>
References: <20030227025155.61529.qmail@web40310.mail.yahoo.com> <20030227083800.GA96372@fourtytwo.brucec.backnet> <20030227214913.GA3517@gw.tex.bogus>
On Thu, Feb 27, 2003 at 09:49:13PM +0000, Nuno Teixeira wrote:
> On Thu, Feb 27, 2003 at 08:38:00AM +0000, Bruce Cran wrote:
> > I'm afraid you're wrong - the V2SI datatype and MMX functions automatically
> > become available after -march=pentium2, while with other processor types
> > you've got to explicitly add -mmmx. -msse is presumed with -march=pentium3
> > and up. It's far from absurd to use mmx for everyday applications - sure,
> > only a few applications may take advantage of it, but I've seen code which
> > runs 40x faster when compiled for athlon-xp than for i386, and I would guess
> > that a lot of that is because of clever use of sse and mmx. That wasn't
> > an audio/video program, it was the libgmp arbitrary precision maths
> > package. Also, I'm sure
> > most people wouldn't say no to 50% more processing speed for free!
> > So, if you've got a pentium, k6 or pentiumpro which supports MMX, you _do_
> > need to explicitly add -mmmx, but for other processors it's implied.
> >
> I searched gcc docs and didn't found info for what you say here. I'm
> seeing a lots of people using e.g. athlon-xp with -mmmx and -m3dnow
> included. I'm confused about if this optimizations are implied or not by
> processores that supports it.
>

I, too, use -mmmx -msse -m3dnow with -march=athlon-xp. I do it simply
because I don't trust gcc enough to do it for me - people have shown in
the past that -O2 and -O3 don't activate all the optimizations which the
docs claim they should, which is why you see people adding crazy stuff
like -funroll-loops -fomit-frame-pointer -fschedule-insns2 -fgcse ...

The surest way to find out at which point gcc enables vector extensions
is, if you've got access to a suitable computer, to compile the
following:

    /* four-float vector type, declared with the mode attribute */
    typedef int v4sf __attribute__ ((mode(V4SF)));

    int main()
    {
        v4sf a = {1,2,3,4};
        v4sf b = {5,6,7,8};
        /* sse builtin: only accepted when sse support is enabled */
        v4sf c = __builtin_ia32_addps(a,b);
        return 0;
    }

This will only compile when gcc has enabled sse instruction support.
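A complementary check is to ask the preprocessor which extensions are switched on. The sketch below is not from the compile test above. It assumes a gcc recent enough to predefine __MMX__, __SSE__, __SSE2__ and __3dNOW__ when the matching support is enabled, and the helper names (have_mmx() and friends, report_vector_support()) are made up for illustration.

```c
#include <stdio.h>

/* Each helper returns 1 if the compiler predefined the matching macro
 * for this translation unit, and 0 otherwise. The macro names are
 * assumed to be provided by the gcc in use. */
int have_mmx(void)
{
#ifdef __MMX__
    return 1;
#else
    return 0;
#endif
}

int have_sse(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

int have_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

int have_3dnow(void)
{
#ifdef __3dNOW__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: print one line per extension. */
void report_vector_support(void)
{
    printf("mmx:   %s\n", have_mmx()   ? "enabled" : "disabled");
    printf("sse:   %s\n", have_sse()   ? "enabled" : "disabled");
    printf("sse2:  %s\n", have_sse2()  ? "enabled" : "disabled");
    printf("3dnow: %s\n", have_3dnow() ? "enabled" : "disabled");
}
```

Calling report_vector_support() from main() and rebuilding with different -march and -m flags should show which combinations turn each extension on, without needing a builtin that fails to compile when support is off.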
I've found that this happens when you use -msse on its own, even with
-march=pentium, and when you use -march=pentium3, -march=pentium4,
-march=athlon-xp etc. without any extra -msse. In addition, when
compiling mmx code, -m3dnow implies -mmmx, which makes sense since 3dnow
is just an extension of mmx.

Of course, what many people don't realise is that gcc, unlike icc, will
not produce any vector instructions unless either -mfpmath=sse is given,
so that sse is used for all floating point math, or vector instructions
are explicitly coded for, as above. So for most software, adding the
extra flags shouldn't affect it in the slightest, but a few applications
will detect the vector unit and use it, resulting in a sometimes
significant performance gain.

You should notice a fairly large increase in performance when using the
sse unit, because unlike mmx and 3dnow it is a separate functional unit
and so was designed to be fast, instead of being crippled by backward
compatibility with the 387. Indeed, with sse2, Intel seem to have finally
gained a very decent vector processing unit which can compete with
similar processors such as the G4 with its AltiVec.

Bruce Cran

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message