Date: Thu, 13 Feb 1997 23:48:07 -0500
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: hamby@aris.jpl.nasa.gov
Cc: asami@vader.cs.berkeley.edu, jmb@freefall.freebsd.org, hackers@FreeBSD.ORG
Subject: Re: Sun Workshop compiler vs. GCC?
Message-ID: <199702140448.XAA18687@jenolan.caipgeneral>
In-Reply-To: <Pine.GSO.3.95.970213201150.10714A-100000@aris> (message from Jake Hamby on Thu, 13 Feb 1997 20:17:38 -0800 (PST))

   Date: Thu, 13 Feb 1997 20:17:38 -0800 (PST)
   From: Jake Hamby <hamby@aris.jpl.nasa.gov>

   Hmm, good point.  I guess I meant that the commercial compilers
   seem to have MORE kinds of optimizations than GCC, and because
   they support relatively few targets, they can devote more time to
   optimizing each code generation back-end.

I have in fact been working with some people to encourage further
work in this area, in particular:

1) Jakub Jelinek and I have worked on what is termed "tail call"
   optimization; we call them sibling calls in the gcc
   implementation.

This one is a huge win for code with a moderate to large call depth.
It can eliminate entire local stack frames.  As an easy example on
the Sparc:

    extern int foo(int a);
    extern int bar(int b);

    static __inline__ int baz(int a, int b)
    {
            (void) foo(a);
            return bar(b);
    }

    static int func(int a, int b)
    {
            return baz(a, b);
    }

gets turned into

    func:
            save    %sp,-112,%sp
            call    foo,0
            mov     %i0,%o0
            call    bar,0
            restore %g0,%i1,%o0

(For those unfamiliar with Sparc, "save" allocates a register window
and a stack frame, "restore" gives it back.  Sparc also has
branch/call delay slots.)

In that example the entire stack frame is given up in the delay slot
of the call to bar().

If people think this is useless and cheesy, think again.  Walk
through your average kernel subsystem and see how many functions go:

    {
            if (args_invalid(args))
                    return -EFAULT;

            /* whatever */

            if ((file = file_from_fd(args)) == NULL)
                    return -EBADF;

            inode = inode_from_file(file);
            return file->f_op->frobnicate(args);
    }

The same goes for networking stacks, where layer upon layer gets
called via a function pointer dereference as each packet walks up
the various layers.  Nine times out of ten this call is the last
thing the function in question does, and thus it is subject to the
optimization just described as well.

For example, in the Linux kernel the sibling call optimization was
shown to be applied over 1,000 times, approximately 186 of which
were found to be in critical code paths.
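
To make the "last thing the function does" pattern concrete, here is
a minimal C sketch.  Every name in it (struct file_ops, frobnicate,
add_value, dispatch, dispatch_plus_one, demo_sibling) is a
hypothetical stand-in, not kernel or gcc code.  dispatch() returns
the callee's result unchanged, so the call sits in tail position and
a compiler that implements sibling calls may release the caller's
frame before transferring control.  dispatch_plus_one() still has an
addition to perform after the call returns, so its frame has to stay
alive.  Whether a given compiler actually performs the transformation
depends on the compiler, the target, and the optimization level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel-style objects in the text. */
struct file;

struct file_ops {
    int (*frobnicate)(struct file *f, int arg);
};

struct file {
    const struct file_ops *f_op;
    int value;
};

/* A trivial method implementation used only for the demo. */
static int add_value(struct file *f, int arg)
{
    return f->value + arg;
}

static const struct file_ops demo_ops = { add_value };

/*
 * Tail position: the indirect call is the last action and its result
 * is returned unchanged, so it is a candidate for a sibling call.
 */
int dispatch(struct file *f, int arg)
{
    if (f == NULL)
        return -1;
    return f->f_op->frobnicate(f, arg);
}

/*
 * Not in tail position: the caller must add 1 after the callee
 * returns, so the caller's frame cannot be given up early.
 */
int dispatch_plus_one(struct file *f, int arg)
{
    if (f == NULL)
        return -1;
    return f->f_op->frobnicate(f, arg) + 1;
}

/* Small self-check showing how the two functions are called. */
void demo_sibling(void)
{
    struct file f = { &demo_ops, 40 };

    assert(dispatch(&f, 2) == 42);
    assert(dispatch_plus_one(&f, 2) == 43);
    assert(dispatch(NULL, 2) == -1);
}
```

Both functions compute their results the same way with or without
the optimization; only the stack usage differs.  Comparing the
compiler's assembly output for the two functions is one way to see
whether the call in dispatch() became a plain jump.
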
Those kernel numbers are from the Sparc platform.  Currently only the
Sparc backend support for sibling calls is fully tested and working
well in our patches; Intel, Alpha, and MIPS support for sibling calls
is mostly done and should be ready soon.  We are rather confident
that this work will all go into gcc-2.8.0.

2) I'm sure some people here know this, but there are people who
   have taken all of the Pentium optimization work on gcc done by
   the Intel compiler people way back, and are working on improving
   it.

They are actively maintaining those changes, fixing bugs, and adding
new optimizations as well.

Also, one of the larger reasons that these changes never made it
into the FSF gcc sources is that they required numerous generic
changes to GCC which were not pretty at all.  Cygnus and others are
working on revamping some of the front end to back end architecture
of GCC so that the kinds of things the Pentium optimizations needed
are there, and are implemented cleanly.

Also, the various optimizer bugs in GCC in the past have made people
wary of using -O2 optimization, much less trying additional
optimization flags.  I know about them; just about all of them are in
the strength reduction pass.  I am very familiar with the problematic
bugs this pass has, and I have been actively trying to get people on
the GCC development team to fix them.

Almost all of these problems arise when a pointer comparison is
converted into an integer invariant comparison, or vice versa.  In
certain circumstances GCC does not notice the change in signedness
and thus produces incorrect code.  In gcc-2.7.2.1, the strength
reduction transformations that were known to lead to this situation
were disabled entirely, and in fact this fix was the entire reason
for that release of gcc.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><
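
On the signedness problem in the strength reduction discussion: the
following C sketch shows why it matters whether a loop exit test is
done as a pointer (unsigned) comparison or as a signed integer
comparison.  It is only an illustrative model under stated
assumptions, not GCC's actual transformation: addresses are modeled
as 32-bit values in uint32_t, and all function names are
hypothetical.  The signed ordering is computed by flipping the sign
bit of both operands, which gives the two's-complement signed order
without relying on implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>

/* How pointers are ordered: as unsigned quantities. */
static int below_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}

/*
 * The same bit patterns ordered as two's-complement signed integers.
 * Flipping the sign bit of both operands maps signed order onto
 * unsigned order.
 */
static int below_signed(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/*
 * Count how many times a loop of the form
 *     for (a = start; a < end; a += step)
 * runs when "<" is the given comparison.  step must be nonzero.
 */
static unsigned count_iterations(uint32_t start, uint32_t end, uint32_t step,
                                 int (*below)(uint32_t, uint32_t))
{
    unsigned n = 0;

    for (uint32_t a = start; below(a, end); a += step)
        n++;
    return n;
}

/* A buffer that straddles the 0x80000000 boundary. */
void demo_signedness(void)
{
    uint32_t start = 0x7ffffff0u;
    uint32_t end = 0x80000010u;

    /* Unsigned (pointer) order: the loop walks the whole buffer. */
    assert(count_iterations(start, end, 8, below_unsigned) == 4);

    /* Signed order: end looks negative, so the loop never runs. */
    assert(count_iterations(start, end, 8, below_signed) == 0);
}
```

The two comparisons agree whenever both values lie on the same side
of 0x80000000, which is why a bug of this kind can go unnoticed until
an address range happens to cross that boundary.
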