Date: Thu, 13 Feb 1997 23:48:07 -0500
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: hamby@aris.jpl.nasa.gov
Cc: asami@vader.cs.berkeley.edu, jmb@freefall.freebsd.org, hackers@FreeBSD.ORG
Subject: Re: Sun Workshop compiler vs. GCC?
Message-ID: <199702140448.XAA18687@jenolan.caipgeneral>
In-Reply-To: <Pine.GSO.3.95.970213201150.10714A-100000@aris> (message from Jake Hamby on Thu, 13 Feb 1997 20:17:38 -0800 (PST))
Date: Thu, 13 Feb 1997 20:17:38 -0800 (PST)
From: Jake Hamby <hamby@aris.jpl.nasa.gov>
> Hmm, good point. I guess I meant that the commercial compilers
> seem to have MORE kinds of optimizations than GCC, and because they
> support relatively few targets, they can devote more time to
> optimizing each code generation back-end.
I have in fact been working with some people to encourage
further work in this area, in particular:
1) Jakub Jelinek and I have worked on what is termed
"tail call" optimization; we call them sibling calls in
the gcc implementation.
This one is a huge win for many code sets which have a
moderate to large call stack depth. It can
eliminate entire local stack frames. As an easy example
on the Sparc:
	extern int foo(int a);
	extern int bar(int b);

	static __inline__ int baz(int a, int b)
	{
		(void) foo(a);
		return bar(b);
	}

	static int func(int a, int b)
	{
		return baz(a, b);
	}

gets turned into

	func:
		save	%sp,-112,%sp
		call	foo,0
		 mov	%i0,%o0
		call	bar,0
		 restore %g0,%i1,%o0

(for those unfamiliar with Sparc, "save" allocates a register
window and a stack frame, and "restore" gives it back; Sparc also
has branch/call delay slots)
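
To make "last thing the function does" concrete, here is a minimal C sketch contrasting a call that is a sibling call candidate with one that is not. The bodies of foo() and bar(), the counter foo_calls, and the names func_tail() and func_not_tail() are hypothetical and exist only for illustration; whether a given compiler actually performs the transformation depends on its version, target, and optimization level.

```c
/* Hypothetical stand-ins for the external foo() and bar() above. */
static int foo_calls;   /* counts calls so the side effect is observable */

int foo(int a) { foo_calls++; return a; }
int bar(int b) { return 2 * b; }

/* Candidate: the call to bar() is the last thing the function does and
 * its result is returned unchanged, so the frame is dead by then. */
int func_tail(int a, int b)
{
    (void) foo(a);      /* ordinary call: the frame is still needed */
    return bar(b);      /* tail position */
}

/* Not a candidate: work remains after bar() returns (the addition),
 * so the frame, which holds 'a', must survive the call. */
int func_not_tail(int a, int b)
{
    (void) foo(a);
    return bar(b) + a;
}
```

Both functions compute their results the same way with or without the optimization; only the generated code for the final call differs. Compiling with optimization enabled and reading the assembly output is the way to see whether the frame is released before the call to bar().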
In that example the entire stack frame is given up in the
delay slot of the call to bar(). If people think this is
useless and cheesy, think again. Walk through your average
kernel subsystem and see how many functions go:

	{
		if (args_invalid(args))
			return -EFAULT; /* whatever */
		if ((file = file_from_fd(args)) == NULL)
			return -EBADF;
		inode = inode_from_file(file);
		return file->f_op->frobnicate(args);
	}

Also, networking stacks where layer upon layer gets called
via a function ptr dereference as each packet walks up the
various layers. Nine times out of ten this is the last thing
the function in question does, and thus is subject to the
optimizations just described as well.
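
The layered pattern can be sketched in C roughly as follows. This is an illustration under stated assumptions, not code from any real networking stack: struct packet, struct layer, the handler names, and the header sizes (14, 20, 20) are all hypothetical. The point is only that each layer's final action is an indirect call to the layer above it, with the result returned unchanged.

```c
#include <stddef.h>

/* Hypothetical packet and layer types, for illustration only. */
struct packet {
    const unsigned char *data;
    size_t len;
};

struct layer {
    size_t header_len;                      /* bytes this layer strips */
    int (*input)(const struct layer *self, struct packet *pkt);
    const struct layer *upper;              /* next layer up, or NULL */
};

/* Strip this layer's header, then hand the packet to the layer above. */
static int strip_and_pass_up(const struct layer *self, struct packet *pkt)
{
    if (pkt->len < self->header_len)
        return -1;                          /* truncated packet */
    pkt->data += self->header_len;
    pkt->len -= self->header_len;
    /* Last action: an indirect call in tail position. */
    return self->upper->input(self->upper, pkt);
}

/* Top of the stack: report how many payload bytes arrived. */
static int deliver(const struct layer *self, struct packet *pkt)
{
    (void) self;
    return (int) pkt->len;
}

static const struct layer app_layer  = { 0,  deliver,           NULL };
static const struct layer tcp_layer  = { 20, strip_and_pass_up, &app_layer };
static const struct layer ip_layer   = { 20, strip_and_pass_up, &tcp_layer };
static const struct layer link_layer = { 14, strip_and_pass_up, &ip_layer };

/* Entry point. This call is not a candidate: 'pkt' lives in this
 * function's frame and its address is passed down, so the frame
 * must stay alive until the callee returns. */
int receive_frame(const unsigned char *frame, size_t len)
{
    struct packet pkt = { frame, len };
    return link_layer.input(&link_layer, &pkt);
}
```

Without the optimization, a packet walking up three layers holds three frames for strip_and_pass_up() at once; with it, each layer can release its frame before entering the next.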
For example, in the Linux kernel sibling call optimization
was shown to be applied over 1,000 times, approximately 186
of which were found to be in critical code paths. This was on
the Sparc platform.
Currently only the Sparc backend support for sibling calls
is fully tested and working well in our patches; Intel,
Alpha, and MIPS support for sibling calls is mostly done
and should be ready soon. We are rather confident that this
work will all be in gcc-2.8.0.
2) I'm sure some people here know this, but there are people
who have taken all of the Pentium optimization work on gcc
done by the Intel compiler people way back, and are working
on improving it. They are actively maintaining those
changes, fixing bugs, and adding new optimizations as well.
Also, one of the larger reasons that these changes never
made it into the FSF gcc sources is that numerous generic
changes were made to GCC which were not pretty at all.
Cygnus and others are working on revamping some of the
front end to back end architecture of GCC so that the types
of things the Pentium optimizations needed are there, and
are implemented cleanly.
> Also, the various optimizer bugs in GCC in the past have led people
> to be wary to use -O2 optimization, much less try additional
> optimization flags.
I know about them; just about all of them are in the strength
reduction pass. I am very familiar with the problematic bugs this
layer has, and I have been actively trying to get people on the GCC
development team to fix them. Almost all of these problems have to do
with when a pointer comparison is converted into an integer invariant
comparison, and vice versa. GCC in certain circumstances does not
notice the change in signedness and thus produces incorrect code. In
gcc-2.7.2.1, the strength reduction transformations that were known to
lead to this situation were disabled entirely, and in fact this fix was
the entire reason for that release of gcc.
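
As a rough illustration of this class of problem (not a reproduction of the actual GCC bug), the C sketch below shows a loop before and after a hand-written strength reduction. All function names are hypothetical. The indexed loop tests a signed integer; the reduced loop tests pointers, which compare like unsigned values. The explicit guard in the reduced form is what keeps the signed meaning for a non-positive count, and the last two helpers show that the same bit patterns order differently as signed and unsigned.

```c
/* Original form: a[i] hides a multiply by sizeof(int) per iteration,
 * and the loop test is a signed integer comparison. */
long sum_indexed(const int *a, int n)
{
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hand strength-reduced form: the multiply becomes a pointer bump and
 * the loop test becomes a pointer comparison. */
long sum_reduced(const int *a, int n)
{
    long sum = 0;
    if (n <= 0)
        return 0;       /* preserves the signed meaning of i < n */
    for (const int *p = a, *end = a + n; p < end; p++)
        sum += *p;
    return sum;
}

/* The hazard in isolation: identical operands, different answers. */
int signed_less(int x, int y)   { return x < y; }
int unsigned_less(int x, int y) { return (unsigned) x < (unsigned) y; }
```

For example, signed_less(0, -1) is false while unsigned_less(0, -1) is true, so a transformation that silently swaps one kind of comparison for the other changes how many times a loop runs.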
---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s ////
ethernet. Beat that! ////
-----------------------------------------////__________ o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><
