FreeBSD Mail Archives

Date:      Thu, 13 Feb 1997 23:48:07 -0500
From:      "David S. Miller" <davem@jenolan.rutgers.edu>
To:        hamby@aris.jpl.nasa.gov
Cc:        asami@vader.cs.berkeley.edu, jmb@freefall.freebsd.org, hackers@FreeBSD.ORG
Subject:   Re: Sun Workshop compiler vs. GCC?
Message-ID:  <199702140448.XAA18687@jenolan.caipgeneral>
In-Reply-To: <Pine.GSO.3.95.970213201150.10714A-100000@aris> (message from Jake Hamby on Thu, 13 Feb 1997 20:17:38 -0800 (PST))


   Date: Thu, 13 Feb 1997 20:17:38 -0800 (PST)
   From: Jake Hamby <hamby@aris.jpl.nasa.gov>

   Hmm, good point.  I guess I meant that the commercial compilers
   seem to have MORE kinds of optimizations than GCC, and because they
   support relatively few targets, they can devote more time to
   optimizing each code generation back-end.

I have in fact been working with some people to encourage
further work in this area, in particular:

	1) Jakub Jelinek and I have worked on what is termed
	   "tail call" optimization; we call them sibling calls in
	   the gcc implementation.

	   This one is a huge win for many code sets which have a
	   moderate to large stack call depth.  It can
	   eliminate entire local stack frames.  As an easy example
	   on the Sparc:

extern int foo(int a);

extern int bar(int b);

static __inline__ int baz(int a, int b)
{
	(void) foo(a);
	return bar(b);
}

static int func(int a, int b)
{
	return baz(a, b);
}

	gets turned into

func:
	save %sp,-112,%sp
	call foo,0
	mov %i0,%o0
	call bar,0
	restore %g0,%i1,%o0

	(for those unfamiliar with Sparc, "save" allocates a register
	 window and a stack frame, and "restore" gives it back; Sparc
	 also has branch/call delay slots)

	In that example the entire stack frame is given up in the
	delay slot of the call to bar().  If people think this is
	useless and cheesy, think again.  Walk through your average
	kernel subsystem and see how many functions go:

	{
		if(args_invalid(args))
			return -EFAULT; /* whatever */
		if((file = file_from_fd(args)) == NULL)
			return -EBADF;
		inode = inode_from_file(file);
		return file->f_op->frobnicate(args);
	}
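
	A compilable sketch of that same shape, for anyone who wants
	to look at what their own compiler emits for it.  Everything
	here except the overall pattern is made up for illustration:
	the struct layouts, fd_table, demo_frobnicate() and the
	"arg < 0" validity check are assumptions, not code from any
	real kernel.  The point is only that the last thing
	sys_frobnicate() does is return the result of another call
	unchanged, which is what makes it a sibling call candidate.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct file;

/* Hypothetical per-file operations table. */
struct file_ops {
	int (*frobnicate)(struct file *file, int arg);
};

struct file {
	const struct file_ops *f_op;
};

/* Stand-in handler so the sketch links and runs. */
static int demo_frobnicate(struct file *file, int arg)
{
	(void) file;
	return arg * 2;
}

static const struct file_ops demo_ops = { demo_frobnicate };
static struct file demo_file = { &demo_ops };

/* Toy descriptor table: only descriptor 1 is open. */
#define MAX_FD 4
static struct file *fd_table[MAX_FD] = { NULL, &demo_file };

static struct file *file_from_fd(int fd)
{
	if (fd < 0 || fd >= MAX_FD)
		return NULL;
	return fd_table[fd];
}

int sys_frobnicate(int fd, int arg)
{
	struct file *file;

	if (arg < 0)		/* stand-in for args_invalid() */
		return -EFAULT;
	if ((file = file_from_fd(fd)) == NULL)
		return -EBADF;
	assert(file->f_op != NULL);
	/* Nothing is left to do after this call returns, so the
	 * caller's frame is no longer needed while it runs. */
	return file->f_op->frobnicate(file, arg);
}
```

	Note that the two error returns do not get in the way; only
	the final call has to be in tail position.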

	Also, networking stacks where layer upon layer gets called
	via a function ptr dereference as each packet walks up the
	various layers.  Nine times out of ten this is the last thing
	the function in question does, and thus is subject to the
	optimizations just described as well.
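
	The layered case can be sketched the same way.  This is a toy
	model, not any real networking stack: struct layer, the rcv
	hook, demo_frame and the header sizes (14/20/20) are all
	assumptions chosen for illustration.  Each layer strips its
	header and then hands the packet to the layer above through a
	function pointer as its final action.

```c
#include <assert.h>
#include <stddef.h>

struct packet {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical protocol layer: header size, receive hook, next layer up. */
struct layer {
	size_t hdr_len;
	int (*rcv)(const struct layer *self, struct packet *pkt);
	const struct layer *upper;
};

/* Top of the stack: report how many payload bytes arrived. */
static int deliver(const struct layer *self, struct packet *pkt)
{
	(void) self;
	return (int) pkt->len;
}

/* Generic middle layer: strip our header, pass the rest upward. */
static int strip_and_pass_up(const struct layer *self, struct packet *pkt)
{
	assert(self->upper != NULL);
	if (pkt->len < self->hdr_len)
		return -1;		/* truncated packet */
	pkt->data += self->hdr_len;
	pkt->len -= self->hdr_len;
	/* Last action is an indirect call whose result is returned
	 * unchanged: a sibling call candidate. */
	return self->upper->rcv(self->upper, pkt);
}

static const struct layer app_layer  = { 0,  deliver,           NULL };
static const struct layer tcp_layer  = { 20, strip_and_pass_up, &app_layer };
static const struct layer ip_layer   = { 20, strip_and_pass_up, &tcp_layer };
static const struct layer link_layer = { 14, strip_and_pass_up, &ip_layer };

/* Zero-filled placeholder frame for trying the sketch out. */
const unsigned char demo_frame[64];

int receive_frame(const unsigned char *frame, size_t len)
{
	struct packet pkt = { frame, len };

	return link_layer.rcv(&link_layer, &pkt);
}
```

	Without the optimization a packet going up through three
	layers holds three frames live at once; with it, each layer
	can give its frame back before the next one runs.  Note that
	receive_frame() itself is not a good candidate, since it
	passes the address of a local to the callee.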

	For example, in the Linux kernel sibling call optimization
	was shown to be applied over 1,000 times, approximately 186
	of which were found to be in critical code paths.  This was on
	the Sparc platform.

	Currently only the Sparc backend support for sibling calls
	is fully tested and working well in our patches; Intel,
	Alpha, and MIPS support for sibling calls is mostly done
	and should be ready soon.  We are rather confident that this
	work will all be in gcc-2.8.0.

	2) I'm sure some people here know this, but there are people
	   who have taken all of the Pentium optimization work on gcc
	   done by the Intel compiler people way back, and are working
	   on improving it.  They are actively maintaining those
	   changes, fixing bugs, and adding new optimizations as well.
	   Also, one of the larger reasons that these changes never
	   made it into the FSF gcc sources is that numerous generic
	   changes were made to GCC which were not pretty at all.
	   Cygnus and others are working on revamping some of the
	   front end to back end architecture of GCC so that the types
	   of things the Pentium optimizations needed are there, and
	   are implemented cleanly.

   Also, the various optimizer bugs in GCC in the past have led people
   to be wary to use -O2 optimization, much less try additional
   optimization flags.

I know about them; just about all of them are in the strength
reduction pass.  I am very familiar with the bugs in this pass,
and I have been actively trying to get people on the GCC
development team to fix them.  Almost all of these problems occur
when a pointer comparison is converted into an integer invariant
comparison, or vice versa.  In certain circumstances GCC does not
notice the change in signedness and thus produces incorrect code.  In
gcc-2.7.2.1, the strength reduction transformations that were known to
lead to this situation were disabled entirely; in fact, this fix was
the entire reason for that release of gcc.
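
To make the signedness problem concrete, here is a small sketch of the
general class of mistake.  It is not the actual GCC transformation or a
reproduction of any specific bug; the 32-bit address model, the function
names and the sample addresses are assumptions for illustration.
Pointers order as unsigned addresses, so if a "p < end" test is
rewritten as a signed integer comparison, it gives the wrong answer as
soon as the range straddles 0x80000000.

```c
#include <assert.h>
#include <stdint.h>

/* Value a signed 32-bit comparison would see for this address.
 * Computed arithmetically so nothing relies on wrapping casts. */
static int64_t as_signed32(uint32_t addr)
{
	return addr >= 0x80000000u ? (int64_t) addr - 4294967296LL
				   : (int64_t) addr;
}

/* Correct: the addresses are compared as unsigned quantities. */
int below_unsigned(uint32_t p, uint32_t end)
{
	return p < end;
}

/* Wrong: the same test performed as a signed integer comparison. */
int below_signed(uint32_t p, uint32_t end)
{
	return as_signed32(p) < as_signed32(end);
}

/* Usage: both forms agree on low addresses and disagree once the
 * range crosses the sign bit. */
void demo(void)
{
	assert(below_unsigned(0x1000u, 0x2000u) == below_signed(0x1000u, 0x2000u));
	assert(below_unsigned(0x7ffffff0u, 0x80000010u) != below_signed(0x7ffffff0u, 0x80000010u));
}
```

A loop bounded by the signed form would exit immediately for such a
range, which is the kind of silently wrong code that makes people
distrust -O2.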

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199702140448.XAA18687>
