Date:      Tue, 20 Mar 2001 17:46:30 -0800
From:      Aaron Smith <aaron@mutex.org>
To:        freebsd-hackers@freebsd.org
Cc:        jon@csua.berkeley.edu, breadbox@muppetlabs.com
Subject:   gzip's custom i386 asm should be disabled
Message-ID:  <20010320174630.B82004@gelatinous.com>

--0rSojgWGcpz+ezC3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

gzip's i386 assembly code, enabled by default in the FreeBSD source tree,
performs poorly on i686 cores (PPro/P2/P3). The cause is the 'partial
register stall' problem, explained at a URL recently brought up on the
list: http://www.emulators.com/pentium4.htm.

In the course of learning more about partial register stalls I came across
the following i686 and i586 assembly optimizations for gzip:
http://www.muppetlabs.com/~breadbox/software/assembly.html.

This optimized i686 asm avoids partial register stalls and is between 20%
and 40% faster, with higher compression levels benefiting more from the
patch. The i586 patch is usually only about 5% faster, but in some cases
achieves a 25% speedup.

For completeness, I also ran some tests on a non-asm gcc 2.95.2 compile,
with and without -march=pentiumpro. Here are the results (three runs,
averaged, caches warmed with some throwaway runs) on a Pentium II 400,
compressing linux-2.4.2.tar with --best.

                       [type]  [user secs]  [time (as % of slowest)]
                     i386 asm:     175              100.0%
                   no asm, -O:     142               81.1%
                  no asm, -O2:     139               79.4%
 no asm, -O -march=pentiumpro:     136               77.7%
no asm, -O2 -march=pentiumpro:     140               80.0%
                     i686 asm:     124               70.8%
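
For anyone who wants to check the last column or derive other comparisons
from these figures, here is a small sketch in Python. The numbers are copied
from the table; the function names are made up for illustration. Since the
user seconds are rounded averages, the last digit may not match the table
exactly.

```python
# User-time figures (seconds) copied from the results table.
user_secs = {
    "i386 asm": 175,
    "no asm, -O": 142,
    "no asm, -O2": 139,
    "no asm, -O -march=pentiumpro": 136,
    "no asm, -O2 -march=pentiumpro": 140,
    "i686 asm": 124,
}


def relative_times(times):
    """Return each time as a percentage of the slowest entry."""
    slowest = max(times.values())
    return {name: 100.0 * t / slowest for name, t in times.items()}


def time_saved(times, baseline, candidate):
    """Percent reduction in run time of candidate relative to baseline."""
    return 100.0 * (times[baseline] - times[candidate]) / times[baseline]


if __name__ == "__main__":
    for name, pct in relative_times(user_secs).items():
        print(f"{name:>30}: {user_secs[name]:4d}s {pct:6.1f}%")
    # Compare the i686 asm against the best compiler-only build.
    best_c = "no asm, -O -march=pentiumpro"
    print(f"i686 asm vs {best_c}: "
          f"{time_saved(user_secs, best_c, 'i686 asm'):.1f}% less time")
```

time_saved() is there because "X% faster" is ambiguous; it reports the
reduction in run time relative to whichever build is chosen as the baseline.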

I'm interested in other people's results/tests. In particular, I should do
some runs with -mcpu=pentiumpro as well.
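
To make results from different machines comparable, here is a rough sketch
of the measurement procedure described above (throwaway warm-up runs, then
three timed runs, averaged user time). It is not the script I used; the
script name bench.py, the function names, and the defaults are assumptions
for illustration. It reads child user time via getrusage(), so it needs a
Unix-like system.

```python
import os
import resource
import subprocess
import sys


def child_user_seconds(cmd, infile):
    """Run cmd with infile on stdin, discard output, return user CPU seconds."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    with open(infile, "rb") as src, open(os.devnull, "wb") as sink:
        subprocess.run(cmd, stdin=src, stdout=sink, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN).ru_utime
    return after - before


def benchmark(cmd, infile, warmups=2, runs=3):
    """Discard warm-up runs, then average user time over the measured runs."""
    for _ in range(warmups):
        child_user_seconds(cmd, infile)  # warms the caches; result ignored
    samples = [child_user_seconds(cmd, infile) for _ in range(runs)]
    return sum(samples) / len(samples)


# Hypothetical usage: python bench.py /path/to/gzip linux-2.4.2.tar
if __name__ == "__main__" and len(sys.argv) == 3:
    gzip_bin, tarball = sys.argv[1], sys.argv[2]
    avg = benchmark([gzip_bin, "--best", "-c"], tarball)
    print(f"{gzip_bin}: {avg:.1f} user secs (mean of 3 runs)")
```

Running it once per gzip build (i386 asm, no asm, i686 asm) against the same
input file should give figures comparable to the table.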

An important part of the equation is to make sure it doesn't hurt i586
machines. I did several tests on a Pentium 200MMX; the i386 asm and the
gcc-emitted asm are not measurably different on that CPU.

Brian Raiter (breadbox@muppetlabs.com, author of the i586/i686 asm patches)
has contacted the gzip maintainers, but it's been years since a release and
there may not be another gzip release. I have seen a 1.2.4a release which
had his files in a contrib/ directory, but they were not active in any way.

Since I would imagine a large percentage of FreeBSD users run on i686
cores, it'd be great to get this pretty significant speed increase into our
tree.

The i686 patch is neat (about 30% less user time than the i386 asm!) but its
improvement over gcc's emitted assembly is small (124 vs. 136 secs at best).
Disabling the old i386 assembly seems a good first step. Attached is a patch
that disables the custom asm.

I'm interested in hearing everyone's comments.

Aaron

--0rSojgWGcpz+ezC3
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=gzip-noasm-patch

Index: Makefile
===================================================================
RCS file: /usr/cvs/src/gnu/usr.bin/gzip/Makefile,v
retrieving revision 1.21
diff -u -r1.21 Makefile
--- Makefile	1999/08/27 23:35:48	1.21
+++ Makefile	2001/03/20 23:59:48
@@ -8,11 +8,6 @@
 CFLAGS+=-DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1
 GREP_LIBZ?=	YES
 
-.if ${MACHINE_ARCH} == "i386"
-SRCS+=	match.S
-CFLAGS+=-DASMV
-.endif
-
 MLINKS= gzip.1 gunzip.1  gzip.1 zcat.1  gzip.1 gzcat.1
 MLINKS+= zdiff.1 zcmp.1
 

--0rSojgWGcpz+ezC3--
