Date:      Thu, 12 Oct 2006 17:24:24 +0200 (CEST)
From:      Oliver Fromme <olli@lurza.secnetix.de>
To:        freebsd-hackers@FreeBSD.ORG, kientzle@FreeBSD.ORG
Subject:   Re: "tar -c|gzip" faster than "tar -cz"?!?
Message-ID:  <200610121524.k9CFOOmS069191@lurza.secnetix.de>
In-Reply-To: <452DEE0A.4060500@freebsd.org>

Tim Kientzle wrote:
 > It seems that you and others have seen very different
 > performance.  I'd be very interested in knowing why.
 > I suspect it may have to do with average file size.
 > How big are the files you're archiving?

The numbers that I gave were from archiving a standard
root file system of FreeBSD 6.2-PRERELEASE, i.e. some
binaries and libs (/bin, /sbin, /lib, /boot/kernel) and
a bunch of small files from /etc.  Not much else.

The gzip that I used for comparison is the stock gzip
that comes with 6.2-PRERELEASE, compiled with the default
compiler settings.  gzip -V says:

gzip 1.2.4 (18 Aug 93)
Compilation options:
DIRENT UTIME STDC_HEADERS HAVE_UNISTD_H ASMV

 > Does the relative performance differ with larger or
 > smaller files?

Good question.  I performed further tests, first one with
/usr/ports (which is mostly small files, but a hell of a
lot of them):

/usr/ports with "tar cz", resulting size is 35266560:
   82.47 real   11.09 user   2.34 sys
   81.76 real   11.14 user   2.24 sys
   82.25 real   11.24 user   2.18 sys

/usr/ports with "tar c | gzip", resulting size is 35279112:
   77.61 real   8.58 user   2.39 sys
   77.64 real   8.67 user   2.27 sys
   77.47 real   8.57 user   2.40 sys

In this case, the "real" time is much larger than the
"user" time.  I guess that's the overhead of 85677 files
and 23399 directories (according to find(1)).  :-)
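
For reference, counts like these can be obtained roughly as
follows.  This is only a sketch: the exact find(1) invocation
behind the numbers above is not given, and count_entries is
a made-up helper name.  It counts regular files and
directories separately, so symlinks and other file types
are not included.

```sh
#!/bin/sh
# count_entries DIR: print the number of regular files and of
# directories below DIR (DIR itself counts as a directory).
# Names containing newlines would be miscounted by wc -l.
count_entries() {
    # tr strips the padding that some wc implementations emit.
    files=$(find "$1" -type f | wc -l | tr -d ' ')
    dirs=$(find "$1" -type d | wc -l | tr -d ' ')
    echo "$files files, $dirs directories"
}

# Example: count_entries /usr/ports
```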

I performed a second test with a directory of documents
(mostly PDFs, which don't compress very well, but also
some PS and other formats; most of the files are several
MBytes in size, about 200 MB in total):

Big PDF/PS documents with "tar cz", result is 125880320:
   16.16 real   15.78 user   0.29 sys
   16.38 real   15.83 user   0.25 sys
   16.16 real   15.82 user   0.24 sys

Big PDF/PS documents with "tar c | gzip", result is 125894830:
   13.17 real   12.77 user   0.36 sys
   13.18 real   12.79 user   0.34 sys
   13.19 real   12.73 user   0.38 sys

One thing that you can observe is that the "sys" time
is slightly larger in the "tar c | gzip" case.  I assume
that's because of the pipe overhead.

Interestingly, in both tests the compressed size of the
"gzip" case was slightly larger than the "tar cz" case.
That's the opposite of what I got in my very first test
(when archiving the root file system).
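
To put the size differences in proportion, the relative
difference can be computed from the byte counts given above.
A small sketch using awk(1); pct_diff is just an illustrative
helper name, not part of any tool.

```sh
#!/bin/sh
# pct_diff A B: print how much larger B is than A,
# as a percentage of A, with four decimal places.
pct_diff() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.4f\n", (b - a) / a * 100 }'
}

pct_diff 35266560 35279112      # /usr/ports
pct_diff 125880320 125894830    # PDF/PS documents
```

Both values come out at well under a tenth of a percent.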

I'm not concerned about the difference in compressed
sizes, because it's in the sub-percent range.  I'm more
concerned about the CPU times ("user" times), which
differ quite clearly in all of my tests.
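
The "user" times quoted above can be averaged and compared
with a few lines of awk(1).  This is a sketch only; mean and
saving are hypothetical helper names, and the inputs are
simply the "user" columns from the four result blocks.

```sh
#!/bin/sh
# mean X...: arithmetic mean of the arguments, two decimals.
mean() {
    printf '%s\n' "$@" |
        awk '{ s += $1 } END { printf "%.2f\n", s / NR }'
}

# saving BASE NEW: percentage by which NEW is below BASE.
saving() {
    awk -v a="$1" -v b="$2" \
        'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

ports_z=$(mean 11.09 11.14 11.24)   # /usr/ports, tar cz
ports_p=$(mean 8.58 8.67 8.57)      # /usr/ports, tar c | gzip
docs_z=$(mean 15.78 15.83 15.82)    # documents, tar cz
docs_p=$(mean 12.77 12.79 12.73)    # documents, tar c | gzip

echo "ports: $ports_z vs $ports_p user, saving $(saving "$ports_z" "$ports_p")%"
echo "docs:  $docs_z vs $docs_p user, saving $(saving "$docs_z" "$docs_p")%"
```

With these inputs the pipe variant needs roughly 23% less
"user" time for /usr/ports and roughly 19% less for the
documents.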

You should basically be able to reproduce my tests.
There's absolutely nothing special about my environment.
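
A comparison of this kind could be scripted roughly like
this.  It is a sketch under assumptions, not the exact
procedure used for the numbers above: the helper names
(archive_size, bench) are made up, the archive is sent to
/dev/null during the timed runs, and time(1) is assumed to
live in /usr/bin.

```sh
#!/bin/sh
# Each variant is a shell command line; "$1" is the
# directory to archive.
BUILTIN='tar -czf - "$1"'
PIPED='tar -cf - "$1" | gzip'

# archive_size CMD DIR: run CMD in a fresh sh and print the
# size of the resulting archive in bytes.
archive_size() {
    sh -c "$1 | wc -c" sh "$2" | tr -d ' '
}

# bench DIR: three timed runs per variant, then the size.
# time(1) writes its report to stderr.
bench() {
    for cmd in "$BUILTIN" "$PIPED"; do
        echo "== $cmd"
        for run in 1 2 3; do
            /usr/bin/time sh -c "$cmd >/dev/null" sh "$1"
        done
        echo "size: $(archive_size "$cmd" "$1")"
    done
}

# Example: bench /usr/ports
```

Running it once before taking measurements helps to get the
data into the buffer cache, so that disk reads don't distort
the comparison.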

The test machine is an Athlon64 (but running 32-bit
FreeBSD/i386 6.2-PRERELEASE), single-core, no SMP.
The test data is on two gmirror'ed SATA drives which
are quite fast, but all of the data was cached in
RAM during my tests.

dmesg can be found here, if required:
http://www.secnetix.de/~olli/dmesg/box/

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
Dienstleistungen mit Schwerpunkt FreeBSD: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"Perl will consistently give you what you want,
unless what you want is consistency."
        -- Larry Wall


