From: Oliver Fromme
To: freebsd-hackers@FreeBSD.ORG, kientzle@FreeBSD.ORG
Date: Thu, 12 Oct 2006 17:24:24 +0200 (CEST)
Subject: Re: "tar -c|gzip" faster than "tar -cz"?!?

Tim Kientzle wrote:
> It seems that you and others have seen very different
> performance. I'd be very interested in knowing why.
> I suspect it may have to do with average file size.
> How big are the files you're archiving?

The numbers that I gave were from archiving a standard root
file system of FreeBSD 6.2-PRERELEASE, i.e. some binaries and
libs (/bin, /sbin, /lib, /boot/kernel) and a bunch of small
files from /etc.  Not much else.

The gzip that I used for comparison is the stock gzip that
comes with 6.2-PRERELEASE, compiled with the default compiler
settings.  gzip -V says:

   gzip 1.2.4 (18 Aug 93)
   Compilation options:
   DIRENT UTIME STDC_HEADERS HAVE_UNISTD_H ASMV

> Does the relative performance differ with larger or
> smaller files?

Good question.  I performed further tests, the first one with
/usr/ports (which is mostly small files, but a hell of a lot
of them):

/usr/ports with "tar cz", resulting size is 35266560 bytes:

   82.47 real    11.09 user    2.34 sys
   81.76 real    11.14 user    2.24 sys
   82.25 real    11.24 user    2.18 sys

/usr/ports with "tar c | gzip", resulting size is 35279112 bytes:

   77.61 real     8.58 user    2.39 sys
   77.64 real     8.67 user    2.27 sys
   77.47 real     8.57 user    2.40 sys

In this case, the "real" time is much larger than the "user"
time.  I guess that's the overhead of 85677 files and 23399
directories (according to find(1)).  :-)

I performed a second test with a directory of documents
(mostly PDF files, which don't compress very well, but also
some PS and other formats; most of the files are several
MBytes in size, about 200 MB in total):

Big PDF/PS documents with "tar cz", resulting size is 125880320 bytes:

   16.16 real    15.78 user    0.29 sys
   16.38 real    15.83 user    0.25 sys
   16.16 real    15.82 user    0.24 sys

Big PDF/PS documents with "tar c | gzip", resulting size is 125894830 bytes:

   13.17 real    12.77 user    0.36 sys
   13.18 real    12.79 user    0.34 sys
   13.19 real    12.73 user    0.38 sys

One thing that you can observe is that the "sys" time is
slightly larger in the gzip case.  I assume that's because of
the pipe overhead.

Interestingly, in both tests the compressed output of the
"gzip" case was slightly larger than that of the "tar cz" case.
That's the opposite of what I got in my very first test
(when archiving the root file system).  I'm not concerned
about the difference in compressed sizes, because it's in the
sub-percent range.  I'm more concerned about the CPU times
("user" times), where the difference is quite clear in all of
my tests.

You should basically be able to reproduce my tests.  There's
absolutely nothing special about my environment.  The test
machine is an Athlon64 (but running 32-bit FreeBSD/i386
6.2-PRERELEASE), single-core, no SMP.  The test data is on two
gmirror'ed SATA drives which are quite fast, but all of the
data was cached in RAM during my tests.  The dmesg output can
be found here, if required:
http://www.secnetix.de/~olli/dmesg/box/

Best regards
   Oliver

-- 
Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing
FreeBSD-focused services: http://www.secnetix.de/bsd
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

"Perl will consistently give you what you want,
unless what you want is consistency."
        -- Larry Wall
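The user-time gap in the /usr/ports and PDF/PS tests can be quantified by averaging the three runs of each case. Below is a minimal Python sketch that does this arithmetic on the published figures; the names RUNS, column_means and summarize are hypothetical, chosen only for illustration. It assumes each run is a (real, user, sys) triple in seconds, exactly as reported by time(1) above, and computes how much more user time "tar cz" needed than "tar c | gzip".

```python
from statistics import mean

# Hypothetical data layout: timings copied from the tests above,
# as (real, user, sys) seconds for each of the three runs per case.
RUNS = {
    "/usr/ports": {
        "tar cz":       [(82.47, 11.09, 2.34), (81.76, 11.14, 2.24), (82.25, 11.24, 2.18)],
        "tar c | gzip": [(77.61, 8.58, 2.39), (77.64, 8.67, 2.27), (77.47, 8.57, 2.40)],
    },
    "PDF/PS documents": {
        "tar cz":       [(16.16, 15.78, 0.29), (16.38, 15.83, 0.25), (16.16, 15.82, 0.24)],
        "tar c | gzip": [(13.17, 12.77, 0.36), (13.18, 12.79, 0.34), (13.19, 12.73, 0.38)],
    },
}


def column_means(runs):
    """Return the mean (real, user, sys) over a list of (real, user, sys) runs."""
    return tuple(mean(col) for col in zip(*runs))


def summarize(runs_by_test):
    """For each test, return the mean timings of both cases and the extra
    user time of "tar cz" relative to "tar c | gzip", in percent."""
    summary = {}
    for test, cases in runs_by_test.items():
        cz = column_means(cases["tar cz"])
        piped = column_means(cases["tar c | gzip"])
        extra_user_pct = (cz[1] / piped[1] - 1.0) * 100.0  # index 1 = user time
        summary[test] = {"tar cz": cz, "tar c | gzip": piped,
                         "extra_user_pct": extra_user_pct}
    return summary


if __name__ == "__main__":
    for test, s in summarize(RUNS).items():
        print(f"{test}:")
        for case in ("tar cz", "tar c | gzip"):
            real, user, sys_time = s[case]
            print(f"  {case:13s} real {real:6.2f}  user {user:6.2f}  sys {sys_time:5.2f}")
        print(f"  'tar cz' used {s['extra_user_pct']:.1f}% more user time")
```

On the figures above, this works out to about 29.6% more user time for "tar cz" on /usr/ports and about 23.9% more on the PDF/PS documents, while the mean "sys" time is slightly higher for the piped case in both tests.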