From: Vasil Dimov
To: Oliver Fromme
Cc: freebsd-hackers@FreeBSD.ORG
Date: Wed, 11 Oct 2006 14:15:09 +0300
Subject: Re: "tar -c|gzip" faster than "tar -cz"?!?
Message-ID: <20061011111509.GC54180@qlovarnika.bg.datamax>
In-Reply-To: <200610101727.k9AHRrYo039774@lurza.secnetix.de>
Reply-To: vd@FreeBSD.org
List-Id: Technical Discussions relating to FreeBSD

On Tue, Oct 10, 2006 at 07:27:53PM +0200, Oliver Fromme wrote:
> Hi,
>
> While doing some performance tuning of a backup script
> I noticed that the -z option of our (bsd)tar behaves in
> a very suboptimal way.  It's not only a lot slower than
> using gzip separately, it also compresses worse.
>
> I compared the following two commands (cwd=/):
>
> A.  tar -cz --one-file-system -f- . | wc -c
> B.  tar -c --one-file-system -f- . | gzip | wc -c
>
> In order to measure the time of the whole command pipes,
> I encapsulated them into subshell calls like this:
> /usr/bin/time sh -c 'tar ... | wc -c'
>
> These are results for multiple invocations of A (tar -cz):
>
>         7.30 real         7.15 user         0.09 sys
>         7.28 real         7.13 user         0.12 sys
>         7.29 real         7.14 user         0.09 sys
>
> And these are the numbers for B (tar -c | gzip):
>
>         5.54 real         5.37 user         0.15 sys
>         5.54 real         5.34 user         0.18 sys
>         5.55 real         5.40 user         0.12 sys
>
> My first thought was that "tar -z" would use a better
> compression level (e.g. -9) vs. the gzip default of -6,
> which would explain why it is slower.  Therefore I
> expected the resulting backup to be smaller -- but just
> the opposite is the case.  Command A resulted in a
> compressed size of 25364480 bytes, while B was a bit
> smaller (25306059 bytes).
>
> I'm surprised because I expected "tar -z" to be faster
> than a separate gzip process (at the same compression
> level), or at least as fast.  But it's 30% slower.
>
> Is that a known problem?  Is someone working on it?
>

You (wrongly) assumed that two processes will be slower than a single
one. It's exactly the opposite: while one process is constantly reading
disk contents, the other is constantly compressing. With a single
process you have to read data, compress, read data, compress and so on,
which is suboptimal (see Mike's reply too).

It is not a problem in one program, nor a feature in the other. It's
just how things work.

-- 
Vasil Dimov
gro.DSBeerF@dv
%
Look, that's why there's rules, understand? So that you think before
you break 'em.
                -- (Terry Pratchett, Thief of Time)
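
For anyone who wants to repeat the comparison on a directory of their own, here is a minimal sketch of the two variants discussed in this thread. The function names size_single and size_piped are made up for illustration, and --one-file-system is left out for brevity. It only reports compressed sizes; it does not prove where the time goes.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not from the thread.

# Variant A: one process reads the files and compresses them (tar -cz).
size_single() {
    tar -czf - -C "$1" . | wc -c | tr -d ' '
}

# Variant B: tar only reads; a separate gzip process compresses,
# so reading and compressing can overlap.
size_piped() {
    tar -cf - -C "$1" . | gzip | wc -c | tr -d ' '
}

# Run the comparison only when a directory is given on the command line.
if [ $# -gt 0 ]; then
    echo "tar -cz:       $(size_single "$1") bytes"
    echo "tar -c | gzip: $(size_piped "$1") bytes"
fi
```

To time each variant the way the original message does, wrap the whole pipeline in a subshell, e.g. /usr/bin/time sh -c 'tar -cf - -C /some/dir . | gzip | wc -c', and repeat each run a few times so that caching does not skew the first result.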