Date: Wed, 11 Oct 2006 14:15:09 +0300
From: Vasil Dimov <vd@FreeBSD.org>
To: Oliver Fromme <olli@lurza.secnetix.de>
Cc: freebsd-hackers@FreeBSD.ORG
Subject: Re: "tar -c|gzip" faster than "tar -cz"?!?
Message-ID: <20061011111509.GC54180@qlovarnika.bg.datamax>
In-Reply-To: <200610101727.k9AHRrYo039774@lurza.secnetix.de>
References: <200610101727.k9AHRrYo039774@lurza.secnetix.de>

On Tue, Oct 10, 2006 at 07:27:53PM +0200, Oliver Fromme wrote:
> Hi,
>
> While doing some performance tuning of a backup script
> I noticed that the -z option of our (bsd)tar behaves in
> a very suboptimal way. It's not only a lot slower than
> using gzip separately, it also compresses worse.
>
> I compared the following two commands (cwd=/):
>
> A. tar -cz --one-file-system -f- . | wc -c
> B. tar -c --one-file-system -f- . | gzip | wc -c
>
> In order to measure the time of the whole command pipes,
> I encapsulated them into subshell calls like this:
> /usr/bin/time sh -c 'tar ... | wc -c'
>
> These are results for multiple invocations of A (tar -cz):
>
>     7.30 real    7.15 user    0.09 sys
>     7.28 real    7.13 user    0.12 sys
>     7.29 real    7.14 user    0.09 sys
>
> And these are the numbers for B (tar -c | gzip):
>
>     5.54 real    5.37 user    0.15 sys
>     5.54 real    5.34 user    0.18 sys
>     5.55 real    5.40 user    0.12 sys
>
> My first thought was that "tar -z" would use a better
> compression level (e.g. -9) vs. the gzip default of -6,
> which would explain why it is slower. Therefore I
> expected the resulting backup to be smaller -- but just
> the opposite is the case. Command A resulted in a
> compressed size of 25364480 bytes, while B was a bit
> smaller (25306059 bytes).
>
> I'm surprised because I expected "tar -z" to be faster
> than a separate gzip process (at the same compression
> level), or at least as fast. But it's 30% slower.
>
> Is that a known problem? Is someone working on it?
>

You (wrongly) assumed that two processes would be slower than a single
one. It's exactly the opposite: while one process is constantly reading
disk contents, the other is constantly compressing. With a single
process you have to read data, compress, read data, compress, and so
on, which is suboptimal (see Mike's reply too).
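
For anyone who wants to try the comparison without archiving a whole
filesystem, here is a rough sketch on throwaway data. The scratch
directory SRC, the generated file names, and the helper functions
one_process and two_processes are made up for illustration and do not
come from the original message. The script only reports the compressed
sizes; timing is left to /usr/bin/time as in the quoted message, and any
speedup presumably depends on having a spare CPU or some I/O wait to
overlap.

```shell
#!/bin/sh
# Sketch: run the one-process and two-process pipelines on scratch data.
# SRC is a hypothetical throwaway directory, not a path from the thread.
SRC=$(mktemp -d)
trap 'rm -rf "$SRC"' EXIT

# Generate a few compressible text files so there is something to archive.
i=0
while [ "$i" -lt 20 ]; do
    j=0
    while [ "$j" -lt 2000 ]; do
        echo "line $j of scratch file $i"
        j=$((j + 1))
    done > "$SRC/file$i.txt"
    i=$((i + 1))
done

# A: the compressed stream comes straight out of tar -z.
one_process() { tar -czf - -C "$SRC" . ; }

# B: tar only reads and archives; a separate gzip process compresses.
two_processes() { tar -cf - -C "$SRC" . | gzip ; }

# wc -c may pad its output with spaces on some systems, so strip them.
size_a=$(one_process | wc -c | tr -d ' ')
size_b=$(two_processes | wc -c | tr -d ' ')

echo "A (tar -cz):       $size_a bytes"
echo "B (tar -c | gzip): $size_b bytes"
```

To get timings comparable to the ones quoted above, wrap each pipeline
the same way the original message did, for example
/usr/bin/time sh -c 'tar -czf - -C DIR . | wc -c', with DIR replaced by
a directory of your choice.
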
This is neither a problem in one program nor a feature of the other.
It's just how things work.

-- 
Vasil Dimov
gro.DSBeerF@dv
%
Look, that's why there's rules, understand? So that you think before
you break 'em.
                -- (Terry Pratchett, Thief of Time)
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061011111509.GC54180>