FreeBSD Mail Archives

Date:      Wed, 11 Oct 2006 14:15:09 +0300
From:      Vasil Dimov <vd@FreeBSD.org>
To:        Oliver Fromme <olli@lurza.secnetix.de>
Cc:        freebsd-hackers@FreeBSD.ORG
Subject:   Re: "tar -c|gzip" faster than "tar -cz"?!?
Message-ID:  <20061011111509.GC54180@qlovarnika.bg.datamax>
In-Reply-To: <200610101727.k9AHRrYo039774@lurza.secnetix.de>
References:  <200610101727.k9AHRrYo039774@lurza.secnetix.de>


On Tue, Oct 10, 2006 at 07:27:53PM +0200, Oliver Fromme wrote:
> Hi,
> 
> While doing some performance tuning of a backup script
> I noticed that the -z option of our (bsd)tar behaves in
> a very suboptimal way.  It's not only a lot slower than
> using gzip separately, it also compresses worse.
> 
> I compared the following two commands (cwd=/):
> 
> A.  tar -cz --one-file-system -f- . | wc -c
> B.  tar -c --one-file-system -f- . | gzip | wc -c
> 
> In order to measure the time of the whole command pipes,
> I encapsulated them into subshell calls like this:
> /usr/bin/time sh -c 'tar ... | wc -c'
> 
> These are results for multiple invocations of A (tar -cz):
> 
>    7.30 real   7.15 user   0.09 sys
>    7.28 real   7.13 user   0.12 sys
>    7.29 real   7.14 user   0.09 sys
> 
> And these are the numbers for B (tar -c | gzip):
> 
>    5.54 real   5.37 user   0.15 sys
>    5.54 real   5.34 user   0.18 sys
>    5.55 real   5.40 user   0.12 sys
> 
> My first thought was that "tar -z" would use a better
> compression level (e.g. -9) vs. the gzip default of -6,
> which would explain why it is slower.  Therefore I
> expected the resulting backup to be smaller -- but just
> the opposite is the case.  Command A resulted in a
> compressed size of 25364480 bytes, while B was a bit
> smaller (25306059 bytes).
> 
> I'm surprised because I expected "tar -z" to be faster
> than a separate gzip process (at the same compression
> level), or at least as fast.  But it's 30% slower.
> 
> Is that a known problem?  Is someone working on it?
> 

You (wrongly) assumed that two processes will be slower than a single
one. It's exactly the opposite. While one process is constantly reading
disk contents, the other is constantly compressing. With a single
process you have to read data, compress, read data, compress, and so
on, which is suboptimal (see Mike's reply too).
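
The difference between the two arrangements can be sketched in a few
lines of Python. This is only an illustration under stated assumptions,
not what tar or gzip actually do internally: the names read_chunks,
compress_serial and compress_pipelined are made up for this example,
threads stand in for the two processes, a bounded queue stands in for
the pipe, and zlib at level 6 stands in for gzip's default. The input
is held in memory, so the sketch shows the structure of the overlap and
is not a benchmark.

```python
import queue
import threading
import zlib

CHUNK = 64 * 1024  # hypothetical read size, chosen only for the example


def read_chunks(data, size=CHUNK):
    """Stand-in for tar reading the disk: yield the input piece by piece."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


def compress_serial(data):
    """One worker: read a chunk, compress it, read the next, and so on."""
    comp = zlib.compressobj(6)
    out = []
    for chunk in read_chunks(data):
        out.append(comp.compress(chunk))
    out.append(comp.flush())
    return b"".join(out)


def compress_pipelined(data, depth=8):
    """Two workers: the reader fills a bounded queue (the "pipe")
    while the compressor drains it, so both can be busy at once."""
    pipe = queue.Queue(maxsize=depth)

    def reader():
        for chunk in read_chunks(data):
            pipe.put(chunk)   # blocks when the "pipe" is full
        pipe.put(None)        # end-of-stream marker

    worker = threading.Thread(target=reader)
    worker.start()

    comp = zlib.compressobj(6)
    out = []
    while True:
        chunk = pipe.get()    # blocks when the "pipe" is empty
        if chunk is None:
            break
        out.append(comp.compress(chunk))
    out.append(comp.flush())
    worker.join()
    return b"".join(out)


if __name__ == "__main__":
    sample = b"freebsd-hackers " * 200_000
    same = compress_serial(sample) == compress_pipelined(sample)
    print("identical output:", same)
```

Both functions feed the same chunks to the same compressor, so they
produce the same bytes. The only difference is that in the second one
the reading and the compressing are free to run at the same time.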

It is neither a problem in one program nor a feature of the other. It's
just how things work.

-- 
Vasil Dimov
gro.DSBeerF@dv
%
Look, that's why there's rules, understand?
So that you think before you break 'em.
    -- (Terry Pratchett, Thief of Time)


Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20061011111509.GC54180>
