Date:      Wed, 10 Oct 2012 10:33:14 -0400
From:      Kurt Lidl <lidl@pix.net>
To:        Tim Kientzle <tim@kientzle.com>
Cc:        Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>, Brandon Falk <bfalk_bsd@brandonfa.lk>, freebsd-hackers@freebsd.org
Subject:   Re: SMP Version of tar
Message-ID:  <20121010143314.GA8402@pix.net>
In-Reply-To: <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com>
References:  <5069C9FC.6020400@brandonfa.lk> <alpine.BSF.2.00.1210071859430.15957@wojtek.tensor.gdynia.pl> <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com> <alpine.BSF.2.00.1210080838170.3664@wojtek.tensor.gdynia.pl> <20121008083814.GA5830@straylight.m.ringlet.net> <alpine.BSF.2.00.1210081219300.4673@wojtek.tensor.gdynia.pl> <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com>

On Tue, Oct 09, 2012 at 09:54:03PM -0700, Tim Kientzle wrote:
> 
> On Oct 8, 2012, at 3:21 AM, Wojciech Puchar wrote:
> 
> >> Not necessarily.  If I understand correctly what Tim means, he's talking
> >> about an in-memory compression of several blocks by several separate
> >> threads, and then - after all the threads have compressed their
> > 
> > But the gzip format is a single stream; the dictionary, IMHO, is not reset every X kilobytes.
> > 
> > Parallel gzip is possible, but not with the same data format.
> 
> Yes, it is.
> 
> The following creates a compressed file that
> is completely compatible with the standard
> gzip/gunzip tools:
> 
>    * Break file into blocks
>    * Compress each block into a gzip file (with gzip header and trailer information)
>    * Concatenate the result.
> 
> This can be correctly decoded by gunzip.
> 
> In theory, you get slightly worse compression.  In practice, if your blocks are reasonably large (a megabyte or so each), the difference is negligible.
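A minimal Python sketch of the three steps quoted above, assuming Python 3.9+ and only the standard `gzip` and `concurrent.futures` modules. The names `split_blocks`, `parallel_gzip` and `BLOCK_SIZE` are hypothetical, and the 1 MiB default just follows the "megabyte or so" suggestion. This illustrates the block-and-concatenate idea; it is not how tar, pigz, or any other particular tool implements it.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # ~1 MiB per block (assumed default, per "a megabyte or so")


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Step 1: break the input into fixed-size blocks; the last may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def parallel_gzip(data: bytes, block_size: int = BLOCK_SIZE, workers: int = 4) -> bytes:
    """Steps 2 and 3: compress each block into a complete gzip member
    (header + deflate data + trailer) on a thread pool, then concatenate
    the members in their original order."""
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the members stay in sequence.
        members = list(pool.map(gzip.compress, blocks))
    return b"".join(members)


if __name__ == "__main__":
    original = b"The quick brown fox jumps over the lazy dog.\n" * 100_000
    packed = parallel_gzip(original, block_size=256 * 1024)
    # gzip.decompress, like gunzip, reads every member and joins the output.
    assert gzip.decompress(packed) == original
    print(len(original), "->", len(packed), "bytes")
```

Threads are used here only to show the structure; how much real concurrency you get depends on the runtime and on the compression library releasing its lock while it works.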

I am not sure, but I think this conversation may involve a slight
misunderstanding caused by imprecise language, even though both
sides agree on the technical substance.

Tim is correct that the gzip format allows compressed blocks of data
to be concatenated: you can break the input stream into a number of
blocks [A, B, C, etc.], compress each one separately, and append the
results together as [A.gz, B.gz, C.gz, etc.]. When this is
uncompressed, you get the original input stream back.
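To make the [A.gz, B.gz, C.gz] picture concrete, here is a small Python sketch (assuming Python 3 and the standard `zlib` and `gzip` modules; `split_members` is a hypothetical helper) that walks such a stream one gzip member at a time. Each member decodes independently, and whatever bytes remain after one member's trailer are the start of the next, which is why concatenation round-trips.

```python
import gzip
import zlib


def split_members(stream: bytes) -> list[bytes]:
    """Decode a multi-member gzip stream member by member and return the
    decompressed contents of each member, in order."""
    outputs = []
    rest = stream
    while rest:
        # wbits=31 (16 + 15) tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(wbits=31)
        outputs.append(d.decompress(rest))
        if not d.eof:
            raise ValueError("truncated gzip member")
        rest = d.unused_data  # bytes following this member's trailer
    return outputs


if __name__ == "__main__":
    a, b, c = b"block A\n" * 1000, b"block B\n" * 1000, b"block C\n" * 1000
    stream = gzip.compress(a) + gzip.compress(b) + gzip.compress(c)  # [A.gz, B.gz, C.gz]
    parts = split_members(stream)
    assert parts == [a, b, c]
    assert b"".join(parts) == a + b + c  # the original input stream
```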

I think Wojciech's point is that the compressed data stream for the
single input stream is different from the compressed data stream of
[A.gz, B.gz, C.gz, etc.]. Both decompress to the same data, but the
intermediate compressed representations differ.
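A short Python sketch of that distinction, assuming Python 3.8+ (for the `mtime` argument to `gzip.compress`); the helper names `single_stream` and `per_block` are hypothetical. It compresses the same input once as a single gzip member and once as concatenated per-block members, then checks that the bytes differ while the decompressed contents match. No particular size difference is claimed; the script just prints both sizes.

```python
import gzip


def single_stream(data: bytes) -> bytes:
    """Compress the whole input as one gzip member (one deflate stream).
    mtime=0 keeps the header deterministic."""
    return gzip.compress(data, mtime=0)


def per_block(data: bytes, block_size: int) -> bytes:
    """Compress each block as its own gzip member and concatenate them."""
    return b"".join(gzip.compress(data[i:i + block_size], mtime=0)
                    for i in range(0, len(data), block_size))


if __name__ == "__main__":
    data = b"".join(b"record %d\n" % i for i in range(50_000))
    one = single_stream(data)
    many = per_block(data, 64 * 1024)
    assert one != many  # different compressed representations...
    assert gzip.decompress(one) == gzip.decompress(many) == data  # ...same content
    print("single stream:", len(one), "bytes; per-block:", len(many), "bytes")
```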

-Kurt
