From owner-freebsd-hackers@FreeBSD.ORG  Wed Oct 10 14:33:18 2012
Return-Path: <owner-freebsd-hackers@FreeBSD.ORG>
Delivered-To: freebsd-hackers@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52])
 by hub.freebsd.org (Postfix) with ESMTP id 351D74A6
 for <freebsd-hackers@freebsd.org>; Wed, 10 Oct 2012 14:33:18 +0000 (UTC)
 (envelope-from lidl@hydra.pix.net)
Received: from hydra.pix.net (hydra.pix.net [IPv6:2001:470:e254:10::3c])
 by mx1.freebsd.org (Postfix) with ESMTP id F3E418FC0A
 for <freebsd-hackers@freebsd.org>; Wed, 10 Oct 2012 14:33:17 +0000 (UTC)
Received: from hydra.pix.net (localhost [127.0.0.1])
 by hydra.pix.net (8.14.5/8.14.5) with ESMTP id q9AEXGuA008619;
 Wed, 10 Oct 2012 10:33:16 -0400 (EDT)
 (envelope-from lidl@hydra.pix.net)
X-Virus-Status: Clean
X-Virus-Scanned: clamav-milter 0.97.5 at mail.pix.net
Received: (from lidl@localhost)
 by hydra.pix.net (8.14.5/8.14.5/Submit) id q9AEXEl9008618;
 Wed, 10 Oct 2012 10:33:14 -0400 (EDT) (envelope-from lidl)
Date: Wed, 10 Oct 2012 10:33:14 -0400
From: Kurt Lidl <lidl@pix.net>
To: Tim Kientzle <tim@kientzle.com>
Subject: Re: SMP Version of tar
Message-ID: <20121010143314.GA8402@pix.net>
References: <5069C9FC.6020400@brandonfa.lk>
 <alpine.BSF.2.00.1210071859430.15957@wojtek.tensor.gdynia.pl>
 <324B736D-8961-4E44-A212-2ECF3E60F2A0@kientzle.com>
 <alpine.BSF.2.00.1210080838170.3664@wojtek.tensor.gdynia.pl>
 <20121008083814.GA5830@straylight.m.ringlet.net>
 <alpine.BSF.2.00.1210081219300.4673@wojtek.tensor.gdynia.pl>
 <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <15DBA1A9-A4B6-4F7D-A9DC-3412C4BE3517@kientzle.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: Wojciech Puchar <wojtek@wojtek.tensor.gdynia.pl>,
 Brandon Falk <bfalk_bsd@brandonfa.lk>, freebsd-hackers@freebsd.org
X-BeenThere: freebsd-hackers@freebsd.org
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Technical Discussions relating to FreeBSD
 <freebsd-hackers.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/options/freebsd-hackers>, 
 <mailto:freebsd-hackers-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-hackers>
List-Post: <mailto:freebsd-hackers@freebsd.org>
List-Help: <mailto:freebsd-hackers-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-hackers>,
 <mailto:freebsd-hackers-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Wed, 10 Oct 2012 14:33:18 -0000

On Tue, Oct 09, 2012 at 09:54:03PM -0700, Tim Kientzle wrote:
> 
> On Oct 8, 2012, at 3:21 AM, Wojciech Puchar wrote:
> 
> >> Not necessarily.  If I understand correctly what Tim means, he's talking
> >> about an in-memory compression of several blocks by several separate
> >> threads, and then - after all the threads have compressed their
> > 
> > but gzip format is single stream. dictionary IMHO is not reset every X kilobytes.
> > 
> > parallel gzip is possible but not with same data format.
> 
> Yes, it is.
> 
> The following creates a compressed file that
> is completely compatible with the standard
> gzip/gunzip tools:
> 
>    * Break file into blocks
>    * Compress each block into a gzip file (with gzip header and trailer information)
>    * Concatenate the result.
> 
> This can be correctly decoded by gunzip.
> 
> In theory, you get slightly worse compression.  In practice, if your blocks are reasonably large (a megabyte or so each), the difference is negligible.

I am not sure, but I think the two of you agree on the technical
part, and the apparent disagreement comes from imprecisely specified
language.

Tim is correct that the gzip format allows concatenation of
independently compressed blocks of data, so you can break the input
stream into a bunch of blocks [A, B, C, etc], compress each one, and
append the results together as [A.gz, B.gz, C.gz, etc]. When that is
uncompressed, you get the original input stream back.
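
As a rough sketch of that scheme (Python standard library only; the
function name parallel_gzip, the 1 MB block size and the worker count
are my own illustrative choices, not anything from tar or libarchive),
each block is compressed as its own gzip member and the members are
simply joined:

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def parallel_gzip(data: bytes, block_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress each block as a separate gzip member, then concatenate.

    Hypothetical helper for illustration only.
    """
    # Break the input into fixed-size blocks [A, B, C, ...].
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        blocks = [b""]  # still emit one valid (empty) gzip member
    # Each block gets its own gzip header and trailer; map() keeps the order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        members = list(pool.map(gzip.compress, blocks))
    # [A.gz, B.gz, C.gz, ...] appended together.
    return b"".join(members)


data = bytes(range(256)) * 20000          # roughly 5 MB of sample input
packed = parallel_gzip(data)
# A standard gzip decoder reads the members back to back.
assert gzip.decompress(packed) == data
```

The blocks are independent, so they can be compressed in any order
or on any number of threads, as long as the members are written out
in the original order.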

I think that Wojciech's point is that the compressed data stream
produced by compressing the input as a single stream is different
from the compressed data stream of [A.gz, B.gz, C.gz, etc].  Both
will decompress to the same thing, but the intermediate compressed
representation will be different.
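
To make that distinction concrete, here is a small check one could
run (again Python standard library; the sample data and the 1 MB
block size are arbitrary assumptions of mine). It compresses the same
input once as a single stream and once block by block, then compares
both the decompressed output and the compressed bytes:

```python
import gzip

# Arbitrary sample input, several megabytes so it spans multiple blocks.
data = b"".join(b"line %d of some sample input\n" % i for i in range(200000))
block = 1 << 20

# One gzip member covering the whole input (mtime=0 keeps headers fixed).
single = gzip.compress(data, mtime=0)

# One gzip member per block, appended together.
members = b"".join(
    gzip.compress(data[i:i + block], mtime=0)
    for i in range(0, len(data), block)
)

# Both representations decode to the original input...
same_output = gzip.decompress(single) == gzip.decompress(members) == data
# ...but the compressed byte streams themselves are not identical.
same_bytes = single == members

print(same_output, same_bytes, len(single), len(members))
```

The two lengths printed at the end also give a feel for how much
compression is lost by restarting at each block boundary.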

-Kurt