Date: Wed, 10 Oct 2012 10:33:14 -0400
From: Kurt Lidl
To: Tim Kientzle
Cc: Wojciech Puchar, Brandon Falk, freebsd-hackers@freebsd.org
Subject: Re: SMP Version of tar

On Tue, Oct 09, 2012 at 09:54:03PM -0700, Tim Kientzle wrote:
>
> On Oct 8, 2012, at 3:21 AM, Wojciech Puchar wrote:
>
> >> Not necessarily.
> >> If I understand correctly what Tim means, he's talking
> >> about an in-memory compression of several blocks by several separate
> >> threads, and then - after all the threads have compressed their
> >
> > but gzip format is single stream. dictionary IMHO is not reset every X kilobytes.
> >
> > parallel gzip is possible but not with same data format.
>
> Yes, it is.
>
> The following creates a compressed file that
> is completely compatible with the standard
> gzip/gunzip tools:
>
>    * Break file into blocks
>    * Compress each block into a gzip file (with gzip header and trailer information)
>    * Concatenate the result.
>
> This can be correctly decoded by gunzip.
>
> In theory, you get slightly worse compression.  In practice, if your
> blocks are reasonably large (a megabyte or so each), the difference
> is negligible.

I am not sure, but I think this conversation has a slight
misunderstanding caused by imprecise language, while the two of you
agree on the technical part.

Tim is correct that the gzip format allows independently compressed
blocks of data to be concatenated. You can break the input stream
into a bunch of blocks [A, B, C, etc.], compress each one separately,
and append the results together as [A.gz, B.gz, C.gz, etc.]. When
that is uncompressed, you get the original input stream back.

I think Wojciech's point is that the output of compressing the input
as a single stream is different from the concatenated stream
[A.gz, B.gz, C.gz, etc.].  Both decompress to the same thing, but
the compressed bytes themselves are different.

-Kurt
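
Both points can be checked with a short script. The following is a minimal Python sketch of the block-wise scheme described above, not code from tar or any existing tool. The names split_blocks, compress_block and parallel_gzip are made up for illustration, and the 1 MiB default block size is an assumption taken from the "a megabyte or so" remark. Threads are used on the assumption that CPython's zlib releases the interpreter lock while compressing; if it does not, the result is still correct, just not faster.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 1 << 20  # assumed default: 1 MiB per block


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Break the input into consecutive blocks [A, B, C, ...]."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def compress_block(block: bytes) -> bytes:
    """Compress one block into a complete gzip member (header and trailer)."""
    # mtime=0 fixes the header timestamp so the output is reproducible.
    return gzip.compress(block, mtime=0)


def parallel_gzip(data: bytes, block_size: int = BLOCK_SIZE,
                  workers: int = 4) -> bytes:
    """Compress each block independently, then concatenate in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input blocks.
        members = list(pool.map(compress_block, split_blocks(data, block_size)))
    return b"".join(members)
```

Given some input bytes `data` larger than one block, `gzip.decompress(parallel_gzip(data))` returns `data` again, as does `gzip.decompress(gzip.compress(data, mtime=0))`, yet the two compressed byte strings are not equal. That is the distinction drawn above: same decompressed result, different compressed representation.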