From: John Nielsen
Date: Wed, 3 Oct 2012 11:37:39 -0600
To: Yamagi Burmeister
Cc: bfalk_bsd@brandonfa.lk, freebsd-hackers@freebsd.org
Subject: Re: SMP Version of tar

On Oct 2, 2012, at 12:36 AM, Yamagi Burmeister wrote:

> On Mon, 1 Oct 2012 22:16:53 -0700
> Tim Kientzle wrote:
>
>> There are a few different parallel command-line compressors and
>> decompressors in ports; experiment a lot (with large files
>> being read from and/or written to disk) and see what the real effect is.
>> In particular, some decompression algorithms are actually faster than
>> memcpy() when run on a single processor. Parallelizing such algorithms
>> is not likely to help much in the real world.
>>
>> The two popular algorithms I would expect to benefit most are bzip2
>> compression and lzma compression (targeting xz or lzip format). For
>> decompression, bzip2 is block-oriented so fits SMP pretty naturally.
>> Other popular algorithms are stream-oriented and less amenable to
>> parallelization.
>>
>> Take a careful look at pbzip2, which is a parallelized bzip2/bunzip2
>> implementation that's already under a BSD license. You should be able
>> to get a lot of ideas about how to implement a parallel compression
>> algorithm. Better yet, you might be able to reuse a lot of the existing
>> pbzip2 code.
>>
>> Mark Adler's pigz is also worth studying. It's also license-friendly,
>> and is built on top of regular zlib, which is a nice technique when
>> it's feasible.
>
> Just a small note: There's a parallel implementation of xz called
> "pixz". It's built atop liblzma and libarchive and is under a
> BSD-style license. See: https://github.com/vasi/pixz Maybe it's
> possible to reuse most of the code.

See also below, which has some bugfixes/improvements that AFAIK were
never committed in the original project (though they were submitted).

https://github.com/jlrobins/pixz

JN
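
To make the "block-oriented so fits SMP pretty naturally" point from Tim's message concrete, here is a minimal Python sketch of chunk-parallel bzip2 compression. It is not taken from pbzip2, pigz, or pixz. The names parallel_bzip2, split_chunks, and CHUNK_SIZE are hypothetical, and the chunk size is an assumption rather than a tuned value. The sketch relies on two things: concatenated bzip2 streams form a valid bzip2 file, and Python's bz2.decompress (3.3 and later) accepts multi-stream input.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size (assumption): 900 KiB of input per chunk,
# roughly matching bzip2's largest block size.
CHUNK_SIZE = 900 * 1024


def split_chunks(data, size=CHUNK_SIZE):
    """Split data into independent, fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_bzip2(data, workers=4):
    """Compress each chunk as its own bzip2 stream and concatenate them."""
    if not data:
        # Keep the output a valid bzip2 stream for empty input.
        return bz2.compress(b"")
    chunks = split_chunks(data)
    # Threads are used for brevity. To my knowledge CPython's bz2 module
    # releases the GIL while compressing, so chunks can run on several cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the streams stay in sequence.
        streams = list(pool.map(bz2.compress, chunks))
    return b"".join(streams)


if __name__ == "__main__":
    payload = b"SMP version of tar " * 200000
    packed = parallel_bzip2(payload)
    # bz2.decompress reads all concatenated streams back in order.
    assert bz2.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

Because each chunk is compressed independently, the ratio is usually a little worse than a single-stream run, and whether the extra cores help at all depends on disk throughput, which is why the advice above to measure with large files on real disks still applies.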