Date:      Wed, 3 Oct 2012 11:37:39 -0600
From:      John Nielsen <lists@jnielsen.net>
To:        Yamagi Burmeister <lists@yamagi.org>
Cc:        bfalk_bsd@brandonfa.lk, freebsd-hackers@freebsd.org
Subject:   Re: SMP Version of tar
Message-ID:  <E1A4E8ED-189F-48F1-BC36-6960E33EB5E4@jnielsen.net>
In-Reply-To: <20121002083634.3103fe958508a4026384ac96@yamagi.org>
References:  <5069C9FC.6020400@brandonfa.lk> <87549776-9051-4B4B-8D53-DAE6D51C2A94@kientzle.com> <20121002083634.3103fe958508a4026384ac96@yamagi.org>

On Oct 2, 2012, at 12:36 AM, Yamagi Burmeister <lists@yamagi.org> wrote:

> On Mon, 1 Oct 2012 22:16:53 -0700
> Tim Kientzle <tim@kientzle.com> wrote:
>
>> There are a few different parallel command-line compressors and
>> decompressors in ports; experiment a lot (with large files being read
>> from and/or written to disk) and see what the real effect is.  In
>> particular, some decompression algorithms are actually faster than
>> memcpy() when run on a single processor.  Parallelizing such
>> algorithms is not likely to help much in the real world.
>>
>> The two popular algorithms I would expect to benefit most are bzip2
>> compression and lzma compression (targeting xz or lzip format).  For
>> decompression, bzip2 is block-oriented so fits SMP pretty naturally.
>> Other popular algorithms are stream-oriented and less amenable to
>> parallelization.
>>
>> Take a careful look at pbzip2, which is a parallelized bzip2/bunzip2
>> implementation that's already under a BSD license.  You should be able
>> to get a lot of ideas about how to implement a parallel compression
>> algorithm.  Better yet, you might be able to reuse a lot of the
>> existing pbzip2 code.
>>
>> Mark Adler's pigz is also worth studying.  It's also
>> license-friendly, and is built on top of regular zlib, which is a nice
>> technique when it's feasible.
>
> Just a small note: There's a parallel implementation of xz called
> "pixz". It's built atop liblzma and libarchive and is under a
> BSD-style license. See: https://github.com/vasi/pixz Maybe it's
> possible to reuse most of the code.


See also the fork below, which has some bugfixes/improvements that AFAIK
were never committed to the original project (though they were submitted).
https://github.com/jlrobins/pixz
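
For anyone who wants to experiment with the block-oriented idea discussed
above, here is a minimal sketch in Python, not taken from pbzip2 or pixz.
It assumes that compressing independent chunks as separate bzip2 streams
and concatenating them is acceptable; Python's bz2.decompress() accepts
such concatenated streams. The names CHUNK_SIZE, split_chunks, and
parallel_bz2_compress are hypothetical, and the chunk size is only a guess
at something close to one bzip2 block.

```python
import bz2
from concurrent.futures import ThreadPoolExecutor

# Hypothetical chunk size, chosen to be close to one bzip2 block at level 9.
CHUNK_SIZE = 900 * 1024


def split_chunks(data, size=CHUNK_SIZE):
    """Split data into independent pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_bz2_compress(data, workers=4):
    """Compress each chunk as its own bzip2 stream and concatenate them.

    Threads are used for simplicity; how much real parallelism they give
    depends on the interpreter, so measure before relying on it.
    """
    chunks = split_chunks(data)
    if not chunks:
        # Empty input still yields one valid (empty) bzip2 stream.
        return bz2.compress(b"")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the output streams stay in order.
        return b"".join(pool.map(bz2.compress, chunks))


if __name__ == "__main__":
    payload = b"freebsd-hackers " * 200000
    packed = parallel_bz2_compress(payload)
    # Round-trip check: the concatenated streams decompress to the input.
    print(bz2.decompress(packed) == payload)
```

Splitting into separate streams usually costs a little compression ratio,
since each chunk is compressed without knowledge of the others; as noted
above, the real effect is best measured with large files on actual disks.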

JN



