Date: Tue, 25 Nov 2008 12:51:33 -0800 (PST)
From: bf <bf2006a@yahoo.com>
To: Ivan Voras <ivoras@freebsd.org>, freebsd-hackers@FreeBSD.org
Subject: Re: lzma compression/decompression in bsdtar/libarchive?
Message-ID: <48704.32247.qm@web39107.mail.mud.yahoo.com>
In-Reply-To: <9bbcef730811251141w63ad793as6efac3e7156bc2ef@mail.gmail.com>
--- On Tue, 11/25/08, Ivan Voras <ivoras@freebsd.org> wrote:

> From: Ivan Voras <ivoras@freebsd.org>
> Subject: Re: lzma compression/decompression in bsdtar/libarchive?
> To: bf2006a@yahoo.com
> Cc: freebsd-hackers@freebsd.org
> Date: Tuesday, November 25, 2008, 2:41 PM
>
> 2008/11/25 bf <bf2006a@yahoo.com>:
> >> How useful would LZMA be without supporting the .7z file format?
> >> Probably not at all, since there isn't a gzip-like file format or
> >> wrapper that supports LZMA.
> >
> > ?? Have you looked at this code? Yes, there is: there is an "LZMA
> > compressed file format" and the 7z file format, both of which support
> > LZMA. The former format has been widely adopted by people who distribute
> > lzma-compressed tarballs, especially GNU-related projects that use the
> > lzmautils port. Some projects, like GNU coreutils, no longer distribute
> > the latest versions of their software in bzip2-compressed tarballs.
>
> That's interesting - I've never seen an .lzma file "in the wild".
>
> But there they are:
> http://ftp.gnu.org/gnu/coreutils/
>
> [ ] coreutils-6.12.tar.gz      01-Jun-2008 05:03  8.6M
> [ ] coreutils-6.12.tar.lzma    01-Jun-2008 05:04  3.6M
>
> And there's a compressor in ports: archivers/lzma

Yes, a surprising number of projects now give you the option of
lzma-compressed tarballs, and have for months. When necessary, they rely
on tar to preserve some of the file data you were concerned with, then
compress the tarball with lzma, and bundle it in the very simple "lzma
compressed file" format, which is roughly:

"LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc,lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data"

as described in the documentation.
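
To make the layout above concrete, here is a minimal sketch of reading that 13-byte header in Python. It is only an illustration, not code from lzma, lzmautils, or libarchive: the names LzmaHeader and parse_lzma_header are hypothetical, and the decoding of the properties byte as (pb * 5 + lp) * 9 + lc is an assumption taken from the LZMA SDK documentation rather than from this message.

```python
import struct
from typing import NamedTuple, Optional

HEADER_SIZE = 13                   # 1 (properties) + 4 (dict size) + 8 (uncompressed size)
UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # -1 stored as an unsigned 64-bit value


class LzmaHeader(NamedTuple):
    """Hypothetical container for the fields of the .lzma header."""
    lc: int
    lp: int
    pb: int
    dict_size: int
    uncompressed_size: Optional[int]  # None when the header says "unknown"


def parse_lzma_header(data: bytes) -> LzmaHeader:
    """Decode the first 13 bytes of an "LZMA compressed file"."""
    if len(data) < HEADER_SIZE:
        raise ValueError("need at least 13 header bytes")
    # "<BIQ": little endian, 1-byte properties, 4-byte dictionary size,
    # 8-byte uncompressed size, with no padding between fields.
    props, dict_size, size = struct.unpack_from("<BIQ", data, 0)
    # Assumption (LZMA SDK docs): props = (pb * 5 + lp) * 9 + lc
    if props >= 9 * 5 * 5:
        raise ValueError("invalid LZMA properties byte")
    lc = props % 9
    lp = (props // 9) % 5
    pb = props // 45
    return LzmaHeader(lc, lp, pb, dict_size,
                      None if size == UNKNOWN_SIZE else size)


if __name__ == "__main__":
    # Hand-built header: properties 0x5D, 8 MiB dictionary, unknown size.
    header = bytes([0x5D]) + struct.pack("<IQ", 1 << 23, UNKNOWN_SIZE)
    print(parse_lzma_header(header))
```

Everything after offset 13 is the raw LZMA stream, so a reader of this format only needs these three fields before handing the rest to the decoder.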

In the end you obtain compression ratios better than or equal to those of
bzip2 in almost all cases (usually substantially better), and
decompression speeds closer to those of gzip. Compression speed is
comparable to, but usually slightly slower than, that of bzip2.
archivers/lzma was the first widely used implementation, but GNU-inspired
projects usually recommend the compatible archivers/lzmautils fork.

The benefits can clearly be seen when you compare the sizes of
lzma-compressed tarballs to those of the gzip- and bzip2-compressed ones.
You can see more examples at many of the GNU projects, GraphicsMagick and
ImageMagick, etc. -- and many of these are using lzma compression with
suboptimal settings. The other night I archived a Subversion repository
of Gentoo Portage in a 5.5 MB file by using bsdtar and archivers/lzma.
This repository is normally about 420 MB in size, and Gentoo's
lzma-compressed snapshot tarballs are 29 MB in size, so not all
implementations are equal.

Not so long ago (at the end of April this year) someone tried to switch
ImageMagick to lzma-compressed tarballs, and caught a lot of flak from
others who were unfamiliar with this form of compression. If Tim could
integrate it with libarchive, I'm sure that it would be more favorably
received.

Among the other high-end compression methods, ppmd has attained a
stability that would merit support in libarchive, but many of the others
are still evolving, or in their present form are too computationally
intensive, for diminishing returns, on any but the newest hardware.

Regards,
b.