Date: Tue, 9 Jul 2002 23:53:45 -0700 (PDT)
From: Don Lewis <dl-freebsd@catspoiler.org>
To: tlambert2@mindspring.com
Cc: temik@egartech.com, wollman@lcs.mit.edu, mark@thuvia.demon.co.uk, arch@FreeBSD.ORG
Subject: Re: Package system flaws?
Message-ID: <200207100653.g6A6rjwr006212@gw.catspoiler.org>
In-Reply-To: <3D2B65A3.ABB92114@mindspring.com>
On 9 Jul, Terry Lambert wrote:
> Artem Tepponen wrote:
>> No, Terry. Dictionary locality works in a different way.
>> gzipped tar will almost always win vs. tarred gzipped files.
>> 10-15% from memory. Just a quick check:
>>
>> -rw-r--r-- 1 temik develops 29020160 Jul 9 12:36 gcc-3.0.1.gz.tar
>> -rw-r--r-- 1 temik develops 13821669 Jul 9 12:41 gcc-3.0.1.tar.bz2
>> -rw-r--r-- 1 temik develops 18054324 Sep 24 2001 gcc-3.0.1.tar.gz
>> -rw-r--r-- 1 temik develops 22746511 Jul 9 12:52 gcc-3.0.1.zip
>>
>> Oops. I was wrong. >35% is a big difference. And bzip adds another 24%.
>> But for binaries difference between gzip vs. bzip2 will be smaller.
>>
>> This is quite simple check but the picture will remain the same
>> for pretty any kind of data and hope that's enough to choose
>> single tar.somez + header.
>>
>> Will header be combined or in a different file is another question.
>
> 1) "Most compression", not "all compression".
>
> 2) LZW resets the dictionary every 12K. This is the patented
> process that Terry Welch of Unisys introduced. So your
> argument is only valid for a lot of small files whose size
> is well under 12K, which have similar contents.
>
> 3) I believe gzip and bzip were both written to get out from
> under the Unisys patent, and therefore do not compress as
> well as they could compress, even though Unisys has granted
> blanket royalty free use for certain applications which fall
> into this category.

In the comparisons I've done between gzip(1) and compress(1), gzip has
always gotten better compression than compress, though gzip runs
slower. When I've tuned the compression level knob on gzip to get a
similar compression ratio, it runs faster than compress.

The algorithm.doc file in the gzip distribution seems to indicate that
gzip resets its dictionary when it decides it would be advantageous to
do so.
I've read somewhere a long time ago that the compression results would
be better if it used arithmetic encoding on its output instead of
Huffman encoding, but I believe that IBM has the patent on arithmetic
encoding.

> NB: The Unisys patent expires on Dec 10th of this year, in any case,
> so the only reason bzip/gzip wouldn't support using it after that is
> religious.

I was just thinking the other day that the patent expiration date
should be approaching. I believe that the area most impacted by this
patent in recent years is the creation of .gif files. At least that's
what has gotten all the press.
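
The "gzipped tar vs. tarred gzipped files" comparison quoted above can
be approximated without building any archives. What follows is only a
rough sketch under stated assumptions: the input is synthetic (the
make_files helper and its contents are hypothetical, not the gcc tree
from the thread), raw zlib and bz2 streams stand in for gzip(1) and
bzip2(1), and tar headers and padding are ignored. It compresses many
small, similar files one at a time, then compresses them as a single
concatenated stream.

```python
import bz2
import zlib


def make_files(count=200):
    """Build hypothetical small 'source files' that share boilerplate.

    Synthetic stand-in data, not taken from the thread.
    """
    header = (
        b"/* Copyright notice and common includes */\n"
        b"#include <stdio.h>\n#include <stdlib.h>\n"
    )
    files = []
    for i in range(count):
        body = b"".join(
            b"int func_%d_%d(int x) { return x * %d + %d; }\n" % (i, j, i, j)
            for j in range(20)
        )
        files.append(header + body)
    return files


def per_file_size(files, compress):
    """Total size when each file is compressed on its own (like .gz.tar)."""
    return sum(len(compress(f)) for f in files)


def solid_size(files, compress):
    """Size when all files are compressed as one stream (like .tar.gz)."""
    return len(compress(b"".join(files)))


def compare(files):
    """Return compressed sizes in bytes for the three layouts."""
    return {
        "per-file deflate": per_file_size(files, zlib.compress),
        "solid deflate": solid_size(files, zlib.compress),
        "solid bzip2": solid_size(files, bz2.compress),
    }


if __name__ == "__main__":
    for name, size in compare(make_files()).items():
        print(f"{name:>18}: {size} bytes")
```

On input like this, the single stream should come out smaller because
the compressor can reuse matches across file boundaries and pays the
stream overhead only once. How large the gap is on real data depends
on file sizes and how similar the files are, so the percentages quoted
in the thread should not be expected to carry over.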