From: Terry Lambert
Date: Tue, 09 Jul 2002 15:37:23 -0700
To: Artem Tepponen
Cc: Garrett Wollman, Mark Valentine, arch@freebsd.org
Subject: Re: Package system flaws?
Message-ID: <3D2B65A3.ABB92114@mindspring.com>
References: <5235EF9BAE6B7F4CB3735789EEF73B29074172@turtle.egar.egartech.com>
Sender: owner-freebsd-arch@FreeBSD.ORG

Artem Tepponen wrote:
> No, Terry. Dictionary locality works in a different way.
> gzipped tar will almost always win vs. tarred gzipped files.
> 10-15% from memory. Just a quick check:
>
> -rw-r--r-- 1 temik develops 29020160 Jul 9 12:36 gcc-3.0.1.gz.tar
> -rw-r--r-- 1 temik develops 13821669 Jul 9 12:41 gcc-3.0.1.tar.bz2
> -rw-r--r-- 1 temik develops 18054324 Sep 24 2001 gcc-3.0.1.tar.gz
> -rw-r--r-- 1 temik develops 22746511 Jul 9 12:52 gcc-3.0.1.zip
>
> Oops. I was wrong. >35% is a big difference. And bzip adds another 24%.
> But for binaries difference between gzip vs. bzip2 will be smaller.
>
> This is quite simple check but the picture will remain the same
> for pretty any kind of data and hope that's enough to choose
> single tar.somez + header.
>
> Will header be combined or in a different file is another question.

1) "Most compression", not "all compression".

2) LZW resets the dictionary every 12K.  This is the patented
   process that Terry Welch of Unisys introduced.  So your
   argument is only valid for a lot of small files whose size is
   well under 12K, and which have similar contents.

3) I believe gzip and bzip were both written to get out from
   under the Unisys patent, and therefore do not compress as
   well as they could, even though Unisys has granted blanket
   royalty-free use for certain applications which fall into
   this category.

4) Nothing in my statement precludes maintaining the dictionary
   as a spanning set over a number of small files, per #2, while
   at the same time leaving the index uncompressed.

5) Yes, I would expect that an uncompressed index would take
   more room than a compressed index.

6) For most modern communications media (including broadband
   where a modulator/demodulator pair is used, e.g. cable modem),
   the modems involved include their own compression; usually a
   form of trellis encoding.

As a side note: compression of compressed data is useless, and
usually, in fact, counter-productive.

All of the format arguments I've been making are predicated on
non-CDROM distribution over some medium which is two orders of
magnitude or more slower than a local CDROM... and which, by the
very nature of having hardware compression, tends not to benefit
at all from compression anyway.

But even if your argument were totally valid, the compression
you seek is going to come from the link-level compression on top
of the data being transferred anyway.

NB: The Unisys patent expires on Dec 10th of this year, in any
case, so the only reason bzip/gzip wouldn't support using it
after that is religious.
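For anyone who wants to reproduce the kind of size comparison quoted above, here is a minimal Python sketch.  It assumes synthetic input (200 tiny, similar C-like files whose names and contents are made up for illustration) rather than the gcc-3.0.1 tree, and it uses Python's gzip module, so it exercises DEFLATE only, not LZW or bzip2.  For files this small, tar's 512-byte headers and padding account for much of the gap, so treat the numbers as illustrative rather than as a measurement of dictionary effects.

```python
import gzip
import io
import tarfile


def make_tar(members):
    """Pack (name, bytes) pairs into an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare(files):
    """Return (tar-then-gzip size, gzip-each-file-then-tar size) in bytes."""
    tar_gz = gzip.compress(make_tar(files))
    gz_tar = make_tar(
        [(name + ".gz", gzip.compress(data)) for name, data in files]
    )
    return len(tar_gz), len(gz_tar)


# Hypothetical input: 200 small files with very similar contents.
files = [
    (
        f"src/file{i}.c",
        (
            f"/* file {i} */\n#include <stdio.h>\n"
            f"int f{i}(void) {{ return {i}; }}\n"
        ).encode(),
    )
    for i in range(200)
]

tar_gz_size, gz_tar_size = compare(files)
print("tar.gz bytes:", tar_gz_size)
print("gz.tar bytes:", gz_tar_size)
```

Swapping in real files (read from disk into the same list of name/bytes pairs) is the obvious next step if the synthetic input looks too favourable to one side.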
-- Terry