From: Tim Kientzle
Reply-To: kientzle@acm.org
Date: Mon, 30 Oct 2000 17:01:30 -0800
To: "Daniel C. Sobral", Alexander Langer, libh@FreeBSD.ORG, "Jordan K. Hubbard"
Subject: Re: BOF at BSDCon: FreeBSD Installer, Packages System
Message-ID: <39FE19EA.5346F798@acm.org>

Hmmm. As I suspected, if you first run gunzip -r /usr/share/man,
then a tar.gz archive of the tree is only 9MB.

That suggests a couple of ways to save space in the distribution
archives. One, obviously, is to store the data un-gzipped and, after
unpacking, go back through and gzip the appropriate files. (This is
tricky with the man tree because of multiply-linked files. The best
approach is to auto-build a shell script that gzips the files and
creates the links; then you can build a very compact archive with
just one copy of each man file.)
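
The "gzip after unpacking" step, including the hard-link handling, could be sketched roughly as follows. This is only an illustration in Python rather than the auto-built shell script mentioned above; the function name gzip_tree and the choice to skip files already ending in .gz are assumptions made for the example, not part of any existing installer.

```python
import gzip
import os
import shutil
import stat
from collections import defaultdict


def gzip_tree(root):
    """Gzip every regular file under root, preserving hard-link groups.

    Returns the number of distinct files (inodes) that were compressed.
    """
    # Group paths by (device, inode) so each multiply-linked file is
    # compressed only once.
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, special files, and anything already gzipped.
            if not stat.S_ISREG(st.st_mode) or name.endswith(".gz"):
                continue
            groups[(st.st_dev, st.st_ino)].append(path)

    for paths in groups.values():
        paths.sort()
        first = paths[0]
        # Compress one copy of the data...
        with open(first, "rb") as src, gzip.open(first + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(first)
        # ...then recreate every other name as a hard link to that copy.
        for other in paths[1:]:
            os.remove(other)
            os.link(first + ".gz", other + ".gz")
    return len(groups)
```

Run against an unpacked tree, for example gzip_tree("/usr/share/man"), this leaves one compressed copy per original inode, with the remaining names linked to it.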

Another approach is to build a custom archive format that permits
you to store the actual file data un-gzipped but mark the entry so
that the de-archiver will re-gzip the data as it's written. Sounds
roundabout, I know, but if you think carefully about how gzip works
internally, you'll understand why this generally gives better
compression. It's similar to HTTP "transfer-encoding", if you want
to think of it that way.

A custom archive format is no big deal; I have a favorite one I've
used for a couple of years now that's extremely easy to implement,
extensible, etc. It discards tar's tape-centric heritage and in the
process discards most of tar's limitations.

			- Tim

Tim Kientzle wrote:
>
> Tim Kientzle wrote:
> > Though I haven't tested it, I wouldn't be surprised if
> > the ports tree was more than twice as large in ZIP format as
> > in tar.gz format.
>
> I just did a few quick tests against my FreeBSD 3.3
> system to see how much you lose by switching from
> tar.gz to ZIP. I simply archived a couple of directories
> and compared the sizes:
>
>   Directory            tar.gz         ZIP
>   /usr/ports         7,601,675    15,008,530
>   /usr/src          50,896,742    62,536,891
>   /usr/bin           3,892,391     6,192,116
>   /usr/share/man    28,449,979    22,518,970 (!)
>
> I think it's pretty clear that building a single
> archive and then compressing the whole thing is
> necessary if you really want to build full-featured
> CD-ROM distributions.
>
>                         - Tim Kientzle
>
> P.S. /usr/share/man is an interesting example
> which works out larger in tar.gz format because the
> individual files are already gzipped. I suspect that
> you could get an archive smaller than 22MB by un-gzipping
> all the individual files and then building a tar.gz archive.
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-libh" in the body of the message
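
The effect behind the quoted size comparison can be reproduced on a small scale. The sketch below is not a measurement of any real tree: the page generator, word list, and function names are invented for the illustration, and zlib's deflate streams stand in for both ZIP members and gzip. It compares three layouts: each file compressed separately (ZIP-like), all files compressed as one stream (tar.gz-like), and already-compressed files compressed again as one stream (the /usr/share/man case).

```python
import random
import zlib

# Synthetic "man pages": a unique body followed by shared boilerplate.
# All of this content is made up for the illustration.
BOILERPLATE = (
    b".SH SYNOPSIS\n.SH DESCRIPTION\n.SH OPTIONS\n.SH SEE ALSO\n"
    b"This manual page is part of a hypothetical example collection.\n"
) * 4
WORDS = [b"archive", b"file", b"option", b"directory", b"compress",
         b"extract", b"link", b"format", b"header", b"entry",
         b"stream", b"install", b"package", b"system", b"data", b"size"]


def make_pages(count=50, seed=0):
    """Build `count` similar but not identical pages."""
    rng = random.Random(seed)
    pages = []
    for i in range(count):
        body = b" ".join(rng.choice(WORDS) for _ in range(150))
        pages.append(b".TH CMD%d 1\n" % i + body + b"\n" + BOILERPLATE)
    return pages


def per_file_size(pages):
    """Each page compressed on its own, as ZIP does per member."""
    return sum(len(zlib.compress(p, 9)) for p in pages)


def solid_size(pages):
    """All pages compressed as one stream, as tar.gz does."""
    return len(zlib.compress(b"".join(pages), 9))


def solid_size_of_precompressed(pages):
    """Pages compressed individually first, then compressed again as one stream."""
    packed = b"".join(zlib.compress(p, 9) for p in pages)
    return len(zlib.compress(packed, 9))
```

Calling the three functions on make_pages() shows the single stream coming out smallest: the compressor can reuse text it has already seen in earlier files, whereas a per-file compressor starts cold on every member, and data that is already compressed leaves almost nothing for a second pass to find.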