Date:      Mon, 30 Oct 2000 17:01:30 -0800
From:      Tim Kientzle <kientzle@acm.org>
To:        "Daniel C. Sobral" <dcs@newsguy.com>, Alexander Langer <alex@big.endian.de>, libh@FreeBSD.ORG, "Jordan K. Hubbard" <jkh@FreeBSD.ORG>
Subject:   Re: BOF at BSDCon: FreeBSD Installer, Packages System
Message-ID:  <39FE19EA.5346F798@acm.org>
References:  <39DCC860.B04F7D50@acm.org> <20001006155542.A29218@cichlids.cichlids.com> <39F3CDD7.15B889E7@acm.org> <20001023190412.B507@cichlids.cichlids.com> <39F47E98.4BB647AA@acm.org> <20001023202244.B10374@cichlids.cichlids.com> <39F48F4A.38D458C2@acm.org> <39FCF244.5A8C8E59@newsguy.com> <39FDC12E.304B0011@acm.org> <39FDE2A0.C2CEF041@acm.org>

Hmmm...  As I suspected, if you first
  gunzip -r /usr/share/man
then a tar.gz archive of the man tree is only 9MB, compared with
28MB when the pages are individually gzipped.  That suggests a
couple of ways to save space in the distribution archives.
One, obviously, is to store the data un-gzipped and, after
unpacking, go back through and gzip the appropriate files.
(This is tricky with the man tree because many pages are
hard-linked under several names.  The best approach is to
auto-build a shell script that gzips each file and re-creates
the links; then you can build a very compact archive with just
one copy of each man file.)
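
A minimal sketch of such a script generator, in Python rather
than anything from the actual build tools.  The function name
build_gzip_script and the choice of "gzip -9" are assumptions
made for illustration.  It groups paths by (device, inode) so
each page is compressed once, and it removes the extra names
before compressing because gzip refuses to touch a file that
has more than one link unless forced.

```python
import os
import shlex
import stat
from collections import defaultdict


def build_gzip_script(root):
    """Return /bin/sh text that gzips each distinct regular file under
    root once and re-creates its hard links against the .gz copy."""
    groups = defaultdict(list)  # (device, inode) -> every path naming that file
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Skip symlinks, special files, and anything already gzipped.
            if not stat.S_ISREG(st.st_mode) or name.endswith(".gz"):
                continue
            groups[(st.st_dev, st.st_ino)].append(path)

    lines = ["#!/bin/sh", "set -e"]
    for paths in sorted(sorted(p) for p in groups.values()):
        first, others = paths[0], paths[1:]
        # Drop the extra names first so the link count is 1 when gzip runs.
        for other in others:
            lines.append("rm -f " + shlex.quote(other))
        lines.append("gzip -9 " + shlex.quote(first))
        for other in others:
            lines.append("ln " + shlex.quote(first + ".gz")
                         + " " + shlex.quote(other + ".gz"))
    return "\n".join(lines) + "\n"
```

The script would be generated from the original tree at build
time and shipped alongside an archive holding one copy of each
page; the "rm -f" lines are harmless if the extra names were
never unpacked.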

Another approach is to build a custom archive format that permits
you to store the actual file data un-gzipped but mark the entry
so that the de-archiver will re-gzip the data as it's written.
That sounds roundabout, I know, but it generally gives better
compression: gzip works by finding repeated strings in the data
it has recently seen, so a single compressed stream can exploit
text shared between files, while separately gzipped files each
start from scratch and look like noise to an outer compressor.
It's similar to HTTP "Transfer-Encoding", if you want to think
of it that way.
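
A rough way to see the size difference is to build both kinds
of archive from the same pages and compare.  The sketch below
does this in memory with Python's tarfile and gzip modules.
The pages are synthetic stand-ins generated by a hypothetical
make_pages helper, not real man pages, so only the direction
of the result is meaningful, not the ratio.

```python
import gzip
import io
import random
import tarfile

# Small made-up vocabulary for the synthetic page bodies.
WORDS = ("archive buffer command device entry file gzip header index "
         "kernel library manual network option package queue record "
         "system table utility").split()


def make_pages(count=200, seed=0):
    """Return (name, bytes) pairs that share markup but differ in detail."""
    rng = random.Random(seed)
    pages = []
    for i in range(count):
        body = " ".join(rng.choice(WORDS) for _ in range(40))
        text = (
            ".Dd October 30, 2000\n.Dt CMD%d 1\n.Os\n"
            ".Sh NAME\n.Nm cmd%d\n.Nd hypothetical utility\n"
            ".Sh SYNOPSIS\n.Nm cmd%d\n.Op Fl v\n.Ar file ...\n"
            ".Sh DESCRIPTION\n%s\n" % (i, i, i, body)
        )
        pages.append(("man1/cmd%d.1" % i, text.encode("ascii")))
    return pages


def tar_bytes(members):
    """Build an uncompressed tar archive in memory from (name, data) pairs."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare(pages):
    """Return (tar.gz size with members pre-gzipped, size with raw members)."""
    pregzipped = [(name + ".gz", gzip.compress(data)) for name, data in pages]
    return (len(gzip.compress(tar_bytes(pregzipped))),
            len(gzip.compress(tar_bytes(pages))))


if __name__ == "__main__":
    pre, raw = compare(make_pages())
    print("members pre-gzipped:", pre, "bytes")
    print("members stored raw: ", raw, "bytes")
```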

A custom archive format is no big deal; I have a favorite one
I've used for a couple of years now that's extremely easy to
implement, extensible, etc.  It discards tar's tape-centric
heritage and in the process discards most of tar's limitations.
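
To make the idea concrete, here is a toy container with a
"re-gzip on extract" flag.  This is NOT the format described
above, whose layout isn't given here; the JSON header line,
the "regzip" field, and the pack/unpack names are all invented
for illustration.  Each entry is one header line followed by
the raw bytes, and the whole stream is gzipped once.

```python
import gzip
import io
import json


def pack(entries):
    """Pack (name, data, regzip) tuples into a single gzipped stream.

    Hypothetical layout: one JSON header line per entry, then exactly
    'size' bytes of raw (uncompressed) file data.
    """
    out = io.BytesIO()
    with gzip.GzipFile(fileobj=out, mode="wb") as stream:
        for name, data, regzip in entries:
            header = json.dumps(
                {"name": name, "size": len(data), "regzip": regzip})
            stream.write(header.encode("utf-8") + b"\n")
            stream.write(data)
    return out.getvalue()


def unpack(blob):
    """Return {output name: bytes}; flagged entries come out as name.gz."""
    files = {}
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as stream:
        while True:
            line = stream.readline()
            if not line:  # end of archive
                break
            header = json.loads(line)
            data = stream.read(header["size"])
            if header["regzip"]:
                # Re-compress on the way out, as the entry requested.
                files[header["name"] + ".gz"] = gzip.compress(data)
            else:
                files[header["name"]] = data
    return files
```

For example, packing ("man1/ls.1", page_bytes, True) yields a
"man1/ls.1.gz" entry on extraction whose decompressed content
equals page_bytes, while unflagged entries come back unchanged.
A real extractor would write to disk and handle links,
permissions, and errors, all of which this sketch leaves out.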

				- Tim

Tim Kientzle wrote:
> 
> Tim Kientzle wrote:
> > Though I haven't tested it, I wouldn't be surprised if
> > the ports tree was more than twice as large in ZIP format as
> > in tar.gz format.
> 
> I just did a few quick tests against my FreeBSD 3.3
> system to see how much you lose by switching from
> tar.gz to ZIP.  I simply archived a couple of directories
> and compared the sizes:
> 
> Directory           tar.gz         ZIP
> /usr/ports       7,601,675  15,008,530
> /usr/src        50,896,742  62,536,891
> /usr/bin         3,892,391   6,192,116
> /usr/share/man  28,449,979  22,518,970 (!)
> 
> I think it's pretty clear that building a single
> archive and then compressing the whole thing is
> necessary if you really want to build full-featured
> CD-ROM distributions.
> 
>                         - Tim Kientzle
> 
> P.S.  /usr/share/man is an interesting example
> which works out larger in tar.gz format because the
> individual files are already gzipped.  I suspect that
> you could get an archive smaller than 22MB by un-gzipping
> all the individual files and then building a tar.gz archive.
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-libh" in the body of the message





