Date:      Wed, 01 Nov 2000 00:09:12 -0800
From:      Tim Kientzle <kientzle@acm.org>
To:        "Daniel C. Sobral" <dcs@newsguy.com>
Cc:        Patrick Bihan-Faou <patrick@rageagainst.com>, libh@FreeBSD.ORG
Subject:   Re: BOF at BSDCon: FreeBSD Installer, Packages System
Message-ID:  <39FFCFA8.BCF5425@acm.org>
References:  <39DCC860.B04F7D50@acm.org> <20001006155542.A29218@cichlids.cichlids.com> <39F3CDD7.15B889E7@acm.org> <20001023190412.B507@cichlids.cichlids.com> <39F47E98.4BB647AA@acm.org> <20001023202244.B10374@cichlids.cichlids.com> <39F48F4A.38D458C2@acm.org> <39FCF244.5A8C8E59@newsguy.com> <39FDC12E.304B0011@acm.org> <39FE2406.150CA3B1@newsguy.com> <00cb01c042f1$1a347190$040aa8c0@local.mindstep.com> <39FE562C.714DBE7C@newsguy.com>

"Daniel C. Sobral" wrote:
> It would surprise me. I think you are grossly overestimating the tar.gz
> advantage. Anyway, the ports tree is very different from the source tree
> or the ports sources.

My earlier numbers used default compression for all tests.
Using maximum compression for both ZIP and GZip gives different
numbers, but the same conclusion:

/usr/ports
tar.gz size: 7,454,638
ZIP size:   14,947,231

If you don't trust my numbers, feel free to try it yourself:

cd <target dir> && zip -r9 - . | wc -c
cd <target dir> && tar -cf - . | gzip -9 | wc -c

(wc -c reports the archive size in bytes, which is what the
numbers above are.)

The ports tree is the most extreme example, but it's a
general fact that archives of lots of relatively small
text files will compress much better with tar.gz than
with ZIP.  This is intrinsic to how the compression works:
ZIP compresses each file independently, so the compressor
starts over at every file boundary, while tar.gz compresses
the single concatenated stream and can exploit redundancy
across files.  (If you're unfamiliar with the nitty-gritty
of modern compression algorithms, Mark Nelson's "The Data
Compression Book" is a good place to start.)
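
You can watch this happen with a quick experiment.  Any
directory of small, similar text files will do; /etc/hosts
is just a handy small file, and the paths are only
illustrative:

mkdir /tmp/ziptest && cd /tmp/ziptest
for i in 1 2 3 4 5 6 7 8 9 10; do cp /etc/hosts f$i; done
zip -qr9 - . | wc -c            # each file compressed independently
tar -cf - . | gzip -9 | wc -c   # one compressor sees the whole stream

The ten copies land well inside gzip's 32KB window, so the
tarball compresses them down to almost nothing; ZIP has no
way to notice that the files repeat.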

I originally thought the ports distribution was unique in
this way, but it's not.   The source tree, manpages, info,
and doc distributions are also all collections of relatively
small text files.  The Perl and Emacs packages (both of which
contain large directory trees of textual source code) are
other good examples.  Any source code archive will fit this
kind of profile.  To a somewhat lesser degree, this effect
occurs in any archive of similar small files, which includes
/usr/bin, for instance.  If compression is important, tar.gz has
a big advantage over ZIP.

> Package installation can be painfully slow because of tar.gz problems.

What exactly are these problems?  I strongly suspect they are
artifacts not of the file format itself but rather of the
pkg_install program architecture.  If you're doing multiple scans
through the tar.gz archive to extract all files then, yes, that
will be slow.  The answer to this is to only do one pass.
(Admittedly, that's not always easy to arrange, but no one
said software optimization was simple. ;-)
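
In practice, a single pass can be as simple as this (the URL
and target path are invented for illustration):

fetch -o - http://example.org/foo-1.0.tgz | tar -xzpf - -C /usr/local

The file is fetched, decompressed, and unpacked as it
arrives; nothing ever touches a temporary directory.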

> The temporary directory is a by-product of having to decompress
> everything, and ZIP is a solution to this problem.

Huh?  I don't follow this.  I use tar quite often to unpack directly
into a target directory.  I think what you might be saying is
that you currently unpack into a temporary directory in order
to locate the manifest, then move everything from the temp dir
according to the manifest instructions.  Is this correct?

This is why I suggested attaching the manifest separately
to the beginning of the package file.   Then, you can decide
where to put files and just unpack the stream in a single pass
directly to its target location.
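
Since tar stores entries in exactly the order you name them
at creation time, putting the manifest up front is a one-line
change at build time (the +MANIFEST name is just an example,
not an existing convention):

tar -czf foo-1.0.tgz +MANIFEST bin lib man   # manifest is the first entry
tar -tzf foo-1.0.tgz | head -1               # prints: +MANIFEST

The installer reads the manifest off the front of the stream,
decides where everything goes, and keeps unpacking the same
stream.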

The directory-per-package model allows some further optimizations
(especially related to dependency loading when installing over a
network connection) that aren't really practical with the
current big /usr/local tarpit.  (In particular, it lets you
uncouple archiving and "installation" without requiring a temp
directory, which allows you to perform fully-streaming installs
with all dependencies without having to stop any stream.)
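
To make "fully streaming" concrete, here is a sketch; the
package names and the /usr/pkg layout are invented:

# Dependency first, then the package itself; each streams
# straight into its own final directory, no temp space needed.
for pkg in libfoo-1.2 foo-1.0; do
    mkdir -p /usr/pkg/$pkg
    fetch -o - http://example.org/$pkg.tgz | tar -xzpf - -C /usr/pkg/$pkg
done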

> > One way to be fast would be as one suggested the "one directory per package"
> > approach. It is messy to set up correctly if you do it manually, but this is
> > precisely why tools are nice when they exist...
> 
> I fail to see what it could gain us.

Because the package directory _is_ the temporary directory.  There's
no need to unpack into a separate location first.

> The whole problem with binary packages is that they screw everyone who
> is not satisfied with the defaults.

Succinctly put.

> > > Somehow I dislike PATHs, MANPATHs and LIBPATHs with 40 or 50 entries, I
> > > dislike a /usr that won't fit in one screen, I dislike having programs
> > > all over the place, I dislike having to edit /etc every time I want to
> > > make a new program available, and I especially dislike having to instruct
> > > users in setting up their accounts to be able to use a new program.

I confess that when I first started playing with this scheme, I
didn't see the obvious solution either.  In a separate message
I'll outline the full system that I've experimented with.  It
addresses all of these issues and simplifies a lot of related
problems.

> Installing and removing, that's easily automated.
> Even checking for filename conflict at install.

Could you explain how these are "easily automated"?  I can
see a couple of ways to handle these, but I want to make sure
I understand which specific approach you have in mind.  Maybe
you see something I've overlooked.  BTW, I notice you said
"checking" for filename conflicts, and not "resolving" filename
conflicts.
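
For reference, the checking half does look straightforward;
something like this sketch, which assumes the archive listing
doubles as the manifest and files land under /usr/local:

tar -tzf foo-1.0.tgz | while read f; do
    test -e "/usr/local/$f" && echo "conflict: /usr/local/$f"
done

Resolving a conflict once you've found one is the hard part.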

> What's the point of moving to the system most difficult to automate?
> Unless, of course, you are volunteering to write all these tools before
> we make the change?

I have written a set of tools to automate the maintenance of link
directories.  They're really very simple.  Would you like to see them?
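
To give the flavor (a generic sketch, not the tools
themselves; the /usr/pkg layout is illustrative):

# Re-point the shared bin directory at every package's bin/.
for f in /usr/pkg/*/bin/*; do
    ln -sf "$f" "/usr/local/bin/${f##*/}"   # link name = basename
done

Rerun this after each install; a removal pass would also
want to prune links whose targets have gone away.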

				- Tim Kientzle

