From: Jordan Hubbard
To: hackers@freebsd.org
Date: Tue, 12 Sep 2000 15:29:48 -0700 (PDT)
Subject: Installation and package tools document, version 1.0

Without a lot of preamble, let me just say that all that talk of FreeBSD needing a more active specifications and management process finally got me motivated into writing all this down. This being version 1.0 of this document, I also expect it to go through multiple versions as I get feedback on it, so please consider it merely the start of an ongoing effort to write down all these installation and packaging thoughts which have been rattling around my head these past 6 or so years.

See the Preface for more information, and thanks in advance for being willing to read through a 5300 word document. :-)

- Jordan

Title:    FreeBSD installation and package tools, past, present and future
Date:     September 8th, 2000
Author:   Jordan K. Hubbard
Version:  1.0
Abstract: This document discusses FreeBSD's installation, configuration and package management tools from the perspective of where they are and where I think they need to go.

Contents
--------
1. Preface
2. History and current limitations
   2.1 The package tools
   2.2 Sysinstall
3. The Future
   3.1 FreeBSD's distribution format
   3.2 User Interface
   3.3 Security
   3.4 Configuration and version control
   3.5 Installation scripting
4. Appendix: Current efforts
   4.1 libh
   4.2 lizard

1. Preface
----------

There has been a lot of discussion throughout FreeBSD's history as to just what purpose sysinstall and the pkg_install suite were intended to achieve, what their shortcomings are and how we might move forward with a design document which breaks the various challenges into more manageable pieces which might be implemented by a number of different teams. It's long been my desire to sit down and do exactly that, a lack of time being my only excuse for not having done so long ago.

I'm also of the understanding that a new "open packages" effort was recently started by some of the people at Daemon News, a project with parallels to some of the existing efforts to get all the various open source projects to standardize on existing package formats like RPM, Debian packages, etc., and a good excuse for me to finally do this.

I'm certainly all in favor of a standardization effort based around some viable and practical second-generation technology and can only hope that producing this document will in some way aid the design of a next-generation package and installation system. Should such an effort ultimately prove itself attractive to a large segment of the open source community then all the better, but we have to start somewhere and that somewhere, for me at least, is FreeBSD.

The existing package systems (RPM, Deb, *BSD) all suffer from being first-generation efforts and, while quite mature, do not address a number of significant issues which I'll cover in this document. I'll also document some of the design decisions which went into FreeBSD's current system, hopefully explaining some of the [mis]features which have confused newcomers to FreeBSD or caused them to wonder just why things were not done differently.

2. History and current limitations
----------------------------------

2.1 The package tools
---------------------

The FreeBSD package tools, located in /usr/src/usr.sbin/pkg_install, were written in August of 1993 in response to several requirements that we had at the time. Most significantly, it was not possible to easily track "extra software" that one might add to the system and conceivably wish to easily remove again, nor was it easy to see which versions of software had been installed on a given system for easier troubleshooting. Finally, any specialized installation procedures for a given piece of software essentially had to be done manually by reading the README file (when available) accompanying the binary distribution tarball, assuming of course that anything other than sources which you needed to build yourself were available.

After looking at the problem for awhile, I decided that the quickest and easiest solution would be to simply add a little extra "meta-data" to these existing binary tarballs, something which could then be executed and recorded for future reference by a package adding utility. Thus were born the pkg_install utilities we have today.

At the time, system administrators were also very mistrustful of pre-built binary distributions of software (not that many would actually read source code before building and installing binaries from it, but that's another story) so that's why I decided to use an existing archive format, namely gzipped tar files. This approach allowed paranoid admins to easily extract a package manually and inspect it, and it also allowed me to leverage our existing tools relatively easily (though one feature, --fast-read, did need to be added to tar so that individual items could be extracted more quickly).

There were and are problems with this approach, however, the most significant being that tar files (especially gzipped ones) are NOT very amenable to random-access. The directory structure of a tarfile is distributed, i.e.
the file data is interleaved with the directory meta-data and, in order to get to a given item in a tarball, pkg_add(1) needs to read serially through the whole thing looking for it. This can be an especially big problem when all it has to work with is a file handle and not an actual file, something which is the case when a package is coming directly from an FTP server or some other data source which offers only serial access to the bits.

pkg_add "solves" this problem by first finding sufficient temporary space on one of the available file systems and then unpacking the tarball to be extracted into a scratch directory. After the tarball is extracted, pkg_add then reads through the "packing list" (one of the meta-data files) and follows its instructions to move only those parts of the unpacked tarball into place which are needed, thus skipping the meta-data files and any others which might be optional and not actually requested by the user. During this process, it is then possible to run any custom installation scripts the package might have provided to ask the user configuration questions, do special permissions/conflict checks, and run through the package's list of dependencies on other packages to see if they should be somehow fetched and installed as well.

All in all, it's a very general purpose and open-ended mechanism which many packages have used to good effect, but the temporary directory requirement would also turn around to bite me firmly on the ass when it came time to write sysinstall, which followed in April of 1995.

2.2 Sysinstall
--------------

Sysinstall, located in /usr/src/release/sysinstall, was FreeBSD's first attempt at doing something more elegant and user-friendly than a simple shell script-based installation which merely asked questions in a fixed order and gave the user little opportunity to do different types of installation and configuration.
The "first draft" of sysinstall was actually meant to be little more than a prototype of the installer I really wanted to write, especially from the user interface perspective since it used something called dialog(3).

The dialog library began its life as a monolithic utility for writing semi-graphical shell scripts and was pressed, with great reluctance, into the duty of functioning as an interface library for C programmers. At the time, this seemed the easiest course of action given that I wasn't overly keen on writing a new set of interface components in curses(3) and the dialog library provided some fairly colorful canned dialogs which looked, at least for the time, reasonably visually impressive.

In retrospect, this was also one of my biggest mistakes given that dialog(3) is also extremely limited in the user-friendliness department and lacks features like the ability to put more than 2 buttons into a dialog or a Yes/No dialog which has a selectable default (e.g. No). The inability to put a "Back" button into various dialogs which could really use one and the necessity of asking only "positive" questions are outgrowths of those limitations and good examples of how an insufficiently powerful UI library can drive the utility-writer in undesirable but unavoidable directions.

The dialog library also features checkbox/radio menus which use the spacebar and enter keys very, erm, creatively to essentially confuse the heck out of users who don't pay too much attention to the Usage instructions at the beginning and simply impulsively hit Enter through the whole installation. Earlier versions of the library also completely lacked the idea of call-backs, so any form of real "dynamism" in a menu or dialog was pretty much out of the question.
The things I had to do to this library in order to provide those features were so hideous that I'll probably go to a special programmer's hell when I die and be forced to do AI programming in RPG-II, or something, it also souring me on the idea of extending dialog(3) to the point where it might have actually made sysinstall less pathological in its interface behavior.

The user interface library has also turned out to be not the least of sysinstall's design shortcomings. Since it was, at least in my mind, a prototype, there wasn't a lot of attention put into the area of flexibility. I provided for things like "Expert" and "Novice" (now less-insultingly named "Standard") installs, but I didn't really do much for people who wished to build many machines in a more assembly-line fashion or allow the user to save their answers to its questions for later "replay" into another installation session.

Extending sysinstall also requires a knowledge of C programming (and the willingness to hack on a prototype) in order to customize it for other purposes, say a university environment where special course-ware might be part of the FreeBSD installation at the beginning of each semester. It's nowhere near as easy as it should be and many have been impaled on sysinstall in their efforts to customize FreeBSD for their unique needs.

An even more significant issue with sysinstall and FreeBSD's release methodology in general is the distribution format of FreeBSD itself and sysinstall's handling of packages, especially interactive ones.

FreeBSD's release methodology has really not changed all that much in the last 8 years, the basic distribution format still being largely influenced by the size of a 3.5" floppy. Each chunk of a FreeBSD distribution, e.g. the "bin" or "manpages" distributions, is nothing more than one big gzipped tarball which has been split into 240K chunks which can conveniently fit on floppies, 5 to a 5.25" floppy or 6 to a 3.5" one.
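To make the chunk arithmetic concrete, here is a minimal sketch of the split-and-reassemble scheme just described. The function names are made up for illustration, and the floppy sizes are the nominal raw capacities (1200K and 1440K), ignoring any filesystem overhead; only the 240K chunk size comes from the text.

```python
from typing import List

CHUNK_SIZE = 240 * 1024      # 240K pieces, as described in the text
FLOPPY_525 = 1200 * 1024     # assumed raw capacity of a 5.25" floppy
FLOPPY_35 = 1440 * 1024      # assumed raw capacity of a 3.5" floppy


def split_distribution(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Cut one archive into consecutive chunks; only the last may be short."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_distribution(chunks: List[bytes]) -> bytes:
    """Glue the chunks back together in order, as an installer would have to."""
    return b"".join(chunks)


def chunks_per_floppy(capacity: int, chunk_size: int = CHUNK_SIZE) -> int:
    """How many whole chunks fit on one floppy of the given capacity."""
    return capacity // chunk_size
```

With these numbers, chunks_per_floppy(FLOPPY_525) comes to 5 and chunks_per_floppy(FLOPPY_35) to 6, which matches the figures quoted above. Note that nothing in the chunks themselves records what they contain, which is the root of the meta-data problem.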
Back in 1992, when we first started doing this, there were a lot of people doing floppy installs and CDs were still uncommon and/or expensive. Sysinstall was therefore designed to take a lot of the hair out of the process by automagically gluing these 240K chunks together as they came along, from whatever distribution medium was available, and feeding them to a background tar process which would simply extract them verbatim into a directory (usually, but not always, /).

There are lots of problems with this, one being the fact that since a "distribution" is nothing more than a gzipped tarball split into pieces, there is none of the nifty meta-data which packages provide to say what has been installed, what dependencies it has, or any hooks for providing post-installation configuration opportunities. Even component size information is a mystery, making sysinstall unable to predict when you've chosen more distribution data than will fit on a given filesystem, leading to occasionally unpleasant surprises during installation when something fills up and simply explodes in a messy and unhelpful fashion.

A bigger problem is the fuzzy and entirely undesirable dividing line between packages and distributions. What should be a distribution and what should be a package? Where does the ``base distribution'' stop and the ports/packages collection begin? How should one upgrade the respective bits? Erasing this line of demarcation has proven to be one of the more annoying challenges in FreeBSD's release engineering process and I'll explain how and why later in this document.

Finally, sysinstall simply represents a conglomeration of too many tasks. It partitions your disk(s), it loads software, it asks you questions about your network interfaces, it sets up your ppp connection, etc etc.
It just tries to do too much in one place and that's a violation of the Unix Philosophy, where each component should do one easily recognizable task and no more than that, more complex tasks being achieved by putting such tools together.

What we currently think of as sysinstall should essentially do nothing more than partition your disks and get a much fancier second-stage "configurator" onto the root partition before rebooting. At that stage, the configurator can give the user the option of adding the other disks and choosing what kinds of software to put on them. The scope of the configurator should be such that it becomes a general-purpose setup tool which can be used to manage all the hardware and software in the system on an ongoing basis, not simply run once and forgotten.

3. The Future
-------------

3.1 FreeBSD's distribution format
---------------------------------

As I mentioned in the history section, one of the more annoying problems with FreeBSD's current distribution format is the dividing line between distributions and packages. There should really only be one type of "distribution format" and, of course, it should be the package (There Can Be Only One). Achieving this means we're first going to have to grapple with several problems, however:

First, eliminating the distribution format means either teaching the package tools how to deal with a split archive format (they currently do not) or divorcing ourselves forever from floppies as a distribution medium. This is an issue which would seem an easy one to decide but invariably becomes Highly Religious(tm) every time it's brought up. In some dark corner of the world, there always seems to be somebody still installing FreeBSD via floppies and even some of the Fortune 500 folks can cite FreeBSD success stories where they resurrected some old 386 box (with only a floppy drive and no networking/CD/...) and turned it into the star of the office/saved the company/etc etc.
That's not to say we can't still bite that particular bullet, just that it's not a decision which will go down easily with everyone and should be well thought-out.

Second, there's the issue of packages currently requiring temporary space as part of their extraction method. If we're going to have things like "bin" be a package, even if we split it up into subcomponents and make "bin" simply a package which contains a list of dependencies and nothing more (which is desirable), there are still going to be pieces which are non-extractable under the current scheme because the available disk space is too small to contain both the temporary copy and the final installed copy, which may not be on the same file system and so cannot simply be moved into place. Since we'd also like to retain the ability to extract a package directly over a network connection and never have the temporary bits "hit the disk", this means that we're almost certainly going to have to go to a different archival format. Fortunately, there are some existing formats to choose from which have a lot of the required features so we won't have to reinvent the wheel and come up with our own (yuck).

My current favorite is the Zip archive format. Zip is a popular archival format which gives us a wide variety of existing tools for creating, fixing and inspecting zip files. It also keeps a central directory in one place (at the very end of the archive, in addition to a small header in front of each file), so once we've read it in we can quickly figure out where in the data stream/file we need to go to get a specific item. Since the "configurator" stage of the installation will also be running after we've acquired a root partition and some swap space, it's also not inconceivable that we could buffer bits read over a network connection in memory so that even "seeking" out to the end of an archive file read from an FTP server socket would still allow us to move backwards in the archive for other contents.
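As a rough illustration of the random-access property, the sketch below uses Python's standard zipfile module on an archive held entirely in memory, standing in for bits buffered from a network connection. The member names (including "PACKING_LIST") are invented for the example and are not part of any existing package layout.

```python
import io
import zipfile
from typing import List, Tuple


def build_package() -> bytes:
    """Create a small zip 'package' in memory (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("PACKING_LIST", "bin/hello\nman/hello.1\n")
        zf.writestr("bin/hello", "#!/bin/sh\necho hello\n")
        zf.writestr("man/hello.1", ".TH HELLO 1\n" * 200)
    return buf.getvalue()


def list_members(package: bytes) -> List[Tuple[str, int]]:
    """Read only the archive directory: names and uncompressed sizes."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]


def read_member(package: bytes, name: str) -> bytes:
    """Extract one item; the other members are never decompressed."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name)
```

Calling read_member(build_package(), "bin/hello") pulls out a single file with no scratch directory involved, and list_members() yields the size information which the split-tarball distributions cannot provide up front.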
The zip file format also allows for per-archive and per-file "comment" fields which can be used to store things like MD5 checksums, pgp signatures and all sorts of other potentially useful types of meta-data. I'm not wedded to the zip file format, I simply find its combination of good compression and random-access (without having to decompress the entire archive) to be especially attractive for what we need to do.

Finally, there's the issue of user interaction. The bulk of sysinstall's hard-coded features do things like make user queries which could just as easily be part of a package's install-time configuration script. Sysinstall, for example, allows you to specify which daemons will run at startup time even though this is only pertinent to the "bin" package which actually contains those daemons. Similarly, there have been security-related questions pertaining to the cryptography distributions which, even though the US crypto export and RSA issues have now been largely dealt with, may still be pertinent in other countries. Clearly, such interaction should be part of the package installation procedure itself and sysinstall should be little more than a friendly wrapper for selecting which packages to install and running their installation procedures, and that brings us to the question of User Interface.

3.2 User Interface
------------------

As noted in the History section, one of the biggest problems with sysinstall is its user interface which could only be charitably described as Evil Incarnate. The dialog(3) interface library, as I've already described, is insufficiently powerful to give the user a flexible and intuitive installation experience, nor does it take any real advantage of environments like the X Window System, should the user be running a configurator under such an environment.
The package system also suffers significantly in the UI area since the pkg_add(1) utility has no idea as to whether it's running at the end of a pipe, as it currently does under sysinstall, or if it's got a real live user at the other end who's invoked it interactively from a shell. This leads to real problems when a package suddenly decides it wants to talk to the user but is being run via a front-end which will react adversely (or not at all) to the sudden appearance of the package's own interaction dialogs.

This is not just a hypothetical situation but one which can, and currently does, happen whenever sysinstall's packages menu invokes a package which is interactive. The user dialogs all go to the 2nd VTY and leave the actual user somewhat mystified as to why the package installation has mysteriously "hung" on them as it waits for user input which never arrives.

To effectively solve this problem, what is needed is a flexible (e.g. containing more basic "widgets" than canned dialogs) and generic UI library which provides front-end utilities like sysinstall and pkg_add with the ability to play traffic cop and direct all user interaction through a common interface. That might be something CUI based, like TurboVision (my current CUI favorite) or GUI based, like Qt/gtk, when running under X. It might even be something which talks to a Java-enabled web browser at some point in the future - we really can't predict all the conceivable UI scenarios.

The package system would call into this library whenever it wanted to talk to the user, thus sharing the screen/display non-competitively with whatever utility invoked it. It would be up to the outermost "caller" (be it pkg_add or sysinstall) to decide at initialization time just what kind of back-end UI to instantiate for the generic UI.
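One way to picture the arrangement is the sketch below, in which a package's configuration code talks only to an abstract interface and the outermost caller picks the back-end. Every class and function name here is hypothetical and is not taken from libh or any other existing library; a real design would need far more widget types than the two shown.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class UIBackend(ABC):
    """What package scripts are allowed to assume about the user interface."""

    @abstractmethod
    def yes_no(self, question: str, default: bool = False) -> bool: ...

    @abstractmethod
    def choose(self, prompt: str, options: List[str]) -> str: ...


class ConsoleBackend(UIBackend):
    """Plain line-oriented rendering, e.g. for a serial console."""

    def yes_no(self, question: str, default: bool = False) -> bool:
        hint = "Y/n" if default else "y/N"
        answer = input(f"{question} [{hint}] ").strip().lower()
        return default if not answer else answer.startswith("y")

    def choose(self, prompt: str, options: List[str]) -> str:
        for number, option in enumerate(options, 1):
            print(f"{number}) {option}")
        return options[int(input(f"{prompt}: ")) - 1]


class ScriptedBackend(UIBackend):
    """No user at all: replays canned answers, falling back to defaults."""

    def __init__(self, answers: Sequence[object] = ()) -> None:
        self._answers = list(answers)

    def yes_no(self, question: str, default: bool = False) -> bool:
        return bool(self._answers.pop(0)) if self._answers else default

    def choose(self, prompt: str, options: List[str]) -> str:
        return str(self._answers.pop(0)) if self._answers else options[0]


def configure_package(ui: UIBackend) -> Dict[str, object]:
    """A package's questions, written once against the generic interface."""
    start = ui.yes_no("Start the daemon at boot?", default=False)
    mode = ui.choose("Install type", ["Standard", "Expert"])
    return {"start_at_boot": start, "mode": mode}
```

A front-end driving an unattended install would call configure_package(ScriptedBackend([True, "Expert"])), while an interactive shell session would pass ConsoleBackend(); the package code is identical in both cases and never has to guess whether a user is really there.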
Such an approach would allow us to write all of our configuration utilities and scripts in a UI-neutral fashion which allows us to take advantage of new UI technologies as they come along without having to go back and re-write all of those painstakingly crafted user dialogs. That's basically where 99% of all the work of crafting such user interfaces goes, and we certainly don't want to have to write two different interface definitions for CUI (serial console / remote installer) and GUI (X Desktop) based users.

There are some operating systems (that I won't mention) which sort of get away with this today, but FreeBSD has always been a strongly server-centric operating system and that means we really can't have a highly desktop-centric installer; we have to support the idea of installation on machines without graphics cards at all or even in situations where the user is visually handicapped and wishes to have a customized installer whose "interface" is a voice synthesizer. All of this is possible when the UI library you write directly to makes no assumptions at all about what the ultimate rendering model is going to be; it simply thinks in terms of objects like "buttons" and "choice lists", leaving it up to the back-end layer to ultimately render the appropriate UI objects somehow.

3.3 Security
------------

A major failing of most package systems, ours included, is that a package's installation and configuration scripts can essentially be any type of executable at all. While this does allow the package writer a great deal of flexibility in providing for a package's needs, and there are packages which do have highly specialized requirements, it also has a huge potential effect on security. Most packages are installed as root for a variety of reasons, some legitimate and some not, and the overall effect is that security is essentially an "opt-in" process for whomever creates or installs a package.
A package which is installed as root is a package which can be either intentionally or unintentionally lethal to a user's system, even a pgp-signed and triple-authenticated package being capable of completely destroying a user's system, and it's not hard to see how.

Consider what might happen if an otherwise perfectly respectable package author, overly caffeinated and partially delirious at 4am, were to write:

    rm -rf /${MYTMPDIR}

into a package's installation script as part of its clean-up procedure. Let's also say that this removal operation is inside a failure-case check in the installation script and the author doesn't hit that case during their testing since they happen to drive the installation successfully each time. Let's finally say that the actual name of the variable in question is "MYTEMPDIR" and the author, in a state of 4am dyslexia, does not spot this mistake. You get the idea.

Even if the package is pgp signed and the package author is your personal, trusted friend, you're still going to be wondering at all the sudden extra disk activity right after bombing out of his package's installation script and none of the conventional security practices have saved you from his mistake. The author is most embarrassed, your system is most toast, and you can both chalk it up to another annoying conjunction of human and infra-structural stupidity.

Clearly, it would be desirable for a package which genuinely and truly needs to be root to do so in a manner which is in any way safer than it is now. One method I'm in favor of is to change a package's customization script(s) from being any arbitrary executable to being a very specific executable, namely a set of instructions in some tightly constrained scripting language. My personal favorite is Secure TCL, a useful outgrowth of the enhancements done to TCL when it got stuffed into a web browser and suddenly needed to worry a lot more about security issues.
Secure TCL allows us to create highly restricted TCL environments which can be selectively "tightened" according to an administrator's own level of paranoia, allowing them to have a highly customizable and final say over what level of capability will be given to any package they install.

Thus it would be possible, just to give an example, to restrict the ``file-access'' primitive to only returning a positive "It's OK to access this" indication for file names whose paths match "/etc/.*", "/usr/local/.*" or "/usr/X11R6/.*". The ``file-create'', ``file-write'' and ``file-remove'' primitives could, in turn, always validate their arguments against ``file-access'' before proceeding.

With a properly designed set of primitives, it would be thus possible to evolve mechanisms for "practical security", where potentially foot-shooting primitives can either be disallowed entirely, allowed to proceed only upon user confirmation or go completely unhindered, all according to the administrator's wishes. With a little time, such package security tweaks would also begin to float around and come into the reach of less skilled administrators, just as standardized Cisco access-lists for fire-walling are passed around today.

It need not be TCL that is chosen for this purpose, naturally, it's simply my personal preference since I happen to already know and have working experience with TCL. A language like Python or Ruby is also probably capable of doing the job just as well, it only being necessary for the interpreted language of choice to have some sort of reasonable security model and a comparatively small footprint. I stipulate that the footprint needs to be small because any future system configurator and package infrastructure will need to be wrapped together to some extent, the resulting product being something we may wish to bootstrap off of comparatively small media.
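The path-restricted primitives described above can be sketched roughly as follows. This is plain Python rather than Secure TCL, it is not a real sandbox, and the class and method names are invented for illustration; the three path patterns are the ones given in the text. The actual removal is delegated to a caller-supplied function so that the policy check can be exercised without touching a real filesystem.

```python
import posixpath
import re
from typing import Callable, Iterable


class AccessDenied(Exception):
    """Raised when a primitive is asked to touch a path outside the policy."""


class PackageSandbox:
    def __init__(self, allowed_patterns: Iterable[str],
                 remover: Callable[[str], None]) -> None:
        self._allowed = [re.compile(p) for p in allowed_patterns]
        self._remover = remover   # e.g. a recursive delete in a real tool

    def file_access(self, path: str) -> bool:
        """True only if the normalized path matches an allowed pattern."""
        clean = posixpath.normpath(path)   # collapses ".." but not symlinks
        return any(p.fullmatch(clean) for p in self._allowed)

    def file_remove(self, path: str) -> None:
        """Validate against file_access before doing anything destructive."""
        if not self.file_access(path):
            raise AccessDenied(path)
        self._remover(posixpath.normpath(path))


# The administrator's policy, using the patterns from the text.
POLICY = ["/etc/.*", "/usr/local/.*", "/usr/X11R6/.*"]
```

Under this policy, a script which ends up asking to remove "/" because its directory variable expanded to nothing gets an AccessDenied exception instead of an empty disk, while removing something under /usr/local goes through. A real implementation would also have to deal with symlinks, which a purely textual check like this one cannot see.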
A properly written package management system will be an indispensable piece of the installation process given that the pieces of the operating system will, of course, be packages.

3.4 Configuration and version control
-------------------------------------

Ultimately, installing the "OS networking package" or the "Apache Server" package should be part of a seamless, "one piece", installation experience with a common and consistent UI. The ability to leave "configurators" for each subsystem or tool behind should also be an integral part of the process, these later being runnable from a single front-end tool (let's call it ``setup'') which offers a properly organized menu/folder hierarchy for all the available tool configurators to drop themselves into. None of this is rocket science and folks like Microsoft and Apple have been doing it for ages with their operating systems. It's a workable model and, perhaps more importantly, it's now the most familiar model.

Another nice thing about having a package install itself through a carefully controlled scripting language is that each mutagenic operation (say, a file overlay) can store "undo" information for itself if given enough available disk space. Also imagine that all of the undo information for a given package, throughout its lineage, goes onto an "undo stack" for that package. If necessary, the package can thus be "popped" back through its previous versions to test and see where and if a given problem (which may be noticed only months after the last upgrade) first appeared. Since the changes would be stored as deltas, files which do not change would also appear only once and no space wasted in representing multiple redundant copies of those pieces of a package which don't change from version to version (like the docs :-).
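A toy model of such an undo stack might look like the following. It keeps file contents in a dictionary instead of on disk, and all of the names are made up for the example; the point is only that each upgrade records the previous state of the files it actually changed and nothing else.

```python
from typing import Dict, List, Optional

Files = Dict[str, bytes]
# One delta per upgrade: path -> previous content (None if the file was new).
Delta = Dict[str, Optional[bytes]]


class PackageHistory:
    def __init__(self) -> None:
        self.files: Files = {}
        self._undo: List[Delta] = []

    def upgrade(self, new_files: Files) -> None:
        """Install a new version, saving undo data only for changed paths."""
        delta: Delta = {}
        for path in set(self.files) | set(new_files):
            old = self.files.get(path)
            if old != new_files.get(path):
                delta[path] = old
        self._undo.append(delta)
        self.files = dict(new_files)

    def pop(self) -> None:
        """Roll back to the version that was installed before the last upgrade."""
        for path, old in self._undo.pop().items():
            if old is None:
                self.files.pop(path, None)   # file did not exist before
            else:
                self.files[path] = old       # restore the overwritten content

    def undo_paths(self) -> List[List[str]]:
        """Which paths each stacked delta covers, oldest first."""
        return [sorted(delta) for delta in self._undo]
```

If version 1 installs bin/tool and doc/README and version 2 changes only bin/tool, the second delta holds just the old bin/tool, so the unchanged docs are stored once, and a single pop() brings version 1 back.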
Making such a mechanism part of the basic infrastructure may strike some as an over-kill proposal, but I would also submit that the problem of upgrading packages and of having multiple active versions of a single package (like gtk or TCL) are significant issues which have received rather ad-hoc attention to date. With the creative and automated use of symlinks and some filename hashing, I think we could come up with a mechanism which does for package version control what CVS does for software version control (though hopefully even less painfully :).

A genuine database of some sort containing package version meta-data is also a requirement since, on a fully tricked-out system, many hundreds (if not thousands) of files might eventually be involved and keeping track of their various inter-relationships is not something you'd generally want to do with simplistic file structures (like /var/db/pkg) which require a lot of time to search and index.

3.5 Installation scripting
--------------------------

Another subject I touched on earlier was the need for automated and/or highly customized installations since the needs of everyone installing FreeBSD aren't exactly identical. Given access to a nice generic UI library, as described in section 3.2, and a powerful scripting language, as described in section 3.3, we could make what people currently regard as sysinstall a purely script-driven affair.

This will obviously make customization a lot easier since all anyone needs is a text-editor and a document of available primitives (which many would probably choose to learn simply by looking at the example installation anyway) in order to create a customized install and/or add their own questions to an existing package configurator. I also doubt that most people would need to be able to do this, but for those very few that do, such flexibility can and will make the difference between getting FreeBSD into some highly customized environments or simply not making the grade.

4. Appendix: Current efforts
----------------------------

4.1 libh
--------

The libh project is something I started over a year ago, with input from Mike Smith and the paid services of a Russian contract programmer named Eugene, to fulfill many of the goals expressed in this document. Unfortunately, managing a project of this complexity with a contractor many thousands of miles away and a personal schedule which allowed for very little interaction with him didn't prove to be a workable scenario and work was stopped while partially in progress.

Since that time, work on it has been taken over by Alexander Langer and a small group of volunteers. A mailing list, freebsd-libh, can also be subscribed to via majordomo@freebsd.org, and the sources checked out via ``:pserver:anonymous@usw4.freebsd.org:/home/libh/cvs'' using anoncvs.

The name ``libh'' is also something of a mystery to everyone but it nonetheless stuck as a working title. It probably needs to be renamed to something sexier before this project can really succeed. :-)

Roughly speaking, libh currently contains:

   A first cut at the generic UI library, as described in section 3.2, with back-end renderers for TurboVision and Qt currently being provided. The generic UI API it provides is available for C, C++ and TCL.

   A complete zip file-access library written for C, C++ and TCL as described in section 3.1.

   Much of the security infrastructure described in section 3.3 is also implemented, with enough currently done to make possible a prototype package creation/extraction system with some test packages available (and used as part of the regression-test suite).

   The package information database is also written, with APIs for C, C++ and TCL. It provides for package conflict, upgrade and outdate checking.

While libh does contain a lot of the code we might ultimately use, it should nonetheless be considered only one possible starting point for implementing what I've described in this document.
I certainly would be happy to see the time and investment in libh ultimately go to good use, of course, but I also wouldn't want it to stand in the way of any larger and more successful effort which chose a different scripting language or UI design, for example.

4.2 lizard
----------

Lizard is the installer currently bundled, albeit in highly modified form, with Caldera's OpenLinux distribution and made freely available in some of its earlier incarnations from ftp.caldera.com. It has been suggested that a "Desktop version" of FreeBSD could be created using this technology as a stop-gap measure until libh or some similar project succeeded in solving the more complex set of issues I've outlined, that perhaps buying us a bit more time to "do things right" (in my highly prejudicial opinion :).

As far as I'm aware from my limited reading of the code, lizard is only applicable to graphical installations and does not make allowances for people installing via a serial console, hence its applicability to just a desktop-oriented product. Still, it might be worth looking at by people whose interests lie solely in that direction. Customization from the highly linux-centric environment lizard currently assumes is, of course, something else which would need to be grappled with as part of such an effort.