From owner-freebsd-arch Wed Jul 10 00:02:40 2002
From: Terry Lambert <tlambert2@mindspring.com>
Date: Wed, 10 Jul 2002 00:01:41 -0700
To: Jordan K Hubbard
Cc: Dan Nelson, Archie Cobbs, Dan Moschuk, Dag-Erling Smorgrav,
    Wes Peters, arch@FreeBSD.ORG
Subject: Re: Package system flaws?
Message-ID: <3D2BDBD5.CA71A2C1@mindspring.com>

Jordan K Hubbard wrote:
> Oh dear, why do I find myself so unable to resist this thread? :)

Occupational hazard?  8-).

> I'm borrowing a bit of Macintosh nomenclature there (though I'm sure
> Terry will come along and correct me by pointing out that IBM was the
> first to introduce "Fat binaries" with VM/CMS or something :) but I'm
> sure people get the idea.
> If you're distributing an Emacs or TeX package which weighs in at some
> hefty percentage of the New York phone book in size, and with KDE and
> Gnome one doesn't even have to look to Emacs anymore for a good example
> of a "really big, honkin' package", you naturally want to save on disk
> space if at all possible, both to minimize load on the archives and to
> make those poor Australian users with their 9600 baud Telstra link to
> the US happy.  Compression is certainly a good start, but when you
> start distributing those packages for 3 or 4 different architectures
> (as FreeBSD is definitely not far away from doing) you also would like
> to not distribute 3 or 4 different copies of the same
> architecture-neutral bits if you can possibly help it.  That's where
> the idea of munging attributes into the dictionary namespace starts
> making more and more sense, and not just for representing different
> architectures but also things like "experimental vs. stable" variants,
> "mix-ins" (like all the various versions of Apache which have various
> bits of compiled-in smarts) and what have you.  If you introduce the
> concept of install-time attributes, some of which may be implicit
> (like architecture) and some of which may be explicit (like "give me
> the experimental version please"), you conceivably end up with mangled
> pathnames within the package which are demangled on the way out,
> C++-style.  This allows you to have, say, just one copy of all the
> Emacs lisp files and documentation but 3 or 4 different "bin/emacs"
> files which don't collide internally and are properly selected for on
> extraction.

Apple has dealt with this for the 68K vs. PPC binaries by stuffing the
binaries into the same package, as a unit, putting them in different
"code resources" in the resource fork of the same file, while sharing
the "data fork" between the code.

The historical canonical answer is "ANDF" -- Architecture Neutral
Distribution Format.
In ANDF, one would compute the quad tree for a compiled program (for
example), but not do the code generation.  Instead, one would store the
quad tree, and generate the actual code at install time.

ANDF is actually a brilliant idea, since the resulting program is not
source code.  It also doesn't have the problem of the Apple approach,
which bloats the file size by the unused portion(s) of the code
contained therein.

Unfortunately, creation of intermediate languages is against the policy
of the FSF, due to its ability to weaken the effect of the GPL on
programs.  This is the same reason the data dictionary work that was
discussed on -advocacy and -chat recently is frowned upon by RMS, and
why he posted what he did about sweeping the research under the rug to
avoid exposure.

At this point in time, ANDF is not an option because of license
politics having nothing to do with available technology.  I would like
to think that a pure copy of the Apple approach was, likewise, not an
option.

Specifically -- and it looks like this is what you are actually
advocating here -- extracting the portions of the applicable binaries
at installation time, rather than installing the whole thing, is
probably the most correct approach.  It means that you only end up with
what you are worried about on your disk.

It also has the (dubious, in some people's eyes) attribute of making it
impossible to take an installed package and recreate the package as a
distribution -- you can't go to any machine with the software installed
on it, simply plug in your iPirate or whatever, and suck the package
down to ensure redistributability to any target platform supported by
the original distribution.

The one real drawback with this approach is linear media.  That is, if
I have a DVD distribution, it's not a problem.  But if I have a 28K
modem... it is.  A CDROM distribution is mildly problematic, since
putting i386, PPC, SPARC, Alpha, and IA64 (and S/360?  8-)) binaries
all in the same package could require a lot of CDROMs.  But that's
still less of a problem than linear media, like a TCP connection stream
or a magnetic tape.

There are two workarounds for this... assuming you can get the metadata
early, which means that it's at a known location in every file, which
pretty much dictates the front of the file.

The first is that the FTP protocol supports the concept of a "REGET",
and does not check that the receiver has the most recent version of the
file... it only cares about the byte offset, AND it supports
interruption of transfers.

The second is that the HTTP 1.1 protocol supports HTTP GET with a
"Range" argument specifying a byte range for a server object.

Both of these options effectively support random access of known
locations within a file, at a known offset.  Not all FTP and HTTP
servers support these facilities; enough do that it might be OK to rely
upon them.  HTTP is particularly attractive, due to firewall issues at
large companies, where other protocols are blocked.

A corollary to the use of random access at a known offset, with
metadata gathered from a known offset, is that per-file MD5 (or other
cryptographic) signatures can not be applied to the file as a whole.
Magnetic tape may not have to worry about this, since you are going to
have to read everything to byte-count for offset-based extents
anyway... but other media where the intervening bytes are not traversed
*are* a problem for those of us without corporate network connectivity.
Even a cable modem or DSL could become onerous for a large package
(your EMACS example would probably strain a T1 user's patience ;^)).

> Anyway, wish-list items like this are why it's a good idea to define
> the goals first and the package format second. :-)

Yes.

> P.S. I also agree with jhb's assertion that some folks really need to
> take a good look at libh since it takes a number of things like this
> into account, including all the "occludes, obsoletes, upgrades, ..."
> attributes that people were recently demanding as package metadata.

The current system uses libh.  We would like to (or *I* would like
to 8^)) exceed the capabilities of the current system.

The thing that strikes me about libh is that there is a lot of human
effort involved in the dependency tracking mechanism, if it's going to
be possible to perform some of the relationship tracking that some of
the posters have already requested.  In particular, I think that there
is not sufficient differentiation between "necessary" and "sufficient".

Part of the problem there is that the ports system has always been
based on the idea that most of the things in it get built by the user
as needed, so no matter what, it's always "sufficient".  Moving to a
system that can support binary-only distribution implies that the
implicit guarantees that were there are no longer there, and you need
to consider that "just rebuild" will not be an option.

I don't mean to call the demands that were made incomplete or naive,
but... well, yes, I *do* mean to call them naive, or at *least* call
them incomplete.  8-).

Finally, I think that people often confuse design with implementation,
and assume that just because a system is *capable* of solving a
particular problem, the initial implementation would have to be delayed
until it *actually solves the problem*.  I have a really big wishlist,
and adding everyone's wishlist together yields a *huge* wishlist.  It's
clear that implementing everything isn't possible at the present time.
But that doesn't mean that a design should not take this into account
as potential future work.  Even if something ends up falling on the
floor (I rather expect almost *everything* will *have to*, and that the
first revision will only solve one or two fundamental issues, like the
"packaging the base system" problem), whatever the final design is, it
needs to not *preclude* solving certain problems in the future, without
having to reinvent the framework yet again.
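As an aside, the "necessary vs. sufficient" distinction above can be
made concrete with a few lines of Python.  This is only an illustrative
sketch; the package names and the run-deps/build-deps split are
invented for the example, and are not anything libh or the ports
collection actually implements:

```python
# Sketch: in a source-based world, build dependencies are always
# "sufficient" because the fallback is to rebuild.  In a binary-only
# world, only the run-time ("necessary") dependencies can be assumed.
# All package names below are hypothetical.
from dataclasses import dataclass

@dataclass
class Package:
    name: str
    run_deps: frozenset = frozenset()    # necessary: required to run the binary
    build_deps: frozenset = frozenset()  # sufficient: only needed to rebuild

def install_closure(root, universe, binary_only=True):
    """Return the set of package names that must be installed with `root`.

    A binary-only system follows only run_deps; a source-based system
    must also pull in build_deps, because "just rebuild" is an option.
    """
    needed, stack = set(), [root]
    while stack:
        pkg = universe[stack.pop()]
        if pkg.name in needed:
            continue
        needed.add(pkg.name)
        deps = pkg.run_deps if binary_only else pkg.run_deps | pkg.build_deps
        stack.extend(deps)
    return needed

universe = {
    "apache-ssl": Package("apache-ssl",
                          run_deps=frozenset({"openssl"}),
                          build_deps=frozenset({"perl"})),
    "openssl":   Package("openssl"),
    "perl":      Package("perl"),
}

print(sorted(install_closure("apache-ssl", universe)))
# binary-only: ['apache-ssl', 'openssl']
print(sorted(install_closure("apache-ssl", universe, binary_only=False)))
# source-capable: ['apache-ssl', 'openssl', 'perl']
```

The point the sketch tries to show is that the two closures differ; a
metadata format that records only one flavor of dependency cannot
recover the other after the fact.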
I guess we should start a separate thread specifically for a
"wishlist"; no offense to the person who volunteered to collect and
summarize this, but I'd like to see the information captured without
any editorializing by a single person; I think it will be more useful
as raw data.

-- Terry