From owner-freebsd-hackers  Thu May 16 16:19:50 1996
Return-Path: owner-hackers
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id QAA14667
          for hackers-outgoing; Thu, 16 May 1996 16:19:50 -0700 (PDT)
Received: from DATAPLEX.NET (SHARK.DATAPLEX.NET [199.183.109.241])
          by freefall.freebsd.org (8.7.3/8.7.3) with ESMTP id QAA14662
          for <hackers@FreeBSD.ORG>; Thu, 16 May 1996 16:19:46 -0700 (PDT)
Received: from 199.183.109.242 by DATAPLEX.NET
 with SMTP (MailShare 1.0fc5); Thu, 16 May 1996 18:15:59 -0600
Message-ID: <n1379851927.76789@Richard Wackerbarth>
Date: 16 May 1996 18:15:46 -0500
From: "Richard Wackerbarth" <rkw@dataplex.net>
Subject: Standard Shipping Containers - A Proposal for Distributing FreeBSD
To: "FreeBSD Hackers" <hackers@FreeBSD.ORG>
Cc: "FreeBSD Current" <freebsd-current@FreeBSD.ORG>,
        "freebsd-stable@freebsd.org" <freebsd-stable@FreeBSD.ORG>
X-Mailer: Mail*Link PT/Internet 1.6.0
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

I see what appears to me to be a problem in the distribution of FreeBSD
sources. I also propose a solution. I welcome your discussion.

Richard

The Problem:
  There are too many different variations of the same basic information. 

The Product:
   There are, and logically should be, four different "product lines". At the
moment, they are 2.1, 2.2, "current", and "cvs". Each has its purpose and I
don't intend to comment on that except to note that the similarities in the
first three exceed their differences.

The Distribution:
   There are seven distribution channels upon which I will comment.
   1) Direct access to the master tree. This really applies only
      to the cvs tree and is "the only way to go" for committers
      who are well connected.
   2) Using "mirror".
   3) Using "mirror" with directory listing cached on the server.
   4) Using "sup".
   5) Using "ctm".
   6) Using distribution tarballs.
   7) Using the "live file system" from CD.
   
Characteristics of the Distribution Mechanisms.
   a) Only (1) and (2) provide "up to the minute" copies, but they
      exert an extremely heavy load on the server. All the rest give
      only a snapshot at server-defined intervals; they compromise
      (in a reasonable manner) by reusing a single tree scan for
      multiple users at the expense of a delay in the update.
   b) (3) and (4) are functionally similar.
   c) (1) thru (5) offer incremental updates.

The Specific Difficulty.
   Each distribution mechanism has its own way of getting started.
   If I start with a clean disk, I must obtain a very large (28M
   compressed for the whole source) "update" to get started. In
   general, I cannot use the results of another distribution in
   place of a large portion of that transfer.
   CTM is perhaps better, in that it can create an update that
   transforms one tree into another. However, identifying and
   creating the transformations from multiple starting points is
   significant work.
   
The Proposal.
   Since all the reasonable distribution mechanisms are based upon
   server-initiated snapshots, I suggest that, for each product,
   we do the following:
   1) Have a single mechanism to define the snapshots that will
      be delivered. Then ensure that everyone delivers exactly
      the same "product".
   2) Include with that distribution the identifier(s) which would
      allow a user to use that distribution as a starting place
      for another distribution method. (In the case of CTM, this
      would mean the .ctm_status file.)
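   As an illustration of item 2, here is a minimal, purely hypothetical
   sketch of how someone holding a tree from a CD, a tarball, or sup
   could pick up the ctm sequence from there. It assumes .ctm_status
   holds one line giving the delta set name and the number of the last
   delta applied (e.g. "src-cur 1234"), and that delta files are named
   "<name>.<number>.gz". The function names and the file naming are
   assumptions for illustration, not ctm's actual interface.

```python
from pathlib import Path


def parse_ctm_status(text):
    """Parse the contents of a .ctm_status file.

    Assumes (hypothetically) a single line of the form
    "<delta-set-name> <last-number>", e.g. "src-cur 1234"."""
    name, number = text.split()[:2]
    return name, int(number)


def read_ctm_status(tree_root):
    """Read <tree_root>/.ctm_status as shipped with a CD, tarball, or sup tree."""
    return parse_ctm_status((Path(tree_root) / ".ctm_status").read_text())


def deltas_needed(name, last_applied, available):
    """Return, in order, the delta files to apply to bring the tree up to
    date, given the delta numbers available on a ctm server.

    The deltas must form an unbroken run starting at last_applied + 1;
    a gap means ctm alone cannot update the tree (sup could repair it)."""
    pending = sorted(n for n in set(available) if n > last_applied)
    expected = list(range(last_applied + 1, last_applied + 1 + len(pending)))
    if pending != expected:
        raise ValueError("gap in ctm delta sequence after %d" % last_applied)
    # Hypothetical naming scheme "<name>.<number>.gz".
    return ["%s.%d.gz" % (name, n) for n in pending]
```

   For a tree whose .ctm_status reads "src-cur 1234", with deltas 1233
   through 1236 on the server, only src-cur.1235.gz and src-cur.1236.gz
   would be fetched; the bulk of the sources never crosses the wire.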
      
Suggested details.
   1) Since we are running CTM for each of the products, I would
      start by having the CTM servers define the snapshots. The 
      .ctm_status file would then become a part of the source tree
      and everyone would distribute it. In particular, it would 
      get included on the sup servers, in the distribution tarballs,
      and on the live file system CD. This would allow anyone who
      has a copy of the tree from any of these sources to update it
      by applying the ctm files.
   2) I would also make available the directory of sup update keys.
      Although the copy on the CD should match that distribution,
      the copies do not have to be kept fully up to date. If you
      use a slightly out-of-date version, sup will simply replace
      a few additional files.
   3) In order to coordinate these events, the sup servers would
      trigger their updates on the basis of the receipt of a ctm
      update.
   4) In preparing a CD-ROM, we need to either
         a) freeze the source tree far enough in advance of the
            release to allow the updates to make the update circuit,
            or,
         b) freeze the update circuit and anticipate the effect of
            the final update, or
         c) use a combination of the two: freeze the ctm updates
            before the fact, allow the sup update to propagate for
            inclusion on the CD, and anticipate the ctm update by
            adding one to the last count propagated if there were
            any changes. After the CD is frozen, use it to generate
            the next input to the ctm sequence.
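   The bookkeeping in option (c) is a single increment. A minimal
   sketch, assuming (hypothetically) that .ctm_status is one line of
   the form "<name> <number>"; the function name is invented for
   illustration.

```python
def cd_ctm_status(name, last_propagated, tree_changed):
    """Return the .ctm_status text to place on a CD under option (c).

    last_propagated is the number of the last ctm delta sent out before
    the freeze. If the frozen CD tree differs from the tree that delta
    produced, the next delta (last_propagated + 1) will be generated from
    the CD itself, so the CD can already claim that number."""
    number = last_propagated + 1 if tree_changed else last_propagated
    return "%s %d\n" % (name, number)
```

   If nothing changed after the freeze, no new delta is needed and the
   CD simply records the last number that went around the circuit.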
            
 Conclusions:
    1) Such a methodology will assure that it is easy for any user
       to jump from a CD or tarball to ctm or sup without having
       to re-acquire the bulk of the sources.
    2) Sup can be used to repair a damaged tree when a complete
       ctm sequence is not available locally.
    3) Ctm can be used for routine updates to avoid transferring
       the entire file to realize a minor change.
    4) We need to enhance ctm to allow it to recognize intentionally
       pruned trees and ignore that portion of the update. (The
       argument for this conclusion was not included in this
       document.)
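   Conclusion 4 could work roughly as follows. A minimal sketch,
   assuming a delta can be viewed as a list of (operation, path)
   entries and that the user lists the directories deliberately left
   out of the local tree. The entry format, operation names, and
   function names are all hypothetical, not ctm's real delta format.

```python
from pathlib import PurePosixPath


def is_pruned(path, pruned_dirs):
    """True if path is one of pruned_dirs or lies underneath one of them.

    Comparing path components (not string prefixes) keeps "src/games"
    from also matching "src/gamesx"."""
    p = PurePosixPath(path)
    for d in pruned_dirs:
        d = PurePosixPath(d)
        if p == d or d in p.parents:
            return True
    return False


def filter_delta(entries, pruned_dirs):
    """Drop the delta entries that touch intentionally pruned parts of the tree."""
    return [(op, path) for op, path in entries if not is_pruned(path, pruned_dirs)]
```

   With pruned_dirs = ["src/games"], an edit to src/bin/ls/ls.c is kept
   while anything under src/games is silently skipped instead of being
   reported as a missing file.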
       
 
      

--

...computers in the future may have only 1,000 vacuum tubes and weigh
only 1 1/2 tons.      --  Popular Mechanics, March 1949