Date: Thu, 07 Apr 2005 00:36:09 -0700
From: Colin Percival
To: Olaf Wagner
cc: freebsd-arch@freebsd.org
Subject: Re: Adding bsdiff to the base system
Message-id: <4254E2E9.2090504@wadham.ox.ac.uk>
In-reply-to: <200504070644.j376imwB027984@luthien.iceflower.in-berlin.de>
List-Id: Discussion related to FreeBSD architecture

Olaf Wagner wrote:
> In article <424BD4FB.1050304@wadham.ox.ac.uk> you wrote:
>>At present portsnap is the only mechanism
>>available by which most users can securely maintain an up-to-date copy
>>of the FreeBSD ports tree; it also provides some other advantages over
>>cvsup (reduced bandwidth and ports INDEX/INDEX-5/INDEX-6 files).
>
> Just out of interest: how does it do that? I've not tested it yet,
> but what intelligence or knowledge does it use to be so much more
> efficient (1/10) than CVSup? (I myself haven't found anything as
> efficient as CVSup yet, at least for replicating CVS repositories...)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Exactly. CVSup is a tool for replicating CVS repositories; portsnap is
a tool for checking out the latest version of all the files in the
repository. CVSup is solving a very difficult problem; portsnap is
solving a very simple problem -- so it's not all that surprising that
portsnap can be a bit more efficient.

The reason portsnap is more efficient lies in how portsnap and CVSup
determine which files need to be updated. The ports tree contains
roughly 71000 files, and the first thing the CVSup client does is list
all of these files and send that list to the server. In contrast,
portsnap has an index file -- containing, roughly speaking, that same
list -- and the portsnap client merely sends the SHA256 hash of this
index file to the server, which responds with either "I recognize that
index -- here's a patch which will turn it into the latest index" or
"I don't recognize that -- here's the new index". Because these indices
have no user-serviceable parts (in fact, mucking about with the files
in /usr/local/portsnap at all is strongly discouraged), there is a very
good chance that the portsnap server will have a useful patch.
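
A rough sketch of the index exchange described above, in Python. This
is not portsnap's actual code or wire format: the "path|hash" index
lines, the line-set patch format, and the names IndexServer, make_patch,
apply_patch and update_index are all assumptions made for illustration.
It also assumes the index is a sorted list of unique lines.

```python
import hashlib


def index_hash(index_text: str) -> str:
    """SHA-256 of the whole index; this is all the client sends."""
    return hashlib.sha256(index_text.encode("utf-8")).hexdigest()


def make_patch(old: str, new: str) -> dict:
    """Hypothetical patch format: index lines to remove and lines to add."""
    old_lines, new_lines = set(old.splitlines()), set(new.splitlines())
    return {
        "remove": sorted(old_lines - new_lines),
        "add": sorted(new_lines - old_lines),
    }


def apply_patch(old: str, patch: dict) -> str:
    """Apply a make_patch() result; the index is kept as sorted unique lines."""
    lines = (set(old.splitlines()) - set(patch["remove"])) | set(patch["add"])
    return "".join(line + "\n" for line in sorted(lines))


class IndexServer:
    """Toy server that remembers earlier indices by their hash (assumed design)."""

    def __init__(self, latest: str, history: list[str]):
        self.latest = latest
        self.known = {index_hash(text): text for text in history}

    def respond(self, client_hash: str):
        old = self.known.get(client_hash)
        if old is not None:
            # "I recognize that index -- here's a patch"
            return ("patch", make_patch(old, self.latest))
        # "I don't recognize that -- here's the new index"
        return ("full", self.latest)


def update_index(local: str, server: IndexServer) -> str:
    """Client side: send only the hash, then patch or replace the local index."""
    kind, payload = server.respond(index_hash(local))
    return apply_patch(local, payload) if kind == "patch" else payload
```

In this sketch the client uploads a single hash no matter how large the
tree is, and a "patch" reply carries only the index lines that changed.
A locally edited index would hash to something the server has never
seen, which forces the "full" reply.
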
As a result, while CVSup uses (in this initial stage) bandwidth which
is proportional to the number of files in the ports tree, portsnap uses
bandwidth proportional to the number of files which have been modified,
which is typically around 1% of the tree per day.

When it comes to the actual distribution of patches to files in the
tree, portsnap is also marginally more efficient than CVSup, due to
differences in how they encode the patches, but the real gains come in
the process of identifying which files need to be updated.

Colin Percival

PS. CVSup's inefficiency in dealing with large trees containing a small
number of updated files isn't only relevant in the context of updating
a ports tree; it is even more notable when tracking the security
branches of the src tree. In the paper in which I introduced FreeBSD
Update, I gave an example of where FreeBSD Update -- which distributes
binary updates to the base system -- used less than half of the
bandwidth needed by CVSup for the task of applying the corresponding
updates to the src tree.