From: Craig Boston
To: freebsd-hackers@freebsd.org
Date: Mon, 9 Feb 2004 11:30:05 -0600
Subject: Subversion/CVS experiment summary

This is a bit of a long email, so please skip unless you're into source code revision management :)

This is an informal report on the viability of using Subversion to manage the FreeBSD source code repository. Some of this is generic and will be familiar to anyone who has looked at SVN before, some is more FreeBSD-specific.

NOTE: I'm not trying to push one SCM over the other or suggest that CVS is wholly inadequate. This is merely the result of an evaluation for my personal use, and I thought I'd post it in case anyone was interested. CVS has been used by the FreeBSD project for a LONG time for good reasons. Despite its shortcomings, I suspect that it will be in use for quite a while longer.

-----------------------------------------------------------------------------
Section the 1st - Motive
-----------------------------------------------------------------------------

My main motivation for these tests was to bring my local modifications to FreeBSD into some semblance of order. It seems I've amassed a bit of a collection of local patches, 3rd party patches, and side projects -- some of which are mutually exclusive or apply to different branches. Simply keeping a working copy with my changes in it works fine for one project but becomes painful when there are several. I'd also like to be able to keep version history for my modifications.

I've heard good things about Perforce, and its effortless merge functionality looks really slick. If I'm ever involved with a major commercial coding project, I'll definitely give it some consideration. For my "free-time" projects however it's not really an option. A couple of my local mods are in a bit of a grey area as far as the 'non-commercial' license goes, so I'd rather avoid that whole issue.

-----------------------------------------------------------------------------
Section the 2nd - Setup and conversion
-----------------------------------------------------------------------------

Most of my tests were performed on the src/sys portion of the repository. It seemed to be large enough that I could get a general idea of how well Subversion scales, but small enough that I wouldn't spend all week waiting for the import to complete.

All tests were done on a Pentium 4 2.8 GHz system with 512MB RAM.
I used a local repository on one disk and the working directories on another (for both CVS and SVN). These tests have been done over the course of the last week and a half, using subversion-0.35.1_1.

I've heard of attempts to convert the repo for testing with cvs2svn.py failing (for more details, see the thread at http://docs.freebsd.org/cgi/getmsg.cgi?fetch=640133+0+archive/2004/freebsd-hackers/20040111.freebsd-hackers). These problems seem to be fixed in the most recent version of the script -- I have been able to successfully import sys, bin, sbin, and lib so far. The next target for testing is contrib as it seems to be the most likely candidate for problems with all those vendor branches.

Comments on importing:

It's SLOOOOOOOOOOW. It took 43.9 hours just for src/sys, and this is a relatively speedy system! It starts out at a pretty good pace, but the more commits it processes, the longer each one seems to take.

For my purposes I would also need some method of incrementally updating the repository with any new commits made to CVS. This doesn't exist yet, but I'm thinking about trying to hack cvs2svn to do this. Kind of an inverse vendor branch I guess.

-----------------------------------------------------------------------------
Section the 3rd - Head to Head
-----------------------------------------------------------------------------

Yeah, I know comparing Subversion and CVS isn't a fair test -- SVN is designed to be much more than CVS. But it's a comparison that will inevitably be made, so might as well get it out of the way.

Bad points (for SVN):

* Repo size: The src/sys part of the tree alone is 1.2GB. The same portion of the repo in CVS is only 313MB. I had to keep a script running to routinely purge unused database logs to avoid running out of disk space during the import.

* Working set size: SVN keeps a complete copy of every file that is checked out in a hidden directory analogous to "CVS" directories.
  This does have some advantages outlined below, but effectively doubles the size of your working directory.

* Speed: Subversion 0.35 is considerably slower than CVS for some operations. svn checkout is on average about 6 times slower than cvs checkout. Interestingly, CVS seems to benefit from the buffer cache much more than SVN does -- nearly a 50% decrease in execution time for CVS once the cache was populated. Please note however that checking the same thing out over and over isn't a very useful thing to do, and SVN fares better with the more common operations.

* Not as thoroughly tested with large repositories. One advantage CVS has is that it is old, widely used, and has been used successfully (more or less) by large installations. SVN simply hasn't had anywhere close to the number of lines of code pushed through it that CVS has. This means it's more likely that SVN has undiscovered bugs, edge cases, etc.

* "Requires" Apache for the network server. There is a simpler CVS-like network protocol, but it suffers from the same problems with access control and locking and the like that CVS does. In order to overcome those limitations, you pretty much have to use Apache/WebDAV. Some may argue that this isn't really a negative, but it certainly doesn't go with the K.I.S.S. philosophy.

* No cvsup equivalent yet. You can fairly easily use WebDAV to pull a copy of the trunk or a particular branch, but it's not nearly as efficient as the rsync algorithm. There's also no way to use WebDAV to grab a certain date or revision like you can with cvsup -- you have to have the svn client installed. In order to be even a contender to replace CVS, it still needs a *FAST* and *SIMPLE* way to synchronize source with an arbitrary tag or revision.

* Still no solution for the repeated merge problem. This is supposed to be addressed post-1.0; no official timeframe on it AFAIK.

* I don't think they have added arbitrary keyword support yet.
  We would probably need a local hack to support $FreeBSD$.

Good points:

* Atomic commits across multiple files

* Near-O(1) branching/tagging, and no branch-point-tag mess

* The cvs2svn script is fairly smart and tries to group commits together that should be part of a single commit. I believe it looks at timestamps and commit messages to figure this out.

* Move and copy commands that DTRT -- no need for repo copies.

* As a result of not needing repo copies, it preserves the history of the trunk. Currently we have no easy way to see what, for example, 2.2-CURRENT looked like on a particular day. Somehow I doubt that sys/amd64/amd64/tsc.c really existed in 1996. SVN wouldn't magically fix existing problems without outside help, but it would be able to keep it from getting any worse.

* Subversion is supposed to have a more efficient network layer than CVS. I haven't had a chance to do any real empirical testing on this yet.

* svn update is much faster than cvs update. With no changes to the repository, it completes in 1-2 seconds flat. With only a few changes, it takes a few seconds longer but it is still quite a bit quicker than CVS. CVS seems to have a much flatter graph with relation to the number of changes being updated -- it takes a while even if nothing changed.

* Subversion is better at disconnected operation. Because it keeps a copy of the last checked out revision, you can see what files have changed locally, revert changes on a particular file, create directories, move/rename files, and even generate diffs without having a connection to a remote repository. All of these commands are also much quicker than their CVS equivalents because they are working on a local copy.

* Native binary support. SVN treats all files as binary unless you specify otherwise and can efficiently store differences between binary files (CVS has to store the complete file in every revision). This might make things like the compat libs a little easier to manage.
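
For anyone curious what "grouping commits by timestamp and commit message" might look like, here is a rough Python sketch of the idea. To be clear, this is NOT cvs2svn's actual code, just my guess at the general heuristic: per-file CVS revisions from the same author with the same log message, made within a short time of each other, get folded into one changeset. The FileRevision type, the group_commits name, and the 300-second window are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FileRevision:
    """One per-file CVS revision (hypothetical type, for illustration only)."""
    path: str
    author: str
    log: str
    timestamp: int  # seconds since the epoch


def group_commits(revisions: List[FileRevision],
                  window: int = 300) -> List[List[FileRevision]]:
    """Fold per-file revisions into multi-file changesets.

    A revision joins the most recent changeset with the same author and
    log message if it was made within `window` seconds of that
    changeset's latest revision and does not touch a file the changeset
    already contains. Otherwise it starts a new changeset. The window
    size is an assumed value, not something taken from cvs2svn.
    """
    changesets: List[List[FileRevision]] = []
    # Most recent changeset seen for each (author, log message) pair.
    latest: Dict[Tuple[str, str], List[FileRevision]] = {}

    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.log)
        current = latest.get(key)
        starts_new = (
            current is None
            or rev.timestamp - current[-1].timestamp > window
            or any(r.path == rev.path for r in current)
        )
        if starts_new:
            current = []
            changesets.append(current)
            latest[key] = current
        current.append(rev)

    return changesets
```

Changesets come back ordered by the time of their first revision. A real converter would also have to worry about branches, tags, and clock skew between files, none of which this sketch attempts.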

-----------------------------------------------------------------------------
Section the 4th - Conclusions
-----------------------------------------------------------------------------

Honestly, I don't think Subversion is quite ready yet. However, it is getting _very_ close to being a viable alternative to CVS, for the needs of the FreeBSD project as far as I know them. I'll definitely be trying it out for some of my local projects that are currently stored in CVS.

FWIW, my intention is not to start a bikeshed discussion (but if we're doing that my vote is on plaid!) For the most part, CVS does a reasonably good job of keeping the FreeBSD source code in line. However, it does have some weaknesses that make it unsuitable for heavy development -- witness the multitude of projects happening in local Perforce trees. Subversion was brought up before, recently even, but there were still several major showstoppers. A couple of those have been resolved in the last month.

Random notes:

I know there are other SCMs out there, and will probably take a look at them when I get a chance. I picked Subversion for this test because it's supposed to be the successor of CVS, so it's a logical place to start.

It also looks as if Subversion 0.37 (aka 1.0-RC) has just been released. I'll have to take a look at it and see if any of the problems noted above have been resolved.

Any comments / corrections / arguments are welcome :)

Craig

-- 
"A 'No Parking' sign at a certain location means..."
- multiple choice question on NY State learner's permit test