Message-Id: <199608100159.SAA24389@austin.polstra.com>
To: "Boyd R. Faulkner"
cc: current@FreeBSD.org
Subject: Re: Praise for CVSup
In-reply-to: Your message of "Fri, 09 Aug 1996 20:12:05 -0459." <199608100111.UAA14116@utgard.bga.com>
Date: Fri, 09 Aug 1996 18:59:59 -0700
From: John Polstra

> CVSup does more file checking than sup does. You can end up with
> files with the right date and size but not the right contents and,
> while I may be wrong, sup will not detect this. Since CVSup uses
> MD5 (yes?) to ID the files, you are guaranteed the correct contents.

Well ... yes and no. It depends on the situation.

In general, CVSup does _not_ ID the files via MD5 checksums. It compares the time stamps between the client and the server, and if they are identical, it assumes that the files are identical too. In that case, it doesn't examine the files further. (There is an exception which, I conjecture, applies to your particular case. I'll explain that in a minute.)

The reason it doesn't compare MD5 checksums for every file on the client and the server is that it would be too slow, too compute-intensive, and too disk-intensive. No real-time network file update package could do that without bringing the server to its knees. It has to cull the unchanged files from the list using just the information that it can get from a call to stat().
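A minimal Python sketch of the time-stamp culling described above, for illustration only. It assumes, hypothetically, that the server's file list arrives as a dict mapping relative paths to integer modification times; `needs_update` is an illustrative name, not CVSup's code. Only `os.stat()` is consulted, so file contents are never read.

```python
import os


def needs_update(root, server_mtimes):
    """Return the relative paths whose local time stamp differs from the server's.

    server_mtimes: hypothetical dict of relative path -> integer mtime (seconds)
    as reported by the server. Files are compared by stat() alone; two files
    with equal time stamps are assumed to be identical without reading them.
    """
    changed = []
    for relpath, server_mtime in sorted(server_mtimes.items()):
        try:
            st = os.stat(os.path.join(root, relpath))
        except FileNotFoundError:
            # Assumption: a file missing on the client must be fetched.
            changed.append(relpath)
            continue
        if int(st.st_mtime) != server_mtime:
            changed.append(relpath)
    return changed
```

The trade-off is visible here: a file whose contents changed while its time stamp stayed the same is skipped, which is exactly the case the quoted message worries about.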
The exception is when you are using CVSup's checkout mode the very first time. In that case, CVSup cannot ID your existing checked-out files via the time stamps, because the time stamps of the checked-out files are not the same as the time stamps of the corresponding RCS files on the server machine. So it really has no choice. On the client, it checksums each file. On the server, it parses each RCS file, and checksums each revision on the selected branch, from most recent to least recent.

This is the worst situation, in terms of server load, but it's not as bad as it sounds. First, it's computing the checksums on the fly as it generates revisions -- not doing some gross thing like calling "co" to emit them to temporary files. So its main activity involves crunching through a memory-mapped RCS file, computing the checksums as it goes. Second, if the client already has files, they're probably fairly recent. So the server won't have to checksum very many revisions before it finds the right one. Third, this situation only happens the first time a given client uses CVSup in checkout mode. After that, the so-called "list files" remember which revisions the client possesses.

The other place where MD5 checksums are used is to verify each file that CVSup has updated by editing in new deltas and so forth. That was inspired by CTM, with a few gentle prods from Justin Gibbs, and it has turned out to be a really good thing. Besides instilling confidence in CVSup, it permits CVSup to be imperfect and incomplete in the way it deals with RCS files. I learned during the alpha test period that there is an enormous variety of truly sick things that people can and _will_ do to the RCS files in a CVS repository. If CVSup had to be perfect in anticipating every one of them, well, I wouldn't trust it myself. But with the checksum verification, it doesn't even have to handle the rarest kinds of changes properly at all.
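The two uses of MD5 described above (identifying a client's existing checked-out file by checksumming revisions newest first, and verifying a file after editing in new deltas) might be sketched as follows. Everything here is an illustrative assumption: revisions are supplied as (revision number, bytes) pairs in newest-first order, standing in for revisions generated on the fly from the RCS file, and `md5_hex`, `identify_revision`, `update_file`, and the `edit` callback are hypothetical names, not CVSup's actual code. The mismatch handling mirrors the behavior described in the paragraph that follows.

```python
import hashlib


def md5_hex(data):
    """MD5 digest of a bytes object, as a hex string."""
    return hashlib.md5(data).hexdigest()


def identify_revision(client_digest, revisions):
    """Return the revision number whose text matches the client's file, or None.

    revisions: hypothetical iterable of (revision_number, text_bytes) pairs for
    the selected branch, newest first. The loop stops at the first match, so a
    fairly recent client file costs only a few checksums.
    """
    for rev, text in revisions:
        if md5_hex(text) == client_digest:
            return rev
    return None


def update_file(name, old_text, edit, expected_digest, full_transfers):
    """Apply edit(old_text) and keep the result only if its MD5 matches.

    edit: hypothetical function that applies the new deltas to the old text.
    full_transfers: list collecting names to transfer whole at the end of the run.
    """
    new_text = edit(old_text)
    if md5_hex(new_text) == expected_digest:
        return new_text
    print(f"Checksum mismatch for {name} -- will transfer entire file")
    full_transfers.append(name)
    return old_text  # leave the file untouched
```

Recording the revision returned by `identify_revision` in a per-client list, as the "list files" do, would let later runs skip this search entirely.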
When those kinds of changes happen, it edits the file incorrectly, but finds out about it when it verifies the checksums. Then it says, "Checksum mismatch for foo -- will transfer entire file". At that point it leaves the file untouched, and it arranges to transfer the whole thing at the end of the run. It works well, and it helps keep people from getting mad at me.

You may even see this happen the next time you run CVSup. Today I (needlessly, it turns out) changed the default RCS keyword expansion on one of the repository files for the "net/cvsup" port. That is one of the two or three kinds of very rare changes that CVSup's RCS file analysis currently does not cover. (The other one that comes to mind is changes in the list of locked revisions. Since CVS never locks its RCS files, it's not much of an issue.)

-- John