Date: 06 Mar 2000 10:31:39 +0100
From: Dag-Erling Smorgrav <des@flood.ping.uio.no>
To: "Daniel O'Connor" <doconnor@gsoft.com.au>
Cc: kc5vdj@swbell.net, chat@FreeBSD.ORG, grog@lemis.com, mark@dogma.freebsd-uk.eu.org, (Alfred Perlstein) <bright@wintelcom.net>
Subject: Re: M$ one-ups UNIX???
Message-ID: <xzp3dq4a1hg.fsf@flood.ping.uio.no>
In-Reply-To: "Daniel O'Connor"'s message of "Thu, 02 Mar 2000 14:29:01 +1030 (CST)"
References: <XFMail.000302142901.doconnor@gsoft.com.au>
"Daniel O'Connor" <doconnor@gsoft.com.au> writes: > On 02-Mar-00 Jim Bryant wrote: > > this microsoft thing sounds like more trouble than it's worth. chalk > > one up to marketing hype over fuctionality. sometimes it's easier to > > make a direct copy of a tree or something than to use CVS or RCS, and > > thus you would have untouched dupes out there depending on how far you > > are along with whatever you are doing.with the mickeysoft method, you > > would be changing the originals. > > I think you're assuming they're too stupid. > > I would say it would do COW. What people are missing here is that SIS is meant for file servers, where read operations greatly outnumber write operations. Merging copies of identical files saves not only disk space, but also (and more importantly) cache space (at least if their disk cache is up to snuff). Imagine a server that holds home directories for a couple of dozen developers, each of them with a few complete copies of the FreeBSD kernel source (or of whatever product they're developing), each of them with a few local variations. Imagine the disk and memory load on the server every time one of them does a cvs diff... now throw SIS into the mix (assuming a relatively efficient implementation of SIS). Mix well, serve chilled. The difficult part of SIS is not comparing files and merging them if they are identical, but deciding which files to compare, and when. If you can peek into the name cache, a good strategy would be to compare files with identical names (do a reverse strcmp, compare the files if the strcmp matches past at least one slash). You can make comparison (relatively) cheap by storing checksums of each block. For every pair of candidates, compare size first, then a checksum of the block checksums, then the complete block checksums, and finally the complete file. 

The very best would be to use a set of checksum algorithms which
guaranteed that no two blocks of equal size could have identical
checksums and yet be different, but I have a hunch that says it can be
proved that there is no such beast.

DES
-- 
Dag-Erling Smorgrav - des@flood.ping.uio.no

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?xzp3dq4a1hg.fsf>
