Date:      06 Mar 2000 10:31:39 +0100
From:      Dag-Erling Smorgrav <des@flood.ping.uio.no>
To:        "Daniel O'Connor" <doconnor@gsoft.com.au>
Cc:        kc5vdj@swbell.net, chat@FreeBSD.ORG, grog@lemis.com, mark@dogma.freebsd-uk.eu.org, (Alfred Perlstein) <bright@wintelcom.net>
Subject:   Re: M$ one-ups UNIX???
Message-ID:  <xzp3dq4a1hg.fsf@flood.ping.uio.no>
In-Reply-To: "Daniel O'Connor"'s message of "Thu, 02 Mar 2000 14:29:01 +1030 (CST)"
References:  <XFMail.000302142901.doconnor@gsoft.com.au>

"Daniel O'Connor" <doconnor@gsoft.com.au> writes:
> On 02-Mar-00 Jim Bryant wrote:
> >  this microsoft thing sounds like more trouble than it's worth.  chalk
> >  one up to marketing hype over functionality.  sometimes it's easier to
> >  make a direct copy of a tree or something than to use CVS or RCS, and
> >  thus you would have untouched dupes out there depending on how far you
> >  are along with whatever you are doing.  with the mickeysoft method, you
> >  would be changing the originals.
> 
> I think you're assuming they're too stupid.
> 
> I would say it would do COW.

What people are missing here is that SIS is meant for file servers,
where read operations greatly outnumber write operations. Merging
copies of identical files saves not only disk space, but also (and
more importantly) cache space (at least if their disk cache is up to
snuff).

Imagine a server that holds home directories for a couple of dozen
developers, each of them with a few complete copies of the FreeBSD
kernel source (or of whatever product they're developing), each of
them with a few local variations. Imagine the disk and memory load on
the server every time one of them does a cvs diff... now throw SIS
into the mix (assuming a relatively efficient implementation of SIS).
Mix well, serve chilled.

The difficult part of SIS is not comparing files and merging them if
they are identical, but deciding which files to compare, and when. If
you can peek into the name cache, a good strategy would be to compare
files with identical names (do a reverse strcmp, compare the files if
the strcmp matches past at least one slash). You can make comparison
(relatively) cheap by storing checksums of each block. For every pair
of candidates, compare size first, then a checksum of the block
checksums, then the complete block checksums, and finally the complete
file.
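
The candidate selection and tiered comparison described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not how SIS itself works: the 4096-byte block size, the use
of CRC-32 as the block checksum, and the function names are all
hypothetical, and the checksums are computed on the fly here, whereas
the message assumes they are already stored alongside each block.

```python
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size; the message does not name one


def common_suffix_past_slash(a, b):
    """Reverse strcmp: True if the shared trailing part of a and b contains a '/'."""
    n = 0
    while n < len(a) and n < len(b) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n > 0 and "/" in a[len(a) - n:]


def block_checksums(path):
    """Return one CRC-32 per block of the file (stand-in for stored checksums)."""
    sums = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            sums.append(zlib.crc32(block))
    return sums


def checksum_of_checksums(sums):
    """Collapse the per-block checksums into a single summary value."""
    return zlib.crc32(b"".join(s.to_bytes(4, "big") for s in sums))


def same_content(p, q):
    """Cheapest test first: size, summary checksum, block checksums, full data."""
    if os.path.getsize(p) != os.path.getsize(q):
        return False
    sums_p, sums_q = block_checksums(p), block_checksums(q)
    if checksum_of_checksums(sums_p) != checksum_of_checksums(sums_q):
        return False
    if sums_p != sums_q:
        return False
    # Checksums can collide, so the final word is a byte-for-byte comparison.
    with open(p, "rb") as f, open(q, "rb") as g:
        while True:
            block_p, block_q = f.read(BLOCK_SIZE), g.read(BLOCK_SIZE)
            if block_p != block_q:
                return False
            if not block_p:
                return True
```

A caller would first filter pairs of paths with
common_suffix_past_slash() and only then pay for same_content(). With
checksums already on disk, every step before the last avoids reading
file data at all.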

The very best would be to use a set of checksum algorithms which
guaranteed that no two blocks of equal size could have identical
checksums and yet be different, but I have a hunch that says it can be
proved that there is no such beast.
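
The hunch can be checked by counting: there are 2^n distinct blocks of
n bits but only 2^k possible values of a k-bit checksum, so whenever
k < n at least two different blocks must share a checksum (the
pigeonhole principle). Combining several algorithms only removes
collisions once their outputs together are at least as long as the
block, at which point nothing is saved over comparing the blocks
directly. The small sketch below illustrates this by brute force on
2-byte blocks; the function names and the truncated CRC-32 are
illustrative choices, not anything from the message.

```python
import zlib


def find_collision(checksum, block_len=2):
    """Search every block of block_len bytes for two that share a checksum."""
    seen = {}
    for value in range(256 ** block_len):
        block = value.to_bytes(block_len, "big")
        c = checksum(block)
        if c in seen:
            return seen[c], block  # two different blocks, same checksum
        seen[c] = block
    return None  # checksum was collision-free over all blocks of this size


def crc8ish(block):
    """An 8-bit checksum (low byte of CRC-32): shorter than a 2-byte block."""
    return zlib.crc32(block) & 0xFF


def identity(block):
    """A 'checksum' as long as the block itself: no collisions, no savings."""
    return block
```

Any 8-bit checksum must yield a collision here within the first 257
blocks tried, since only 256 distinct values exist; find_collision()
returns None only for something like identity(), which is as large as
the data it summarizes.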

DES
-- 
Dag-Erling Smorgrav - des@flood.ping.uio.no





