From owner-freebsd-questions@FreeBSD.ORG  Thu Oct 20 20:51:10 2005
Date: Thu, 20 Oct 2005 16:51:06 -0400 (EDT)
From: user <user@dhp.com>
To: Gayn Winters <gayn.winters@bristolsystems.com>
In-Reply-To: <003201c5d5b4$753f9d20$c901a8c0@workdog>
Message-ID: <Pine.LNX.4.21.0510201642030.8180-100000@shell.dhp.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Cc: "'Andrew P.'" <infofarmer@gmail.com>, freebsd-questions@freebsd.org
Subject: RE: FreeBSD UFS2 snapshots, and math ... - resolved, but two more Qs


Folks,

On Thu, 20 Oct 2005, Gayn Winters wrote:

> > Imagine that each data block is marked with labels
> > on change. It doesn't matter how many labels there
> > are, there will be only one data block saved.
> 
> In trying to follow this thread, I started looking around for a precise
> definition of snapshot.
> Man mksnap_ffs
> wasn't too helpful, and googling for "snapshot" etc. wasn't fruitful.
> I'm guessing that the original author of the thread (user at dhp.com)
> may also need such a definition.  Can someone provide a pointer to a
> specification or at least an RFC-like paper?


I found one:

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/ufs/ffs/README.snapshot?rev=1.4

and further, I did some tests and discovered that what I was being told
(by you folks) was indeed correct.

No matter how many snapshots you have, a block that has changed since
before the first snapshot is recorded in only one of them.  That is
to say, if I do the following:

- create 4 1gig /dev/zero filled files
- create a snapshot
- overwrite one of those 1gig files with /dev/random

My free space will have decreased by 1gig.  So far so good.

If I then:

- create a second snapshot
- overwrite a different 1gig file with /dev/random

My free space merely decreases by another 1gig.  It makes sense to me now
because it has occurred to me that since the second file had not changed
between the creation of the first and second snapshot, there is no reason
for _both_ snapshots to each say "this 1gig random file used to be
filled with zeros" - it would be redundant.
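
A toy model of the "labels" idea quoted above reproduces these numbers.
It is only a sketch in Python, not how FreeBSD implements snapshots: the
class ToyVolume, its fields, the 10gig capacity, and the treatment of
each 1gig file as a single block are all made up for illustration.

```python
class ToyVolume:
    """Toy copy-on-write model. Every file counts as one 1 GB block."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.live = {}       # file name -> current content tag
        self.pending = []    # per snapshot: files it has not yet seen change
        self.saved = []      # (file name, old content, snapshot labels)

    def snapshot(self):
        # A new snapshot stores no data; it only notes which files exist.
        self.pending.append(set(self.live))

    def write(self, name, content):
        # Which snapshots still need the old version of this file?
        labels = {i for i, names in enumerate(self.pending) if name in names}
        if labels:
            # Assumption: ONE saved copy, labelled with every snapshot
            # that needs it, instead of one copy per snapshot.
            self.saved.append((name, self.live[name], labels))
            for i in labels:
                self.pending[i].discard(name)
        self.live[name] = content

    def free_gb(self):
        return self.capacity_gb - len(self.live) - len(self.saved)


def run_experiment():
    vol = ToyVolume(capacity_gb=10)
    for name in ("f1", "f2", "f3", "f4"):
        vol.write(name, "zero")
    free = [vol.free_gb()]          # after creating four files
    vol.snapshot()
    vol.write("f1", "random")
    free.append(vol.free_gb())      # after first snapshot + overwrite
    vol.snapshot()
    vol.write("f2", "random")
    free.append(vol.free_gb())      # after second snapshot + overwrite
    return free, vol
```

In this model the second overwrite costs 1gig, not 2gig, because the old
copy of the second file is saved once and carries the labels of both
snapshots.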

So that's great ... but I am curious, how do they know?  I think my
previous assumption (that the first _and_ the second snapshot file would
_both_ have to record the change of file #2 from zero to random) was based
on the notion that these snapshot files were totally autonomous and
independent, and had no general organization behind them.  If that were the
case, then I am still fairly certain both snapshots would need to record
the change of the second file.

So what is the behind-the-scenes organization that makes it possible for
the snapshot files to not duplicate data like that?

ALSO,

I have noticed that if you:

- dd 1gig /dev/zero file
- create snapshot
- overwrite that 1gig file with /dev/random

(free space decreases by 1gig, as expected)

- rewrite that 1gig file with /dev/zero again

You _don't_ get that 1gig of free space back ... which surprises me, since
it was all zeros before, and it's all zeros now ... how does the snapshot
know those are "different zeros"?  And what ramifications does this have
for restoring, etc., if identical files do not get counted as identical in
the snapshot?
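
One model that is at least consistent with this observation is that the
old copy is saved because a write happened, without the old and new
contents ever being compared.  The Python sketch below replays the test
under that assumption.  It is a guess at the accounting, not a statement
of what FreeBSD actually does; the function name and the 4gig capacity
are invented for the example.

```python
def free_space_after(writes, capacity_gb=4):
    """Replay overwrites of one 1 GB file that was snapshotted as zeros.

    Returns the free space in GB after each write.
    """
    live = "zero"        # current content of the file
    saved = None         # old copy kept on behalf of the snapshot
    history = []
    for new_content in writes:
        # Assumption: the first write after the snapshot saves the old
        # copy, whatever the new content is. Contents are never compared.
        if saved is None:
            saved = live
        live = new_content
        used_gb = 1 + (1 if saved is not None else 0)
        history.append(capacity_gb - used_gb)
    return history
```

With these made-up numbers there is 3gig free before any write, and
free_space_after(["random", "zero"]) stays at 2gig after both writes: the
saved copy is never released just because the live file happens to match
it again.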

thanks.