From owner-freebsd-fs  Mon Feb 12 15: 7: 0 2001
From: Terry Lambert <tlambert@primenet.com>
Message-Id: <200102122306.QAA11325@usr08.primenet.com>
Subject: Re: Design a journalled file system
To: zzhang@cs.binghamton.edu (Zhiui Zhang)
Date: Mon, 12 Feb 2001 23:06:44 +0000 (GMT)
Cc: cattelan@thebarn.com (Russell Cattelan), freebsd-fs@FreeBSD.ORG
In-Reply-To: <Pine.SOL.4.21.0102121611230.14762-100000@opal> from "Zhiui Zhang" at Feb 12, 2001 04:21:33 PM

> It seems to me that I have failed to explain my point again. So an example
> may help. Suppose I have a bitmap block buffer.  One transaction allocates
> some blocks from it, the other transaction frees some blocks into it. If
> the bitmap block buffer is not locked for the duration of a transaction,
> then it could contain modifications made by both transactions. The atomicity
> is violated unless you can make the two transactions merge into one later.
> On the other hand, if it is locked for a transaction and that transaction
> blocks for some other I/O, then performance will suffer (no one can use
> the bitmap block buffer for a while).

Russell is right, for XFS, and for most journalled FSs, where the
validity marking on the journal entry (as being the most recent)
is the most important thing.  All transactions are written as if
by way of a write-through cache of the modification data.

In other words, in his world, there's no such conflict between
concurrent operations.

Per a previous post, Soft Updates is all about "unless you can
make the two transactions merge into one later".

Specifically, if you have a disk block, it's 512 bytes.  An inode
on disk is 128 bytes.  This means 4 inodes per block.

Similarly, a directory entry block is 512 bytes.  A given block
will contain between 1 and 16 directory entries, each of which may
be in the process of being manipulated.

And so on.
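
To make the arithmetic concrete, here is a minimal sketch of which
inodes end up sharing a block.  It assumes inodes are packed
contiguously and numbered from 0 within the inode area; that layout
and the function names are assumptions for illustration only, not
the layout of any particular FS.

```python
SECTOR_SIZE = 512   # bytes per disk block, as in the text
INODE_SIZE = 128    # bytes per on-disk inode, as in the text

INODES_PER_BLOCK = SECTOR_SIZE // INODE_SIZE


def inode_block(ino):
    """Index of the block (within the inode area) holding inode `ino`.

    Assumes inodes are packed contiguously starting at inode 0.
    """
    return ino // INODES_PER_BLOCK


def conflicts(ino_a, ino_b):
    """True if two distinct inodes live in the same disk block."""
    return ino_a != ino_b and inode_block(ino_a) == inode_block(ino_b)
```

Under these assumptions, inodes 4 through 7 share one block, so
updates to any two of them contend for the same write.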

Soft Updates keeps a list of modifications to conflicted blocks,
in core, and actually makes a copy of the conflicted block, and
backs out transaction state, when committing partial transactions.
It does this by maintaining a state conflict domain dependency
list (which is why Soft Updates are sometimes called Soft
Dependencies instead).
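
A rough sketch of the copy-and-back-out idea follows.  It is not
the actual Soft Updates code: the Change and CachedBlock names and
the "ready" flag are hypothetical, and it assumes pending changes
touch disjoint byte ranges (for example, different inodes in the
same block).

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    offset: int          # byte offset of the change within the block
    old: bytes           # contents before the change, kept for back-out
    new: bytes           # contents after the change
    ready: bool = False  # True once everything it depends on is on disk


@dataclass
class CachedBlock:
    data: bytearray                      # in-core copy, always fully up to date
    pending: list = field(default_factory=list)

    def modify(self, offset, new):
        """Apply a change in core and remember how to undo it."""
        old = bytes(self.data[offset:offset + len(new)])
        self.data[offset:offset + len(new)] = new
        change = Change(offset, old, bytes(new))
        self.pending.append(change)
        return change

    def image_for_write(self):
        """Return a copy of the block that is safe to write now.

        Changes whose dependencies are not yet satisfied are backed
        out of the copy; the in-core block itself is left untouched.
        """
        image = bytearray(self.data)
        for change in reversed(self.pending):
            if not change.ready:
                end = change.offset + len(change.old)
                image[change.offset:end] = change.old
        return bytes(image)
```

The point of the copy is that nobody has to wait: the in-core block
keeps both transactions' changes, while the disk only ever sees the
ones that are safe to commit.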


Practically, for a design, you can generally reduce the domains
of conflict by increasing your object sizes to 512 bytes, so that
each object gets a disk block to itself.  This lets you have
things like ACL and immediate file support in inodes, which you
can then bill as a feature.

For the directory entry blocks, the conflict is already somewhat
mitigated by the fact that, for anyone iterating the directory, you
make a copy of the block -- it is a snapshot, not the actual
directory contents you are iterating.  The NFS "cookie" code
for iteration restart is really a kludge; it could have just as
easily worked around the difference between on disk and wire and
user space directory entries within a given block, by separating
the code into "copy FS-side unit into snapshot" and "copy data
from snapshot into representation buffer" VOPs (I've suggested
this many times, and provided the code several times).
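
As a sketch of that two-step split (not the code referred to above,
and not real VOP signatures; the DirEntry type and both function
names are made up for illustration):

```python
from collections import namedtuple

# Hypothetical in-core form of one directory entry.
DirEntry = namedtuple("DirEntry", "ino name")


def snapshot_block(live_block):
    """Step 1: copy the FS-side unit (one directory block) into a snapshot.

    The snapshot is immutable, so later changes to the live block do
    not affect an iteration that is already in progress.
    """
    return tuple(live_block)


def fill_from_snapshot(snapshot, start, max_entries, encode):
    """Step 2: copy entries from the snapshot into a representation buffer.

    `encode` converts one DirEntry into the caller's format (on the
    wire, user space, and so on).  Returns the encoded entries and
    the index at which to resume.
    """
    chunk = snapshot[start:start + max_entries]
    return [encode(entry) for entry in chunk], start + len(chunk)
```

The restart position here is just an index into the snapshot, which
stays meaningful because the snapshot never changes underneath the
caller.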

The bottom line is that bitmaps only matter if you implement
using bitmaps.  For inescapable conflicts (like the "last
modified" or "time of last update" in superblock data, which
you must have for recovery following a crash), the easiest method
to work around the problem is to log superblocks as well, and
then iterate to the "most recent valid" during recovery.
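
A minimal sketch of the "most recent valid" scan, assuming each
logged superblock copy carries a sequence number and a CRC32 of its
payload.  That record layout and the function names are assumptions
made for the example, not anything XFS or FFS actually uses.

```python
import struct
import zlib

# Hypothetical record header: 64-bit sequence number, 32-bit CRC of payload.
HEADER = struct.Struct("<QI")


def pack_superblock(seq, payload):
    """Build one logged superblock record."""
    return HEADER.pack(seq, zlib.crc32(payload)) + payload


def most_recent_valid(records):
    """Return (seq, payload) of the newest record whose CRC checks out.

    Torn or corrupt records are skipped.  Returns None if no record
    is valid.
    """
    best = None
    for raw in records:
        if len(raw) < HEADER.size:
            continue  # too short to hold a header
        seq, crc = HEADER.unpack_from(raw)
        payload = raw[HEADER.size:]
        if zlib.crc32(payload) != crc:
            continue  # write did not complete, or data was damaged
        if best is None or seq > best[0]:
            best = (seq, payload)
    return best
```

A record torn by a crash fails its checksum, so recovery falls back
to the previous copy instead of trusting a half-written one.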

Ideally, you probably _do_ want to incorporate Soft Updates
technology, since it lets you avoid artificial stalls when you
enter into an unavoidable conflict (XFS stalls and drains at
those points), but it's not immediately necessary (just don't
design against it as a future optimization).

I really, really urge FS designers to go back to first principles
when examining problems, and to consider FSs as transactions to
be applied to persistent state data as a result of events.  If
you do that, then protecting the integrity of the persistent
state becomes obvious and easy.


Actually, this really brings home the license point for XFS,
since it should be obvious that it could benefit from soft
updates, which it won't get without paying something (like
access to its sources in a useful fashion for the BSD community).

Yes, I'm still looking for a commercial license that prohibits
making XFS a stand-alone product, but still allows it to be used
in a commercial setting.  The Sun License on the original SLPv1
comes close, but fails to grant in perpetuity.  It may be that
SGI's lawyer will have to do some lawyering to work out one that
satisfies them.

Hopefully SGI will learn the HP JetSend and the Sun JINI and the
Net/1 & Net/2 TCP/IP lesson: if you want something to be standard,
you can't control it, and if you control it, it won't be standard.

Note: my March 1st offer stands.  I have yet to hear how to get
the unencumbered (SGI-only) GPL code... the clock's ticking.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

