From owner-freebsd-hackers Thu Jun  6 11:58:15 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA25551 for hackers-outgoing; Thu, 6 Jun 1996 11:58:15 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id LAA25540; Thu, 6 Jun 1996 11:58:12 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA01582; Thu, 6 Jun 1996 11:52:20 -0700
From: Terry Lambert
Message-Id: <199606061852.LAA01582@phaeton.artisoft.com>
Subject: Re: Breaking ffs - speed enhancement?
To: staff@kyklopen.ping.dk (Thomas Sparrevohn)
Date: Thu, 6 Jun 1996 11:52:20 -0700 (MST)
Cc: terry@lambert.org, dyson@FreeBSD.ORG, jehamby@lightside.com, bde@zeta.org.au, dufault@hda, hackers@FreeBSD.ORG
In-Reply-To: from "Thomas Sparrevohn" at Jun 6, 96 00:57:37 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> [snap]
>
> > I'm personally now less interested in LFS than I am in soft updates,
> > and more in the direction of a general graph theory solution to FS's
> > as a set of event nodes, and consistency guarantees as a set of event
> > handling ordering rules, with soft updates implemented as an inter-node
> > conflict resolution schema.
>
> I don't see any conflict there.  The right thing to do would be
> to redo the VFS/vnode layer according to soft updates.  But couldn't
> the two things be combined?  The approach suggested by Ganger and Patt
> could be applied to LFS in the directory handling code, which expects
> some kind of write ordering anyhow.

Ummmm... yes and no.  Yes, it could, but no, you wouldn't end up with
anything that applied globally to all the VFS's if you did it.

The VFS stacking is:

	<-- vfs_syscalls.c, NFS (why "cookies" suck)
	[ ...]
	<-- "stacking" file system
	<-- "disk" file system

Not:

	[ ...]
	<-- separate block I/O interface with encapsulation of system
	    dependencies

And the place that the soft updates go is in the ordering dependencies
in each of the VFS layers, and in their interaction with the bio
interface.

In other words, the VFS used to describe the top end consumer
interface; now it describes the top end consumer interface and the
stacking interface, and there is still no rigidly defined bottom end
that is not system/bio/VM dependent.  A generic soft-update based bio
is a step in the direction of a defined, system-independent bottom end.

Soft updates aren't necessary for LFS, and I kind of doubt that you
could implement them at the directory layer in UFS without also
changing FFS/MFS/LFS to use soft updates.

> [snip]
>
> > How complex do you view this to be?  I believe that most of the LFS
> > single file/directory problems with a catastrophic failure can be
> > handled on mount by rolling transactions back (rolling them forward
> > would require journalling, not just log-structuring).
>
> Yes, that is one of the major problems.  You can only expect the roll
> forward in LFS to handle segment inconsistency, not structural
> inconsistency.

Since the log is the structure, structural consistency is actually
guaranteed.  That's why LFS typically starts up faster than UFS
following a failure.

Getting a bad block in the middle of a log extent is why you would need
a separate fsck.  This assumes that the hard error isn't handled by
telling the FS, through a yet-to-be-defined VOP, that the block is bad,
so that the FS can do FS-dependent recovery for whatever type of block
it was that died.  This is, in any case, a highly improbable failure
(though it might be the only one left to consider if LFS works as
promised once it is production quality 8-)).

If you look at the UFS code, there is a synchronization of the per
cylinder group allocation map on mount, and no other fsck is needed.
In the case of a block failure, it's up to the driver to detect it and
notify the FS: "this block has been destroyed".  In theory, this can be
done *without* needing an fsck -- though we'd need a per-FS bad-block
handling function, and a driver callback of some kind.

I expect that most bad blocking will be handled transparently through a
media perfection layer of some kind at the logical device level, on its
way through the devfs framework.

The final piece of the puzzle is the bio request "recover this block",
which will run whatever recovery protocol has been defined (reading the
block using hysteresis, bit-voting, whatever) and then provide a
replacement block with the "recovered" data and a confidence level.
The FS then uses the confidence level to determine its own recovery
protocol.

Pretty much, you'd get a failure message logged to the console and
wherever else, but everything that can be done about the failure will
already have been done by the time you get the message.

> > One of the problems I have with LFS in this regard that I *wouldn't*
> > have with an event-based soft updates implementation is implied
> > state tracking across multiple FS objects.  One example of this
> > would be a dBase III database file with an index file.  When the
> > database changes, the index needs to change as well, idempotently.
> > This is handleable for dBase III by rebuilding the index, but a true
> > relational database implementation could not be so easily fixed.
>
> I don't think that the FS layer has to have anything to do with event
> graphs.  I think it should be possible to have the VFS/vnode layer
> handle that kind of dependency.

Yes; a transaction tracking system would be implemented at the VFS to
syscall transition, not in the FS itself.  Or it could just as easily
be implemented in a stacking layer.
The interaction with the FS is that you have a transactioning graph and
you have an FS event graph, and in order to guarantee no semantic race
conditions, you would need to use the same hierarchy for both.  Really,
you can think of this as assuring transitive closure over an arbitrary
set of combined graph segments.  In lock parlance, this would be
deadlock avoidance instead of deadlock detection (in the FS, you
"detect" it by getting bad data after a failure).

The problem is that you can't treat each FS layer as an anonymous block
store if you are depending on the semantics being implemented above the
consumer interface.  A VFS stacking module consumes an underlying VFS
differently than the system call layer (or NFS) consumes a VFS, and if
you depend on ordering guarantees, you *must* combine the graph cycles.

> > A soft updates implementation would allow you to impose event
> > dependency on the graph for multi-object transactions (assuming
> > multi-object ordering enforcement, like for an LFS log that won't
> > overwrite for two separate events in the same transaction).
>
> Wouldn't that be the same as a general transaction-based VFS?

Yes, with the exception that there are no longer any potential races in
the transactioning system's interaction with the underlying LFS.  The
transactioning is still logically separate from the LFS, which supplies
the rollback capability.  For UFS and other FS's, a two stage
"rollback" VFS layer could (but need not) be implemented.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.