From owner-freebsd-hackers Thu Jun  6 11:58:15 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id LAA25551 for hackers-outgoing; Thu, 6 Jun 1996 11:58:15 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id LAA25540; Thu, 6 Jun 1996 11:58:12 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id LAA01582; Thu, 6 Jun 1996 11:52:20 -0700
From: Terry Lambert
Message-Id: <199606061852.LAA01582@phaeton.artisoft.com>
Subject: Re: Breaking ffs - speed enhancement?
To: staff@kyklopen.ping.dk (Thomas Sparrevohn)
Date: Thu, 6 Jun 1996 11:52:20 -0700 (MST)
Cc: terry@lambert.org, dyson@FreeBSD.ORG, jehamby@lightside.com, bde@zeta.org.au, dufault@hda, hackers@FreeBSD.ORG
In-Reply-To: from "Thomas Sparrevohn" at Jun 6, 96 00:57:37 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> [snap]
>
> > I'm personally now less interested in LFS than I am in soft updates,
> > and more in the direction of a general graph theory solution to FS's
> > as a set of event nodes, and consistency guarantees as a set of event
> > handling ordering rules, with soft updates implemented as an inter-node
> > conflict resolution schema.
>
> I don't see any conflict there.  The right thing to do would be
> to redo the VFS/vnode layer according to soft updates.  But couldn't
> the two things be combined?  The approach suggested by Ganger and Patt
> could be applied to LFS in the directory handling code, which expects
> some kind of write ordering anyhow.

Ummmm... yes and no.  Yes, it could, but no, you wouldn't end up with
anything that applied globally to all the VFS's if you did it.

The VFS stacking is:

	<-- vfs_syscalls.c, NFS (why "cookies" suck)
	[ ...]
	<-- "stacking" file system
	<-- "disk" file system

Not:

	[ ...]
	<-- separate block I/O interface with encapsulation of system
	    dependencies

And the place that the soft updates go is in the ordering dependencies
in each of the VFS layers, and in their interaction with the bio
interface.

In other words, the VFS used to describe the top end consumer
interface; now it describes the top end consumer interface and the
stacking interface, and there is still no rigidly defined bottom end
that is not system/bio/VM dependent.  A generic soft-update based bio
is a step in the direction of a defined, system-independent bottom end.

Soft updates aren't necessary for LFS, and I kind of doubt that you
could implement them at the directory layer in UFS without also
changing FFS/MFS/LFS to use soft updates.

> [snip]
>
> > How complex do you view this to be?  I believe that most of the LFS
> > single file/directory problems with a catastrophic failure can be
> > handled on mount by rolling transactions back (rolling them forward
> > would require journalling, not just log-structuring).
>
> Yes, that is one of the major problems.  You can only expect the roll
> forward in LFS to handle segment inconsistency, not structural
> inconsistency.

Since the log is the structure, structural consistency is actually
guaranteed.  That's why LFS typically starts up faster than UFS
following a failure.

Getting a bad block in the middle of a log extent is why you would need
a separate fsck.  This assumes that the hard error isn't handled by
telling the FS, through a yet-to-be-defined VOP, that the block is bad,
so that the FS can do FS-dependent recovery for whatever type of block
it was that died.  This is, in any case, a highly improbable failure
(though it might be the only one left to consider if LFS works as
promised once it is production quality 8-)).

If you look at the UFS code, there is a synchronization of the per
cylinder group allocation map on mount, and no other fsck is needed.
In the case of a block failure, it's up to the driver to detect it and
notify the FS: "this block has been destroyed".  In theory, this can be
done *without* needing an fsck -- though we'd need a per-FS bad-block
handling function, and a driver callback of some kind.

I expect that most bad blocking will be handled transparently through a
media perfection layer of some kind at the logical device level, on its
way through the devfs framework.

The final piece of the puzzle is the bio request "recover this block",
which will run whatever recovery protocol has been defined (reading the
block using hysteresis, bit-voting, whatever) and then provide a
replacement block with the "recovered" data and a confidence level.
The FS then uses the confidence level to determine its own recovery
protocol.

Pretty much, you'd get a failure message logged to the console and
wherever else, but everything that can be done about the failure will
already have been done by the time you get the message.

> > One of the problems I have with LFS in this regard that I *wouldn't*
> > have with an event-based soft updates implementation is implied
> > state tracking across multiple FS objects.  One example of this
> > would be a dBase III database file with an index file.  When the
> > database changes, the index needs to change as well, idempotently.
> > This is handleable for dBase III by rebuilding the index, but a true
> > relational database implementation could not be so easily fixed.
>
> I don't think that the FS layer has to have anything to do with event
> graphs.  I think it should be possible to have the VFS/vnode layer
> handle that kind of dependency.

Yes; a transaction tracking system would be implemented at the VFS to
syscall transition, not in the FS itself.  Or it could just as easily
be implemented in a stacking layer.
The interaction with the FS is that you have a transactioning graph and
you have an FS event graph, and in order to guarantee no semantic race
conditions, you would need to use the same hierarchy for both.  Really,
you can think of this as assuring transitive closure over an arbitrary
set of combined graph segments.  In lock parlance, this would be
deadlock avoidance instead of deadlock detection (in the FS, you
"detect" it by getting bad data after a failure).

The problem is that you can't treat each FS layer as an anonymous block
store if you are depending on the semantics being implemented above the
consumer interface.  A VFS stacking module consumes an underlying VFS
differently than the system call layer (or NFS) consumes a VFS, and if
you depend on ordering guarantees, you *must* combine the graph cycles.

> > A soft updates implementation would allow you to impose event
> > dependency on the graph for multi-object transactions (assuming
> > multi-object ordering enforcement, like for an LFS log that won't
> > overwrite for two separate events in the same transaction).
>
> Wouldn't that be the same as a general transaction-based VFS?

Yes, with the exception that there are no longer any potential races in
the transactioning system's interaction with the underlying LFS.  The
transactioning is still logically separate from the LFS, which supplies
the rollback capability.  For UFS and other FS's, a two stage
"rollback" VFS layer could (but need not) be implemented.


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.