Date: Wed, 15 Jul 1998 21:09:54 +0000 (GMT) From: Terry Lambert <tlambert@primenet.com> To: Matthew.Alton@anheuser-busch.com (Alton, Matthew) Cc: FreeBSD-fs@FreeBSD.ORG Subject: Re: LFS & soft updates Message-ID: <199807152109.OAA19937@usr08.primenet.com> In-Reply-To: <31B3F0BF1C40D11192A700805FD48BF90177660F@STLABCEXG011> from "Alton, Matthew" at Jul 15, 98 05:30:16 pm
next in thread | previous in thread | raw e-mail | index | archive | help
> Just for fun - are lfs and Kirk McKusick's Ganger-Patt soft update > scheme necessarily mutually exclusive? I don't know anything > about soft updates yet. Just wondered if I should look at the > GP papers before designing LFS v.2. Note: Doing this work would be a good Master's Thesis project; solving the general problem such that you could stack FS layers arbitrarily using the same underlying graph code would probably be worth a PhD Dissertation. LFS and soft updates are not mutually exclusive, but in order to optimize the representational geometry (how the structures are laid out, what they imply, precalculation instead of postcalculation of some dependency related issues, etc., etc.), the FFS soft dependency code as implemented is very depndent on the FFS architecture itself. It is possible to take a non-traditional view of an FS, and look at it as a series of events and actors. Each event/actor relationship is what the soft updates code represents, and the ordering of the actors operations as a result of the events is what a dependency models. Effectively, the code embodies the result of a Warshall's Algorithm run on the dependency graph generated from looking at the FS as events: transitive closure over the dependency graph. The work Julian, Kirk, and anyone else who has submitted patches have been engaged in is removing redundant OpenBSD and/or BSDI dependency nodes, and adding FreeBSD/unified VM dependency nodes. In addition, they have resolved some problems that are still issues for OpenBSD and BSDI (hopefully, both will benefit from the efforts). This work would have been a hell of a lot easier if the FFS and its interaction with the rest of the kernel and I/O subsystem had been documented in terms of event/actor relationships. If this had been done, one would need only to "diff" the graphs, and they would be, for all practical purposes, asking "will the code that needs to be changed please stand up". I personally would have preferred an implementation that registered a graph and computed the Warshal's at the time an FS was instanced (an O(3) algorithm, which is extermely cheap in graph theory terms), and pre-calculated the Hamiltonian cycles across the graph to allow people to stick new edges in later with a minimal recalculation cost. This would have allowed the soft updates mechanism to be extensible, at the cost of a full linear node traversal -- O(1) -- for each new nodal relationship (ie: stacking FS layers, export of a transactioning system to user space, etc.). This would have cost memory, as Kirk points out, but I don't think it would have cost as much memory as he seems to believe it would have cost, since you could still encode the dependency data as Kirk has done; the penalty would be serializing dependencies at the VFS/VFS boundry. The memory costs would be for inter-VFSOP dependency resoloution, seperate from the integration of this code into the FFS code within the ops themselves. If you want soft updates in the LFS code, which might make *some* sense, in terms of reducing the work for the cleaner, and adding implicit write gathering, and similar benefits to LFS, then you should do this: 1) Model the current FFS code in terms of event/actor relationships. Now that the soft updates code seems to be working fairly completely, this should be rather easy; just go through the code, and everywhere there is an incusrion of the soft udates code, that is an event/actor dependency relationship. 2) Now do the same for LFS. For the UFS code, which both LFS and FFS depend upon, this should be rather trivial, since it is already done. For the LFS code, you will have to specifically identify the points in the model where each event/actor relationship occurs. 3) Implement the dependency resoloution code that will be executed off of the slotted soft clock that is the syncd to resolve each of these relationships; both queuing of dependencies when the events occur, and conflicts on queued data changes which may occur between the new actors and the existing ones. This is not a trivial undertaking; mostly it is tedious detail work requiring a lot of academic rigor. A failure in rigor will result in a system that fails to operate as expected. As with the FFS soft updates code, a single missed dependency will result in a system that is, in certain circumstances, no better than async mounts (the degradation case is a system that is as unreliable as an async mount, and which fails to make the integrity guarantees that were the reason you expended the effort in the first place). I don't think I'm exagerating either the difficulty or the value of the resulting work. The original Ganger/Patt work was at this level of difficulty, and Kirk McKusick is definitely no slouch as a Computer Scientist, and he did not implement the code over night. Terry Lambert terry@lambert.org --- Any opinions in this posting are my own and not those of my present or previous employers. To Unsubscribe: send mail to majordomo@FreeBSD.org with "unsubscribe freebsd-fs" in the body of the message
Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199807152109.OAA19937>