From owner-freebsd-arch@FreeBSD.ORG Tue Apr 1 01:13:07 2008 Return-Path: Delivered-To: arch@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 8CE9B1065672; Tue, 1 Apr 2008 01:13:07 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from mail.bitblocks.com (bitblocks.com [64.142.15.60]) by mx1.freebsd.org (Postfix) with ESMTP id 22F4E8FC12; Tue, 1 Apr 2008 01:13:07 +0000 (UTC) (envelope-from bakul@bitblocks.com) Received: from bitblocks.com (localhost.bitblocks.com [127.0.0.1]) by mail.bitblocks.com (Postfix) with ESMTP id 2A4875B41; Mon, 31 Mar 2008 18:13:06 -0700 (PDT) To: Matthew Dillon In-reply-to: Your message of "Mon, 31 Mar 2008 16:55:07 PDT." <200803312355.m2VNt7ZY030308@apollo.backplane.com> Date: Mon, 31 Mar 2008 18:13:05 -0700 From: Bakul Shah Message-Id: <20080401011306.2A4875B41@mail.bitblocks.com> Cc: Christopher Arnold , Martin Fouts , Alfred Perlstein , qpadla@gmail.com, arch@freebsd.org, Poul-Henning Kamp Subject: Re: Flash disks and FFS layout heuristics X-BeenThere: freebsd-arch@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussion related to FreeBSD architecture List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 01 Apr 2008 01:13:07 -0000 On Mon, 31 Mar 2008 16:55:07 PDT Matthew Dillon wrote: > There is one issue here and that is the transactional nature of most > filesystem operations. For example, if you append to a file with > a write() you are not only writing new data to the backing store, you > are also updating the on-disk inode's st_size field. Those are two > widely separated pieces of information which must be transactionally > bound --- all or nothing. In this regard the crash recovery code needs > to understand that they are bound and either recover the whole > transaction or none of it. > > Once you get to that point you also have to worry about > interdependancies between transactions... a really sticky issue > that is the reason for a huge chunk of softupdate's complexity in > UFS. Basically you can wind up with interdependant transactions > which must be properly recovered. An example would be doing a write() > and then doing another write(). The second write() cannot be recovered > unless the first can also be recovered. Separate transactions, but with > a dependancy. My instinct is to not combine transactions. That is, every data write results in a sequence: {data, [indirect blocks], inode, ..., root block}. Until the root block is written to the disk this is not a "commited" transaction and can be thrown away. In a Log-FS we always append on write; we never overwrite any data/metadata so this is easy and the FS state remains consistent. FFS overwrites blocks so all this gets far more complicated. Sort of like the difference between reasoning about functional programs & imperative programs! Now, it may be possible to define certain rules that allows one to combine transactions. For instance, write1(block n), write2(block n) == write2(block n) write(block n of file f1), delete file f1 == delete file f1 etc. That is, as long as write1 & associated metadata writes are not flushed to the disk, and a later write (write2) comes along, the earlier write (write1) can be thrown away. [But I have no idea if this is worth doing or even doable!] This is reminiscent of the bottom up rewrite system (BURS) used in some code generators (such as lcc's). The idea is the same here: replace a sequence of operations with an equivalent but lower cost sequence. > Such interdependancies can become arbitrarily complex the longer > you leave meta-data changes unflushed. The question ultimately becomes > whether the recovery code can deal with the complexity or not. If not > you may be forced to flush rather then create the interdependancy. > > HAMMER has precisely this issue with it's UNDO sequencing. The complexity > is in the algorith, but not the (fairly small) amount of time it takes > to actually perform the recovery operation. I don't understand the complexity. Basically your log should allow you to define a functional programming abstraction -- where you never overwrite any data/metadata for any active transactions and so reasoning becomes easier. [But may be we should take any hammer discussion offline]