From: Terry Lambert
Message-Id: <200102122306.QAA11325@usr08.primenet.com>
Subject: Re: Design a journalled file system
To: zzhang@cs.binghamton.edu (Zhiui Zhang)
Date: Mon, 12 Feb 2001 23:06:44 +0000 (GMT)
Cc: cattelan@thebarn.com (Russell Cattelan), freebsd-fs@FreeBSD.ORG
In-Reply-To: from "Zhiui Zhang" at Feb 12, 2001 04:21:33 PM

> It seems to me that I have failed to explain my point again. So an example
> may help. Suppose I have a bitmap block buffer. One transaction allocate
> some blocks from it, the other transaction free some blocks into it. If
> the bitmap block buffer is not locked for the duration of a transaction,
> then it could contain modifications made both transactions. The atomicity
> is violated unless you can make the two transactions merge into one later.
> On the other hand, if it is locked for a transaction and that transaction
> blocks for some other I/O, then performance will suffer (no one can use
> the bitmap block buffer for a while).

Russell is right, for XFS, and for most Journalled FS's, where the
validity marking on the journal entry (as being the most recent) is
the most important thing.
All transactions are written as if by way of a write-through cache
of the modification data. In other words, in his world, there's no
such conflict between concurrent operations.

Per a previous post, Soft Updates is all about "unless you can make
the two transactions merge into one later".

Specifically, if you have a disk block, it's 512 bytes. An inode on
disk is 128 bytes. This means 4 inodes per block. Similarly, a
directory entry block is 512 bytes. A given block will contain
between 1 and 16 directory entries, each of which may be in the
process of being manipulated. And so on.

Soft Updates keeps a list of modifications to conflicted blocks, in
core, and actually makes a copy of the conflicted block, and backs
out transaction state, when committing partial transactions. It does
this by maintaining a state conflict domain dependency list (which is
why Soft Updates are sometimes called Soft Dependencies instead).

Practically, for a design, you can generally reduce the domains of
conflict by increasing your object sizes to 512 bytes. This lets you
have things like ACL and immediate file support in inodes, which you
can then bill as a feature.

For the directory entry blocks, the conflict is already somewhat
mitigated by the fact that, for anyone iterating the directory, you
make a copy of the block -- it is a snapshot, not the actual
directory contents you are iterating. The NFS "cookie" code for
iteration restart is really a kludge; it could have just as easily
worked around the difference between on-disk, wire, and user space
directory entries within a given block by separating the code into
"copy FS sided unit into snapshot" and "copy data from snapshot into
representation buffer" VOPs (I've suggested this many times, and
provided the code several times).

The bottom line is that bitmaps only matter if you implement using
bitmaps.
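
The copy-and-back-out behaviour attributed to Soft Updates above can be
illustrated with a small sketch. This is not the FreeBSD implementation:
the PendingUpdate record, the deps_satisfied flag, and image_for_disk()
are hypothetical names invented for the illustration, and the real
dependency tracking is far more involved. It only shows the idea of
writing a copy of a shared 512-byte block with not-yet-safe changes
rolled back, while the in-core block keeps every change.

```python
BLOCK_SIZE = 512   # bytes per disk block, as in the text
INODE_SIZE = 128   # bytes per on-disk inode, so 4 inodes share a block


class PendingUpdate:
    """One in-core change to a single inode slot (hypothetical record)."""

    def __init__(self, slot, old_bytes, new_bytes, deps_satisfied):
        self.slot = slot                      # which of the 4 inodes changed
        self.old_bytes = old_bytes            # contents before the change
        self.new_bytes = new_bytes            # contents after the change
        self.deps_satisfied = deps_satisfied  # safe to let this reach disk?


def image_for_disk(in_core_block, pending):
    """Build the block image to write now, leaving the in-core block alone."""
    image = bytearray(in_core_block)          # copy of the conflicted block
    for update in pending:
        if not update.deps_satisfied:
            start = update.slot * INODE_SIZE
            # Back out the change that is not yet allowed to reach disk.
            image[start:start + INODE_SIZE] = update.old_bytes
    return bytes(image)


# Two transactions touch different inodes that live in the same block.
block = bytearray(BLOCK_SIZE)
committed = PendingUpdate(0, bytes(INODE_SIZE), b"A" * INODE_SIZE, True)
blocked = PendingUpdate(2, bytes(INODE_SIZE), b"B" * INODE_SIZE, False)
for update in (committed, blocked):
    start = update.slot * INODE_SIZE
    block[start:start + INODE_SIZE] = update.new_bytes

disk_image = image_for_disk(block, [committed, blocked])
```

Here disk_image carries the change to inode slot 0 but the old contents
of slot 2, so neither transaction has to hold a lock on the whole block
while the other waits for I/O.
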

For inescapable conflicts (like the "last modified" or "time of last
update" in superblock data, which you must have for recovery
following a crash), the easiest method to work around the problem is
to log superblocks as well, and then iterate to the "most recent
valid" during recovery.

Ideally, you probably _do_ want to incorporate Soft Updates
technology, since it lets you avoid artificial stalls when you enter
into an unavoidable conflict (XFS stalls and drains at those points),
but it's not immediately necessary (just don't design against it as a
future optimization).

I really, really urge FS designers to go back to first principles
when examining problems, and to consider FSs as transactions to be
applied to persistent state data as a result of events. If you do
that, then protecting the integrity of the persistent state becomes
obvious and easy.

Actually, this really brings home the license point for XFS, since it
should be obvious that it could benefit from soft updates, which it
won't get without paying something (like access to its sources in a
useful fashion for the BSD community).

Yes, I'm still looking for a commercial license that prohibits making
XFS a stand-alone product, but still allows it to be used in a
commercial setting. The Sun License on the original SLPv1, but fails
to grant in perpetuity. It may be that SGI's lawyers will have to do
some lawyering to work out one that satisfies them.

Hopefully SGI will learn the HP JetSend and the Sun JINI and the
Net/1 & Net/2 TCP/IP lesson: if you want something to be standard,
you can't control it, and if you control it, it won't be standard.

Note: my March 1st offer stands. I have yet to hear how to get the
unencumbered (SGI-only) GPL code... the clock's ticking.

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
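
As a footnote to the suggestion of logging superblocks and iterating to
the "most recent valid" one during recovery, here is a minimal sketch of
that scan. The record layout (a sequence number plus a CRC32 over the
sequence number and payload) and all function names are assumptions made
for the illustration; the post does not say how validity or recency
would be encoded.

```python
import struct
import zlib

# Hypothetical record header: 64-bit sequence number, 32-bit CRC.
HEADER = struct.Struct("<QI")


def _checksum(seq, payload):
    """CRC32 over the sequence number and the superblock payload."""
    return zlib.crc32(struct.pack("<Q", seq) + payload)


def encode_superblock(seq, payload):
    """Serialize one logged superblock copy."""
    return HEADER.pack(seq, _checksum(seq, payload)) + payload


def decode_superblock(record):
    """Return (seq, payload), or None if the record is torn or corrupt."""
    if len(record) < HEADER.size:
        return None
    seq, crc = HEADER.unpack_from(record)
    payload = record[HEADER.size:]
    if _checksum(seq, payload) != crc:
        return None
    return seq, payload


def recover_superblock(records):
    """Scan every logged copy and keep the valid one with the highest seq."""
    best = None
    for record in records:
        decoded = decode_superblock(record)
        if decoded is not None and (best is None or decoded[0] > best[0]):
            best = decoded
    return best
```

A crash in the middle of writing the newest copy leaves a record that
fails its checksum, so recovery falls back to the previous valid copy
rather than trusting half-written timestamps.
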