Date: Fri, 14 Dec 2001 12:58:58 -0800
From: Terry Lambert <tlambert2@mindspring.com>
To: Carl Schmidt <carl@slackerbsd.org>
Cc: Brett Glass <brett@lariat.org>, Hiten Pandya <hitmaster2k@yahoo.com>, Brad Knowles <brad.knowles@skynet.be>, chat@FreeBSD.ORG, phk@FreeBSD.ORG, grog@FreeBSD.ORG
Subject: Re: IBM suing (was: RMS Suing was [SUGGESTION] - JFS for FreeBSD)
Message-ID: <3C1A6812.A0EFCB73@mindspring.com>
References: <a05101013b83fd20c4206@[10.0.1.22]> <4.3.2.7.2.20011214123703.02ad7290@localhost> <20011214194909.GA2943@Carbon.SlackerBSD.ORG>

carl@slackerbsd.org wrote:
> This has been beat to death over and over but people still do not understand
> that softupdates will not minimize data loss. It guarantees metadata to be
> written, not `normal' data. The quick intro to softupdates on Kirk McKusick's
> site clearly states this fact: http://www.mckusick.com/softdep/. Softupdates
> will delay writing of data which is why you get the speed increase but if you
> pull the plug on the machine in the middle of something trying to write to the
> disk you may lose the data it was trying to write. It is very simple to prove
> by doing it. Try doing something like extracting a tarball and powering the
> machine off in the middle of it then see what fsck says about the unclaimed
> blocks and whatnot.

FWIW: By default, JFS operates by journalling metadata updates, but
not data updates (it can operate in one of three modes). See the
article on http://www.ibm.com/developerworks/ for details.

The major value in JFS is that it exports a transactioning interface
to user space. Soft Updates could have done this (by implying an
edge to a synthetic dependency) but didn't. This was one of my
original complaints with the soft updates implementation in FreeBSD:
as well as not exporting such an interface to user space, it did not
export one at the VFS boundary layer, which means that it can't span
stacking modules, even if both of them support soft updates, without
introducing a serialization barrier.

The point of a transactioning interface to the application is that
you can know whether a given transaction has been committed to stable
storage, or not, and delay your response to one of many clients until
it has been committed. In UNIX systems without such an interface,
you usually see a lot of "fsync" or "sync" operations.

JFS also fails to solve the "chicken and egg" problem of recovery
following a failure (even if journalling of user data is enabled,
rather than the default of just metadata).
Soft updates has this problem, too. The problem is that you want to
recover from a failure to a known good state. But you can't always
tell the reason for the failure. If the reason is a hardware or
controller error, rather than, for example, a power failure, then you
need to perform a full fsck to recover. But how do you tell a power
failure from some other data corruption related failure (e.g. a panic
from a wild pointer that causes pending journal data to be written
corrupted, or an unrecoverable media error of some kind)?

Most high end hardware handles this by logging a failure code to
NVRAM, which it can then use to know whether recovery will require a
full check, or not (the default value at startup is "full check
required", so if it fails catastrophically, a full check is done).
For power failure, this requires specific power supply capabilities:
AC fail notification, with sufficient DC holdup to write the failure
cause out before hard stop. This is usually done via Lithium Ion
battery-backed RAM, since CMOS takes a lot of power and is slow to
write... but I've seen CMOS used, as well.

There is a semi-useful workaround, but it requires that the system be
relatively quiescent, so that at the time of failure, it can be in a
recoverable state. The technique is called "soft read-only", and it's
implemented by flushing all data out, marking the FS clean, and
setting a "soft read-only" flag on the in-core superblock. Then if
you want to write to the disk while it is in this state, the FS first
has to be marked dirty; after that is committed to stable storage,
the soft read-only bit is cleared, and the write operation is allowed
to continue. This is very trivial to implement; I'm very surprised
that FreeBSD doesn't have it already.

In any case, this doesn't help with servers where writes are common,
since they are, by the intrinsic nature of servers, rarely quiescent;
if, on the other hand, writes are rare (e.g. a web server serving
mostly static content), then it should be quite useful. "Soft
read-only" avoids the problem, since you only have to know the failure
cause in the case that you have a dirty FS; a clean FS will not have
bad data on it, so you are safe to start without a fsck.

It should be noted that the Soft Updates implementation and
metadata-only journalling share the implementation detail that,
following a soft recoverable crash (e.g. a power failure, or a panic
unrelated to FS, VM, or paging path code), they can clean in the
background, since the "uncleanliness" will be detectable
overallocations (in soft updates, the cylinder group bitmaps will have
"allocated" bits falsely set, which can be cleaned by locking access
on a per cylinder group basis, and running in the background, with
little or no system impact, depending on access locality while the
cleaner is touching a particular cylinder group).

Really, journalling and soft updates should be considered
complementary technologies (e.g. soft updates prevents disk accesses,
which, if your system is IDE based, will otherwise have to occur
serially, and thus slow down accesses; this is not usually a problem
with JFS, since "big iron" generally runs SCSI disks, anyway). But
they both fail to deal adequately with unexpected failures that
require hard recovery.

As a final note, intention logging is antithetical to Soft Updates,
but is required if you want to be able to roll interrupted
transactions forward on recovery. The reason it doesn't mix very well
is that it requires writing the intention to stable storage, and then
making it "active" at the end of the complete transaction.

-- Terry

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message
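
The "soft read-only" scheme described in the message above can be
sketched as a small state machine. This is only an illustrative model
in Python, not FreeBSD code: the names StableStore, SoftReadOnlyFS,
quiesce and needs_fsck_after_crash are all hypothetical, and "stable
storage" is simulated with an in-memory object that records the order
in which things are committed.

```python
class StableStore:
    """Hypothetical stand-in for the disk: clean flag plus data blocks."""

    def __init__(self):
        self.clean = True   # on-disk "FS clean" flag
        self.blocks = {}    # block number -> data that reached the disk
        self.log = []       # order in which commits reached the disk

    def commit_clean_flag(self, clean):
        self.clean = clean
        self.log.append("mark clean" if clean else "mark dirty")

    def commit_block(self, blkno, data):
        self.blocks[blkno] = data
        self.log.append(f"write block {blkno}")


class SoftReadOnlyFS:
    """Toy file system that keeps the disk marked clean while idle."""

    def __init__(self, store):
        self.store = store
        self.pending = {}                   # buffered writes, not yet on disk
        self.soft_read_only = store.clean   # in-core superblock flag

    def write(self, blkno, data):
        if self.soft_read_only:
            # The dirty mark must be on stable storage before the flag
            # is cleared and the write is allowed to proceed.
            self.store.commit_clean_flag(False)
            self.soft_read_only = False
        self.pending[blkno] = data

    def quiesce(self):
        """When idle: flush all data, mark the FS clean, set the flag."""
        if self.soft_read_only:
            return                          # already clean, nothing to do
        for blkno, data in sorted(self.pending.items()):
            self.store.commit_block(blkno, data)
        self.pending.clear()
        self.store.commit_clean_flag(True)
        self.soft_read_only = True


def needs_fsck_after_crash(store):
    # Only a dirty FS raises the question of why the failure happened.
    return not store.clean
```

In this model, a crash at any point after quiesce() and before the
next write() finds the clean flag set, so startup can skip the fsck;
a crash after a write() finds it cleared and falls back to whatever
recovery policy applies to a dirty FS.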