Date: Tue, 14 Dec 1999 19:18:51 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: noslenj@swbell.net (Jay Nelson)
Cc: tlambert@primenet.com, chat@FreeBSD.ORG
Subject: Re: Log file systems? (Was: Re: dual 400 -> dual 600 worth it?)
Message-ID: <199912141919.MAA20684@usr02.primenet.com>
In-Reply-To: <Pine.BSF.4.05.9912132046590.782-100000@acp.swbell.net> from "Jay Nelson" at Dec 13, 99 09:37:16 pm
> >They are FAQs, not "in the FAQ".
>
> I suspect they probably should be in the FAQ. The average admin who
> doesn't follow mailing lists asks questions like this. The more we
> (justifiably) claim stability, the more seriously they evaluate
> FreeBSD against commercial alternatives. This is an area where few of
> us really understand the issues involved.
>
> >The archives you should be looking at, and the place you should be
> >asking the question, are the freebsd-fs list.
>
> I did look in the fs archives -- although I'm not sure the general
> question belongs there, since it seems to have more to do with the
> differences between FreeBSD and the commercial offerings.
>
> Is it fair to summarize the differences as:
>
> Soft updates provide little in terms of recovering data, but enhance
> performance during runtime. Recovery is limited to ignoring metadata
> that wasn't written to disk.

No.

Soft updates:

What is lost are uncommitted writes. Committed writes are guaranteed
to have been ordered. This means that you can deterministically
recover the disk not just to a stable state, but to the stable state
that it was intended to be in.

The things that are lost are implied state between files (e.g. a
record file and an index file for a database); this can be worked
around using two stage commits on the data in the database software.

Soft updates is slow to recover because of the need to tell the
difference between a hard failure and a soft failure (a hard failure
is a software or hardware fault; a soft failure is the loss of
power). If you can tell these apart, then you don't need to fsck the
drive, only recover over-allocated cylinder group bitmaps. This can
be done in the background, locking access to one cylinder group at a
time.

Distinguishing the failure type is the biggest problem here, and
requires NVRAM or a technology like soft read-only (first implemented
by a team I was on at Artisoft around 1996, for a port of the
Heidemann framework and soft updates to Windows 95, as far as I can
tell).

> Log file systems offer little data recovery in return for faster
> system recovery after a disorderly halt, at the cost of a runtime
> penalty.

Log structured FSs:

Zero rotational latency on writes, fast recovery after a hard or soft
failure. What is lost are uncommitted writes (see above).

LFSs recover quickly because they look for the metadata log entry
with the most recent date, and they are "magically" recovered to that
point.

There is still a catch-22 with regard to soft vs. hard failures, but
most hard failures can be safely ignored, since any data dated from
before the hard failure is OK, unless the drive is going south. You
must therefore differentiate hard failures in the kernel "panic"
messages, so that a human has an opportunity to see them.

LFSs have an ongoing runtime cost that is effectively the need to
"garbage collect" outdated logs so that their extents can be reused
by new data.

> Journaled filesystems offer the potential of data recovery, at a
> boot time and runtime cost.

JFSs:

A JFS maintains a journal; this is sometimes called an intention log.
Because it logs its intent before the fact, it can offer a
transactional interface to user space. This lets the programmer skip
the more expensive two stage commit process in favor of hooks into
the intention log.
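To make the "two stage commit" above concrete: on a soft updates FS,
an application protecting the implied state between a record file and
an index file might do something like the following minimal sketch.
The file layout and record format here are made up for illustration;
only the POSIX calls (lseek, write, fsync) are real.

/*
 * Sketch of an application-level two stage commit: the record is
 * forced to stable storage before the index entry that publishes it.
 * The layout is hypothetical; the ordering is what matters.
 */
#include <sys/types.h>

#include <stdint.h>
#include <unistd.h>

int
append_record(int datafd, int idxfd, const void *rec, uint32_t len)
{
	uint32_t entry[2];
	off_t off;

	/* Stage 1: write the record and force it to stable storage. */
	if ((off = lseek(datafd, 0, SEEK_END)) == -1)
		return (-1);
	if (write(datafd, rec, len) != (ssize_t)len)
		return (-1);
	if (fsync(datafd) == -1)
		return (-1);

	/*
	 * Stage 2: only now publish the record by appending an index
	 * entry pointing at it.  If power is lost before this entry
	 * is stable, recovery just sees some orphaned record bytes;
	 * the index never refers to data that is not already on disk.
	 */
	entry[0] = (uint32_t)off;
	entry[1] = len;
	if (write(idxfd, entry, sizeof(entry)) != (ssize_t)sizeof(entry))
		return (-1);
	return (fsync(idxfd));
}

With a JFS, begin/commit hooks into the intention log take the place
of this manual ordering, and the filesystem guarantees it instead.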
Because transactions done this way can be nested, a completed but
uncommitted transaction can be rolled forward to the extent that the
nesting level has returned to "0" -- in other words, all nested
transaction intents have been logged.

Because transactions can be rolled forward, you will recover to the
state that the JFS would have been in had the failure never occurred.
This works because writes, etc. are not acknowledged back to the
caller until the intention has been carried out. Things like an
intent to delete a file, rename a file, etc. are logged at level 0
(i.e. not in a user defined transaction bound), and so can be
acknowledged immediately; writes of actual data need to be delayed
if they are in a transaction bound.

This lets you treat a JFS as committed stable storage, without
second-guessing the kernel or the drive cache, etc.

A JFS recovery, like an LFS recovery, uses the most recent valid
timestamp in the intention log, and then rolls forward all
transactions that have completed. Like LFS, hard errors can be
ignored, unless the hard errors occur during replay of the journal in
rolling some completed transaction forward. Because of this, care
must be taken on recovery.

JFS recovery can take a while if there are a lot of completed
intentions in the journal. Many JFS implementations also use logs in
order to write user data, so that the write acknowledge can be
accelerated.

> I know this is disgustingly over simplified, but about all you can
> get through to typical management.
>
> I also have to admit, I'm a little confused with your usage of the
> word orthogonal. Do you mean that an orthogonal technology projects
> cleanly or uniformly into different dimensions of system space?

Yes. "Mutually perpendicular" and "intersecting at only one point".
It's my training in physics seeping through...

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-chat" in the body of the message