Date: Fri, 12 Nov 1999 21:05:25 -0800 (PST)
From: Matthew Jacob <mjacob@feral.com>
To: Bill Fumerola <billf@chc-chimes.com>
Cc: billf@FreeBSD.ORG, billik@sun.uniag.sk, freebsd-bugs@FreeBSD.ORG
Subject: Re: misc/11216: Power fail versus Fsck changed my life.
Message-ID: <Pine.BSF.4.10.9911122045560.55474-100000@beppo.feral.com>
In-Reply-To: <Pine.BSF.4.10.9911122238400.90447-100000@jade.chc-chimes.com>

> On Fri, 12 Nov 1999, Matthew Jacob wrote:
>
> > > On a side note, I don't know how many operating systems are
> > > out there that can recover from ripping the hard drive cable
> > > while a drive is operational and transferring.
> >
> > Any decent transaction engine.
>
> And not lose the data being transferred?

Absolutely. The whole point of transaction engines with failover
hardware, or of decent h/w and software in general, is that it is never
ambiguous as to whether data has made it to stable medium or not.

Actually, this applies to all data paths that have any kind of error
checking: has the data made it from point A to point B safely? If not,
execute the current policy as to what to do when the data has not made
it intact. The policy choices are many, but they ultimately boil down,
*at each level of the data transfer*, to either committing to retry the
operation until it succeeds, or retrying for a certain stated number of
retries or a certain bounded time interval before reporting failure back
up to the next entity in the chain of entities that is moving the data.

The original submitter of the data ultimately then chooses what to do:
if it's a low cost VGA card that didn't accept a pixel write, well,
fine, ignore it. If it is your bank's ATM, it does not take the
transaction any further, etc. The ordering of events is clearly
important to this, i.e., do *not* disburse the money *until* the
transaction has successfully logged.

Insofar as a middling level of survivability goes, SunOS was a middling
good example (if I do say so myself) of just the case of kicking the
cable loose: the SCSI subsystem would retry a wad o' times (for
relatively stateless devices like disks), bitching all the way, then it
would reflect the failure back to the page/buffer cache code, which
would do its damnedest to not lose the dirty but now not yet backed
page(s). Very very very rarely would it ever panic on an I/O error.
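
The per-level retry policies and the log-before-disburse ordering
described above can be sketched roughly as follows. This is only an
illustration in Python, not code from SunOS, FreeBSD, or any real
driver; the names RetryPolicy, transfer, and withdraw are hypothetical,
and a real data path would add backoff, logging, and error detail.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RetryPolicy:
    """Hypothetical retry policy for one level of a data path."""
    max_attempts: Optional[int] = None   # None: no limit on attempts
    max_seconds: Optional[float] = None  # None: no limit on elapsed time


def transfer(op: Callable[[], bool], policy: RetryPolicy,
             clock: Callable[[], float] = time.monotonic) -> bool:
    """Retry op until it succeeds or the policy is exhausted.

    Returns True once the data is known to have arrived, and False when
    this level gives up, so the next entity up the chain can decide.
    With both limits set to None this retries until success.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        if op():
            return True
        if policy.max_attempts is not None and attempts >= policy.max_attempts:
            return False
        if policy.max_seconds is not None and clock() - start >= policy.max_seconds:
            return False


def withdraw(log_txn: Callable[[], bool], dispense: Callable[[], None],
             policy: RetryPolicy) -> bool:
    """Ordering matters: disburse only after the log write is confirmed."""
    if not transfer(log_txn, policy):
        return False  # transaction never logged, so no money moves
    dispense()
    return True
```

The point of returning a plain success/failure from transfer is that
the outcome is never ambiguous to the caller: either the operation is
known to have completed, or the failure is handed up for the original
submitter to act on.
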
It would panic on UFS inconsistencies, true, but, frankly, that was
because a research filesystem was being used in a semi- to
full-commercial context. I remember working with a really *great*
filesystem hacker from TRW quite some time ago (like, 20 years ago), and
the things he had to say about the Unix filesystem, while perhaps a bit
unfair, were certainly on point (for the time; it's a helluva lot better
now) and an eye-opener about what levels of expectations for consistency
and survivability are.

The current FreeBSD da driver policy of pack invalidation on I/O error
is probably *not* a good example of surviving external events.

Note that if one were to rewrite the statement that started this all off
to be instead: "I don't know how many free operating systems are out
there that can recover from...", I probably wouldn't have said anything.
Care and caution are expensive. So are carelessness and foolish
marketing hype, but perhaps that's too cynical and I'm needing to chill
with some semi-controlled substances now...ciao'...

-matt