Date:      Fri, 12 Nov 1999 21:05:25 -0800 (PST)
From:      Matthew Jacob <mjacob@feral.com>
To:        Bill Fumerola <billf@chc-chimes.com>
Cc:        billf@FreeBSD.ORG, billik@sun.uniag.sk, freebsd-bugs@FreeBSD.ORG
Subject:   Re: misc/11216: Power fail versus Fsck changed my life.
Message-ID:  <Pine.BSF.4.10.9911122045560.55474-100000@beppo.feral.com>
In-Reply-To: <Pine.BSF.4.10.9911122238400.90447-100000@jade.chc-chimes.com>



> On Fri, 12 Nov 1999, Matthew Jacob wrote:
> 
> > > On a side note, I don't know how many operating systems are
> > > out there that can recover from ripping the hard drive cable
> > > while a drive is operational and transferring.
> > 
> > Any decent transaction engine.
> 
> And not lose the data being transferred?

Absolutely. The whole point of transaction engines with failover hardware,
or for decent h/w and software in general, is that it is never ambiguous
as to whether data has made it to stable medium or not. Actually, this
applies to all data paths that have any kind of error checking: has the
data made it from point A to point B safely? If not, execute the current
policy as to what to do when the data has not made it intact.

The policy choices are many, but *at each level of the data transfer* they
ultimately boil down to either committing to retry the operation until it
succeeds, or retrying for a certain stated number of retries or a certain
bounded time interval before reporting failure back up to the next entity
in the chain of entities that is moving the data. The original
submitter of the data ultimately then chooses what to do- if it's a low
cost VGA card that didn't accept a pixel write, well, fine, ignore it. If
it is your bank's ATM, it does not take the transaction any further, etc.
The ordering of events is clearly important to this- i.e., do *not*
disburse the money *until* the transaction has successfully logged.
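
A minimal sketch of the policy choices above, in Python purely for
illustration. The names transfer_with_policy, TransferFailed and withdraw
are hypothetical and do not come from any real driver or banking code; the
only point is that each level either retries without bound, or retries up
to a stated count or time limit and then reports failure to its caller.

```python
import time


class TransferFailed(Exception):
    """Raised when one level gives up and reports failure to its caller."""


def transfer_with_policy(op, max_retries=None, max_seconds=None,
                         clock=time.monotonic):
    """Call op() until it returns True or a configured bound is exhausted.

    With both bounds left as None the operation is retried until it
    succeeds. Returns the number of attempts that were made.
    """
    start = clock()
    attempts = 0
    while True:
        attempts += 1
        if op():
            return attempts
        # Bounded number of retries (attempts after the first one).
        if max_retries is not None and attempts > max_retries:
            raise TransferFailed(f"gave up after {attempts} attempts")
        # Bounded time interval.
        if max_seconds is not None and clock() - start >= max_seconds:
            raise TransferFailed(f"gave up after {max_seconds} seconds")


def withdraw(log_transaction, disburse_cash, amount):
    """Hypothetical ATM flow: no money moves until the log write succeeds."""
    try:
        transfer_with_policy(lambda: log_transaction(amount), max_retries=3)
    except TransferFailed:
        return False  # not logged, so the transaction goes no further
    disburse_cash(amount)
    return True
```

The submitter at the top (withdraw here) is the one that decides what a
reported failure means; a caller that does not care, like the VGA pixel
write, would simply ignore the exception.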

As for a middling level of survivability, SunOS was a middling good (if I
do say so myself) example of just this case of kicking the cable loose-
the SCSI subsystem would retry a wad o' times (for relatively
stateless devices like disks), bitching all the way, then it would reflect
the error back to the page/buffer cache code, which would do its damnedest
to not lose the dirty but now not yet backed page(s). Very very very rarely would
it ever panic on an I/O error. It would panic on UFS inconsistencies,
true, but, frankly, that was because a research filesystem was being used
in a semi- to full- commercial context. I remember working with a really
*great* filesystem hacker from TRW quite some time ago (like, 20 years
ago)- and the things he had to say about the Unix filesystem, while
perhaps a bit unfair, were certainly on point (for the time- it's a
helluva lot better now) and an eyeopener about what levels of expectations
for consistency and survivability are.
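
The cache behavior described above (hang on to dirty pages when the device
write fails, rather than dropping them or panicking) can be sketched
roughly as follows. This is an illustrative Python toy, not the SunOS
code; WriteBackCache and its write_page callback are hypothetical names.

```python
class WriteBackCache:
    """Toy write-back cache that never discards a page it could not write."""

    def __init__(self, write_page):
        # write_page(page_no, data) -> bool is a stand-in for the lower
        # level (e.g. a disk driver) after it has exhausted its retries.
        self.write_page = write_page
        self.dirty = {}

    def write(self, page_no, data):
        # Writes land in memory first and are marked dirty.
        self.dirty[page_no] = data

    def flush(self):
        """Try to write every dirty page; return the pages still dirty."""
        for page_no in sorted(self.dirty):
            if self.write_page(page_no, self.dirty[page_no]):
                del self.dirty[page_no]
            # On failure the page simply stays dirty for a later flush.
        return sorted(self.dirty)
```

If the cable comes back, a later flush() drains whatever is still dirty;
nothing in this sketch treats an I/O error as a reason to give up on the
data.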

The current FreeBSD da driver policy of pack invalidation on I/O error is
probably *not* a good example of surviving external events.

Note if one were to rewrite the statement that started this all off to be
instead: "I don't know how many free operating systems are out there that
can recover from...", I probably wouldn't have said anything. Care and
caution are expensive. So are carelessness and foolish marketing hype, but
perhaps that's too cynical and I'm needing to chill with some
semi-controlled substances now...ciao'...

-matt







