Date: Sat, 19 Sep 1998 17:19:51 +0000 (GMT)
From: Terry Lambert <tlambert@primenet.com>
To: eivind@yes.no (Eivind Eklund)
Cc: tlambert@primenet.com, Don.Lewis@tsc.tdk.com, current@FreeBSD.ORG
Subject: Re: softupdates & fsck
Message-ID: <199809191719.KAA10028@usr09.primenet.com>
In-Reply-To: <19980919123143.36373@follo.net> from "Eivind Eklund" at Sep 19, 98 12:31:43 pm
> > That you are seeing these problems implies that the bwrite ordering
> > guarantees that the driver must provide (i.e., that the blocks will
> > be written in the order requested, and that the writes will not
> > return as completed until the data has been committed to the disk)
> > are not being honored.
>
> Given that most drives don't honour these guarantees [1] it may happen
> even without a problem with the driver.
>
> [1] This marks the point where somebody comes running, waving standards
> documents and becoming more and more red in the face, while I say
> "Yes, I know they say the drives are supposed to - but in fact, the
> drives don't actually *do* what they're supposed to."

They do if you set their options correctly and ensure a holdup time
after power failure, during which you will not engage in scheduling
new writes.

The question is "what happens to the sector under the head during a
write in case of power failure, if you don't have a holdup time?".

For some drives, the answer is "it works". For the drives that idiots
buy, the answer is "it gets corrupted and the data is not recoverable
at all".

In any case, we are talking about system resets, not power failures,
so this is somewhat a horse of a different wheelbase, and we can ignore
the case where you employ idiots to buy your hardware and/or you use
non-ATX power supplies, followed by the power going out unexpectedly.

For a drive that isn't powered down, and that has told the controller
it has written a block that it has in fact only cached, it is the
responsibility of the drive to write what it said it did before acting
upon the reset signal.

I can tell you that Quantum and Seagate IDE drives honor this, and that
most (all?) SCSI drives honor this even better (not returning that the
queued command has completed until the data is committed to disk).

Since (1) this problem doesn't occur without CAM, and (2) this problem
occurs with CAM, I expect that this problem is CAM related.
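The ordering guarantee quoted at the top (blocks written in the order
requested, and no write reported complete until the data is committed)
can be illustrated with a small user-space sketch. This is only an
analogy under stated assumptions: it is not the kernel's bwrite path,
the soft updates code, or anything in CAM. The names ordered_write,
demo, and BLOCK_SIZE are hypothetical, and os.fsync is only as
trustworthy as the drive underneath it.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size, chosen only for the illustration


def ordered_write(fd, blocks):
    """Write (block_number, data) pairs strictly in the order given.

    Each block is flushed with fsync before the next one is issued, so a
    later write is never started until the earlier one has been reported
    committed. Returns the block numbers in completion order.
    """
    completed = []
    for block_no, data in blocks:
        if len(data) != BLOCK_SIZE:
            raise ValueError("each block must be exactly BLOCK_SIZE bytes")
        written = os.pwrite(fd, data, block_no * BLOCK_SIZE)
        if written != BLOCK_SIZE:
            raise OSError("short write on block %d" % block_no)
        os.fsync(fd)  # do not report completion until the flush returns
        completed.append(block_no)
    return completed


def demo():
    """Write three blocks out of positional order and read them back."""
    fd, path = tempfile.mkstemp()
    try:
        blocks = [
            (2, b"A" * BLOCK_SIZE),
            (0, b"B" * BLOCK_SIZE),
            (1, b"C" * BLOCK_SIZE),
        ]
        order = ordered_write(fd, blocks)
        contents = os.pread(fd, 3 * BLOCK_SIZE, 0)
        return order, contents
    finally:
        os.close(fd)
        os.unlink(path)
```

If a reset arrives between two iterations, everything already in the
completed list has been reported flushed and nothing later has been
issued. A layer that reorders or acknowledges early breaks exactly
that property.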
Feel free to prove me wrong by duplicating the problem with a pre-CAM
kernel with Loqui's patch applied; I'd be happy to have the stack
traceback, as I'm sure Julian and Kirk would be, as well.

For now, from the fsck failure, it looks like the CAM driver isn't
making the ordering guarantees it should, and stating that "some
hardware won't make these guarantees, either" is more an argument
against purchasing "some hardware" than it is an argument for CAM
ignoring the intentional ordering of requests (if that's the root
cause of the problem, which didn't occur pre-CAM; obviously it could
be a different CAM bug causing the problem...).

A good question to ask at this point is "is anyone with an IDE drive
experiencing this problem?".

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message