Date: Tue, 13 Oct 1998 00:59:05 -0600
From: "Justin T. Gibbs" <gibbs@plutotech.com>
To: Terry Lambert <tlambert@primenet.com>
Cc: gibbs@plutotech.com (Justin T. Gibbs), Don.Lewis@tsc.tdk.com, julian@whistle.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
Message-ID: <199810130705.BAA12205@pluto.plutotech.com>
In-Reply-To: Your message of "Mon, 12 Oct 1998 22:58:15 -0000." <199810122258.PAA11377@usr02.primenet.com>

>> >} 2) Use a drive with non-bogus firmware. Recent Seagate and IBM
>> >}    drives should work just fine. I haven't validated any Quantum
>> >}    drives in this regard yet.
>> >
>> >But how can tell if the firmware is non-bogus?
>>
>> Ask Terry since he has stated that he 'doesn't have any drives with
>> non-bogus firmware'.
>
>A) Run soft updates
>B) Press "reset" occasionally
>C) Note any anomalies in the resulting fsck when the machine
>   comes back up
>D) if count < 200, goto B
>E) if # of anomalies > 0, print "bad firmware".

You're missing a large step here. You can't prove that the 'anomaly'
is related to the drive firmware without a trace of all transactions
on the SCSI bus. It could well be a missing dependency in the soft
update code. I'd be more than happy to reproduce your failure scenario
while recording a SCSI bus trace so that the fault is easy to
interpret. Just send me any *modern* drive that you think fails. You
should also ensure that your reset button does not cause any power
spikes on the drive power lines. That would be cheating.

>It's very hard to do this in software, without providing a mechanism
>to actually break into the latency link between the drive reporting
>a write cached operation has been written, and the actual writing.

If you can cause this failure to occur by hitting your reset button,
I should be able to cause it to occur by using a paper-clip, if the
reset condition (caused by the SCSI card BIOS in the reset button
case) is the event that causes cache corruption. Both are
non-deterministic methods of error injection.

>Such a latency link only exists on drives which Justin has identified
>as having broken firmware due to the behaviour reported by Don Lewis.

I'm still unclear as to whether Don was turning off power or hitting
what I consider the reset button. His comment about UPS use makes me
think he was testing power outage scenarios.

>I would be much more interested in knowing what drives and firmware
>revisions of those drives Justin has, since both mine and Don Lewis's
>are demonstrably broken.

Since you were able to test 4 drives so quickly, I'd love to see
well documented information on exactly how the file system was
inconsistent in the failure cases.

--
Justin

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-fs" in the body of the message
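
For readers following the thread, the quoted A) through E) procedure and the objection to its last step can be sketched roughly as below. This is only an illustration, not anything posted in the thread: the `Trial` record, its fields, and the `summarize` function are hypothetical names, and the verdict strings are assumptions. The point it tries to capture is that an fsck anomaly on its own does not separate a firmware fault from a missing soft updates dependency, so the tally stops short of printing "bad firmware".

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One reset-and-fsck cycle (hypothetical record; fields are assumptions)."""
    fsck_anomalies: int   # anomalies fsck reported after this reset
    has_bus_trace: bool   # whether a SCSI bus trace was recorded for this cycle


def summarize(trials, required=200):
    """Tally the quoted procedure without attributing blame to the firmware."""
    if len(trials) < required:
        return "incomplete"  # step D: fewer than 200 resets so far
    bad = [t for t in trials if t.fsck_anomalies > 0]
    if not bad:
        return "no anomalies"
    # An anomaly alone cannot distinguish a drive fault from a missing
    # soft updates dependency; only traced cycles can be examined further.
    if any(t.has_bus_trace for t in bad):
        return "anomalies, trace available for analysis"
    return "anomalies, cause undetermined"
```

Under these assumptions, 200 clean cycles yield "no anomalies", while a single failing cycle without a recorded trace yields "anomalies, cause undetermined" rather than a firmware verdict.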