From: "Justin T. Gibbs"
To: Terry Lambert
cc: gibbs@plutotech.com (Justin T. Gibbs), Don.Lewis@tsc.tdk.com, julian@whistle.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject: Re: filesystem safety and SCSI disk write caching
In-reply-to: Your message of "Mon, 12 Oct 1998 22:58:15 -0000." <199810122258.PAA11377@usr02.primenet.com>
Message-Id: <199810130705.BAA12205@pluto.plutotech.com>
Date: Tue, 13 Oct 1998 00:59:05 -0600

>> >} 2) Use a drive with non-bogus firmware.  Recent Seagate and IBM
>> >}    drives should work just fine.  I haven't validated any Quantum
>> >}    drives in this regard yet.
>> >
>> >But how can tell if the firmware is non-bogus?
>>
>> Ask Terry since he has stated that he 'doesn't have any drives with
>> non-bogus firmware'.
>
>A) Run soft updates
>B) Press "reset" occasionally
>C) Note any anomalies in the resulting fsck when the machine
>   comes back up
>D) if count < 200, goto B
>E) if # of anomalies > 0, print "bad firmware".

You're missing a large step here.  You can't prove that the 'anomaly'
is related to the drive firmware without a trace of all transactions
on the SCSI bus.  It could well be a missing dependency in the soft
update code.
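
To make the missing step concrete, here is a minimal sketch of how the
tally in steps A-E changes once an anomaly has to be backed by a bus
trace before it is blamed on the drive.  It is illustrative only: the
Trial fields and the classify/summarize names are hypothetical, and it
assumes each reset cycle yields an fsck anomaly count plus, optionally,
a verdict read off a SCSI bus trace.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Trial:
    """One reset-and-fsck cycle (all field names are hypothetical)."""
    fsck_anomalies: int
    # None  -> no SCSI bus trace was recorded for this cycle
    # True  -> the trace shows a write the drive reported complete that
    #          is missing from the media after the reset
    # False -> every write the drive reported complete is on the media
    acked_write_lost: Optional[bool] = None


def classify(trial: Trial) -> str:
    """Assign blame only as far as the recorded evidence allows."""
    if trial.fsck_anomalies == 0:
        return "clean"
    if trial.acked_write_lost is None:
        # Step E stops here and prints "bad firmware"; without a trace
        # the cause could equally be a missing soft updates dependency.
        return "unexplained"
    return "drive suspect" if trial.acked_write_lost else "host suspect"


def summarize(trials: Iterable[Trial]) -> Counter:
    """Count how many cycles fall into each category."""
    return Counter(classify(t) for t in trials)


if __name__ == "__main__":
    runs = [Trial(0), Trial(3), Trial(2, True), Trial(1, False)]
    for verdict, n in sorted(summarize(runs).items()):
        print(f"{verdict}: {n}")
```

With no trace recorded, every anomalous cycle lands in "unexplained",
which is all that the A-E procedure can actually establish.
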

I'd be more than happy to reproduce your failure scenario while
recording a SCSI bus trace so that the fault is easy to interpret.
Just send me any *modern* drive that you think fails.  You should
also ensure that your reset button does not cause any power spikes
on the drive power lines.  That would be cheating.

>It's very hard to do this in software, without providing a mechanism
>to actually break into the latency link between the drive reporting
>a write cached operation has been written, and the actual writing.

If you can cause this failure to occur by hitting your reset button,
I should be able to cause it to occur by using a paper-clip, provided
the reset condition (caused by the SCSI card BIOS in the reset button
case) is the event that causes cache corruption.  Both are
non-deterministic methods of error injection.

>Such a latency link only exists on drives which Justin has identified
>as having broken firmware due to the behaviour reported by Don Lewis.

I'm still unclear as to whether Don was turning off power or hitting
what I consider the reset button.  His comment about UPS use makes
me think he was testing power outage scenarios.

>I would be much more interested in knowing what drives and firmware
>revisions of those drives Justin has, since both mine and Don Lewis's
>are demonstrably broken.

Since you were able to test 4 drives so quickly, I'd love to see
well documented information on exactly how the file system was
inconsistent in the failure cases.

--
Justin