From owner-freebsd-scsi Mon Oct 12 15:59:10 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id PAA04001
        for freebsd-scsi-outgoing; Mon, 12 Oct 1998 15:59:10 -0700 (PDT)
        (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from smtp02.primenet.com (smtp02.primenet.com [206.165.6.132])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id PAA03916;
        Mon, 12 Oct 1998 15:58:48 -0700 (PDT)
        (envelope-from tlambert@usr02.primenet.com)
Received: (from daemon@localhost)
        by smtp02.primenet.com (8.8.8/8.8.8) id PAA12815;
        Mon, 12 Oct 1998 15:58:30 -0700 (MST)
Received: from usr02.primenet.com(206.165.6.202)
        via SMTP by smtp02.primenet.com, id smtpd012733;
        Mon Oct 12 15:58:20 1998
Received: (from tlambert@localhost)
        by usr02.primenet.com (8.8.5/8.8.5) id PAA11377;
        Mon, 12 Oct 1998 15:58:15 -0700 (MST)
From: Terry Lambert
Message-Id: <199810122258.PAA11377@usr02.primenet.com>
Subject: Re: filesystem safety and SCSI disk write caching
To: gibbs@plutotech.com (Justin T. Gibbs)
Date: Mon, 12 Oct 1998 22:58:15 +0000 (GMT)
Cc: Don.Lewis@tsc.tdk.com, gibbs@plutotech.com, tlambert@primenet.com,
        julian@whistle.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
In-Reply-To: <199810121557.JAA04320@pluto.plutotech.com>
        from "Justin T. Gibbs" at Oct 12, 98 09:50:42 am
X-Mailer: ELM [version 2.4 PL25]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

> >} 2) Use a drive with non-bogus firmware.  Recent Seagate and IBM
> >}    drives should work just fine.  I haven't validated any Quantum
> >}    drives in this regard yet.
> >
> >But how can tell if the firmware is non-bogus?
>
> Ask Terry since he has stated that he 'doesn't have any drives with
> non-bogus firmware'.

A)	Run soft updates
B)	Press "reset" occasionally
C)	Note any anomalies in the resulting fsck when the machine
	comes back up
D)	if count < 200, goto B
E)	if # of anomalies > 0, print "bad firmware".

> Seriously, the major complaint I've heard about firmware has to do with it
> not properly flushing the cache on a bus reset.  I've never seen that
> failure mode here, and I've done quite a bit of "external bus reset"
> testing.  You'll need sophisticated tools in order to perform these kinds
> of tests:
>
> 1) Find a paper clip
>
> 2) Find a ribbon cable that has enough connectors to attach to the device
>    you want to test and the controller with a connector spare.
>
> 3) Start lots of writes
>
> 4) Ground pin 40 to pin 39 using the paper clip from step 1.
>
> 5) Verify data
>
> 6) goto 3

It's very hard to do this in software, without providing a
mechanism to actually break into the latency link between the
drive reporting that a write cached operation has been written,
and the actual writing.

Such a latency link only exists on drives which Justin has
identified as having broken firmware due to the behaviour
reported by Don Lewis.

I would be much more interested in knowing what drives and
firmware revisions of those drives Justin has, since both mine
and Don Lewis's are demonstrably broken.

The drives I can demonstrate breakage on are a 9G IBM drive,
a 2.1G Quantum drive, and two .5G DEC drives.

I can get exact model numbers if necessary, but it seems to me,
from empirical evidence so far, that the number of "broken
firmware" drives, as defined by Justin, outnumbers the number
of non-broken firmware drives.

This means that a "known good" table would be smaller than a
"known rogues" table, and thus a better mechanism for implementing
the decision about whether write caching should be enabled on the
drive, or not.

I'll be happy (well, actually I won't) to have it proven to me
that my drives and Don Lewis's are the exceptions, rather than
the rule.  8-(.

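For illustration only, here is a rough sketch of how the A) through E) reset test and a "known good" table might fit together. Everything named here is an assumption made up for the example, not FreeBSD code: the `DriveId` key (vendor, product, firmware revision), the `firmware_verdict` and `write_cache_allowed` helpers, the placeholder drive identifiers, and the extra "inconclusive" state for a drive that has not yet been through 200 resets.

```python
from typing import Iterable, NamedTuple, Set

REQUIRED_RESETS = 200  # step D: keep resetting until the count reaches 200


class DriveId(NamedTuple):
    """Hypothetical table key: vendor, product and firmware revision."""
    vendor: str
    product: str
    revision: str


def firmware_verdict(anomalies_per_reset: Iterable[int]) -> str:
    """Apply steps D and E to a log of fsck anomaly counts, one per reset."""
    counts = list(anomalies_per_reset)
    if any(c > 0 for c in counts):
        # Step E: any anomaly at all means the final verdict is "bad",
        # so there is no need to wait for the remaining resets.
        return "bad firmware"
    if len(counts) < REQUIRED_RESETS:
        # Step D not satisfied yet (assumed extra state, not in the steps).
        return "inconclusive"
    return "no anomalies seen"


def write_cache_allowed(drive: DriveId, known_good: Set[DriveId]) -> bool:
    """Enable write caching only for drives listed in the known-good table."""
    return drive in known_good


# Usage with made-up identifiers: a drive earns a table entry only after
# a full run of clean resets; anything not in the table stays disabled.
known_good: Set[DriveId] = set()
tested = DriveId("EXAMPLE", "DISK-A", "0001")
untested = DriveId("EXAMPLE", "DISK-B", "0001")
if firmware_verdict([0] * REQUIRED_RESETS) == "no anomalies seen":
    known_good.add(tested)
```

The point of keying on a "known good" table is that an unlisted drive defaults to write caching off, which is the safe direction if broken firmware really is the common case.
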
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message