Date:      Mon, 12 Oct 1998 22:58:15 +0000 (GMT)
From:      Terry Lambert <tlambert@primenet.com>
To:        gibbs@plutotech.com (Justin T. Gibbs)
Cc:        Don.Lewis@tsc.tdk.com, gibbs@plutotech.com, tlambert@primenet.com, julian@whistle.com, freebsd-fs@FreeBSD.ORG, freebsd-scsi@FreeBSD.ORG
Subject:   Re: filesystem safety and SCSI disk write caching
Message-ID:  <199810122258.PAA11377@usr02.primenet.com>
In-Reply-To: <199810121557.JAA04320@pluto.plutotech.com> from "Justin T. Gibbs" at Oct 12, 98 09:50:42 am

> >} 2) Use a drive with non-bogus firmware.  Recent Seagate and IBM
> >} drives should work just fine.  I haven't validated any Quantum
> >} drives in this regard yet.
> >
> >But how can you tell if the firmware is non-bogus?
> 
> Ask Terry since he has stated that he 'doesn't have any drives with
> non-bogus firmware'.

A)	Run soft updates
B)	Press "reset" occasionally
C)	Note any anomalies in the resulting fsck when the machine
	comes back up
D)	increment count; if count < 200, goto B
E)	if # of anomalies > 0, print "bad firmware".
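
A minimal Python sketch of the A-E loop above. `reset_and_fsck` is a hypothetical hook standing in for "press reset while soft updates is running, let the machine reboot, and count the anomalies fsck reports"; in practice that step is a human at the reset button, not a function call. A clean result only means nothing was observed in `cycles` resets, not that the firmware is proven good.

```python
from typing import Callable


def classify_firmware(reset_and_fsck: Callable[[], int], cycles: int = 200) -> str:
    """Steps B-E: repeat reset/fsck `cycles` times and judge the drive.

    reset_and_fsck is a hypothetical hook: one reset-and-reboot cycle,
    returning the number of anomalies fsck reported for that cycle.
    """
    anomalies = 0
    for _ in range(cycles):            # step D: loop until `cycles` resets
        anomalies += reset_and_fsck()  # steps B and C
    # Step E: a single anomaly is enough to flag the firmware.
    return "bad firmware" if anomalies > 0 else "no anomalies observed"
```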


> Seriously, the major complaint I've heard about firmware has to do with it
> not properly flushing the cache on a bus reset.  I've never seen that
> failure mode here, and I've done quite a bit of "external bus reset"
> testing.  You'll need sophisticated tools in order to perform these kinds
> of tests:
> 
> 1) Find a paper clip
> 
> 2) Find a ribbon cable that has enough connectors to attach to the device
> you want to test and the controller with a connector spare.
> 
> 3) Start lots of writes
> 
> 4) Ground pin 40 to pin 39 using the paper clip from step 1.
> 
> 5) Verify data
> 
> 6) goto 3
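
Step 5 ("verify data") is only meaningful if the test knows which writes the drive acknowledged before the reset. One way to do that, sketched here in Python under stated assumptions: every block carries its own block number, a write generation, and a CRC32, and the harness records the last generation the drive acknowledged for each block. The 512-byte block size, the layout, and the function names are illustrative, not taken from any existing tool; an `io.BytesIO` object stands in for the raw disk device.

```python
import io
import struct
import zlib
from collections import Counter
from typing import Dict, Iterable

BLOCK = 512                      # assumed sector size
HEADER = struct.Struct("<QQ")    # (block number, write generation)
TRAILER = struct.Struct("<I")    # CRC32 over everything before it


def make_block(blkno: int, gen: int) -> bytes:
    """Build a self-describing block: header, filler pattern, CRC32."""
    body = HEADER.pack(blkno, gen)
    fill = BLOCK - HEADER.size - TRAILER.size
    body += bytes((blkno + gen + i) & 0xFF for i in range(fill))
    return body + TRAILER.pack(zlib.crc32(body))


def write_pass(dev, blknos: Iterable[int], gen: int) -> None:
    """Step 3: write generation `gen` to each listed block."""
    for b in blknos:
        dev.seek(b * BLOCK)
        dev.write(make_block(b, gen))


def check_block(data: bytes, blkno: int, acked_gen: int) -> str:
    """Classify one block read back after the bus reset."""
    body = data[:-TRAILER.size]
    (crc,) = TRAILER.unpack(data[-TRAILER.size:])
    if zlib.crc32(body) != crc:
        return "corrupt"        # torn or garbage sector
    got_blkno, got_gen = HEADER.unpack_from(body)
    if got_blkno != blkno:
        return "misdirected"    # valid block, wrong location
    if got_gen < acked_gen:
        return "lost"           # drive acked a newer write that vanished
    return "ok"                 # acked write present (or an even newer one)


def verify(dev, acked: Dict[int, int]) -> Counter:
    """Step 5: re-read every block whose write the drive acknowledged."""
    results = Counter()
    for blkno, gen in acked.items():
        dev.seek(blkno * BLOCK)
        results[check_block(dev.read(BLOCK), blkno, gen)] += 1
    return results


if __name__ == "__main__":
    # Simulated device: generation 1 is on every block, but the drive
    # had acknowledged generation 2 of block 2 before the reset.
    dev = io.BytesIO(bytes(BLOCK * 4))
    write_pass(dev, range(4), 1)
    print(verify(dev, {0: 1, 1: 1, 2: 2, 3: 1}))
```

Writes still in flight when pin 40 was grounded may legitimately be missing; only a block older than its acknowledged generation shows a drive that reported a write as done and then dropped it.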

It's very hard to do this in software without providing a mechanism
to actually break into the latency link between the drive reporting
that a write-cached operation has been written and the actual writing.
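
To make the latency link concrete, here is a toy Python model of a drive with write caching enabled; it is an illustration, not any real drive's firmware. `write()` acknowledges as soon as the block is in cache, and `bus_reset()` either destages the cache first or discards it, depending on the hypothetical `flush_on_reset` flag. The host gets the same "done" from both drives, which is why the difference only shows when a reset lands inside the window.

```python
class WriteBackDisk:
    """Toy model of a drive with write caching enabled (not real firmware)."""

    def __init__(self, flush_on_reset: bool) -> None:
        self.media = {}        # blocks actually on the platters
        self.cache = {}        # blocks acknowledged but not yet destaged
        self.flush_on_reset = flush_on_reset

    def write(self, blkno: int, data: str) -> str:
        self.cache[blkno] = data
        return "done"          # acknowledged before the media is touched

    def destage(self) -> None:
        """The drive's own background writer: move cached blocks to media."""
        self.media.update(self.cache)
        self.cache.clear()

    def bus_reset(self) -> None:
        if self.flush_on_reset:
            self.destage()     # well-behaved: acknowledged data survives
        else:
            self.cache.clear() # "bogus" firmware: acknowledged data vanishes


def lost_after_reset(flush_on_reset: bool) -> list:
    """Acknowledge three writes, then reset inside the latency window."""
    disk = WriteBackDisk(flush_on_reset)
    for b in range(3):
        disk.write(b, f"v{b}")  # host sees "done" either way
    disk.bus_reset()
    return [b for b in range(3) if b not in disk.media]
```

This models a bus reset only; a power loss would empty the cache regardless of firmware behaviour.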

Such a latency link only exists on drives which Justin has identified
as having broken firmware due to the behaviour reported by Don Lewis.

I would be much more interested in knowing what drives and firmware
revisions of those drives Justin has, since both mine and Don Lewis's
are demonstrably broken.

The drives I can demonstrate breakage on are a 9G IBM drive, a 2.1G
Quantum drive, and two .5G DEC drives.  I can get exact model numbers
if necessary, but it seems to me, from empirical evidence so far, that
drives with "broken firmware", as defined by Justin, outnumber drives
with non-broken firmware.  This means that a "known good"
table would be smaller than a "known rogues" table, and thus a better
mechanism for implementing the decision about whether write caching
should be enabled on the drive, or not.
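
A Python sketch of the "known good" approach: write caching (the WCE bit in the SCSI caching mode page) is enabled only for drives whose INQUIRY vendor, product, and revision strings match an allowlist entry, and is left off for everything else. The table entries are placeholders, not validated drives, and the glob-style matching via `fnmatch` is an assumption for illustration.

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class DriveId(NamedTuple):
    vendor: str    # INQUIRY vendor identification
    product: str   # INQUIRY product identification
    revision: str  # INQUIRY product revision level (firmware)


# Placeholder "known good" entries; glob patterns are allowed per field.
KNOWN_GOOD = [
    DriveId("VENDOR_A", "MODEL_X*", "0042"),
    DriveId("VENDOR_B", "MODEL_Y", "*"),
]


def write_cache_allowed(drive: DriveId, table=KNOWN_GOOD) -> bool:
    """True only if every field of some table entry matches the drive."""
    return any(
        all(fnmatchcase(value, pattern) for value, pattern in zip(drive, entry))
        for entry in table
    )
```

Defaulting to "off" for unlisted drives means an untested drive runs without write caching rather than with it. Real INQUIRY fields are space-padded and would need trimming before matching.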

I'll be happy (well, actually I won't) to have it proven to me that
Don Lewis's drives and mine are the exceptions, rather than the rule.
8-(.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
