Date:      Tue, 18 Jul 1995 21:21:09 -0400 (EDT)
From:      Peter Dufault <dufault@hda.com>
To:        root@freefall.cdrom.com (& freefall.cdrom.com)
Cc:        julian@ref.tfs.com, stratlif@grail.cba.csuohio.edu, freebsd-hackers@FreeBSD.org
Subject:   Re: SCSI drivers
Message-ID:  <199507190121.VAA00365@hda.com>
In-Reply-To: <199507182014.NAA11391@freefall.cdrom.com> from "& freefall.cdrom.com" at Jul 18, 95 01:14:49 pm

& freefall.cdrom.com writes:
...

(I guess that is Justin's new name; I know he went to a wedding
but I didn't think it was his)

I started to prepare a response to this, and seeing that there is
some interest I'll fire away.  In particular, anybody thinking about
SCSI code should think about how we can properly layer it so that
the policy is dictated from above, and where that breaks down.

Here it is:
> 
> o Error Recovery
> 
>   This driver implements extensive error recovery procedures.  When the
>   higher level parts of the SCSI subsystem request that a command be reset,
>   a bus device reset is first sent to the target device.  If two bus device
>   resets have been attempted and no command to the device has completed
>   successfully, then a host adapter hard reset and SCSI bus reset is
>   performed.  SCSI bus resets caused by other devices and detected by the
>   host adapter are also handled by issuing a hard reset to the host adapter
>   and full reinitialization.  This strategy should improve overall system
>   robustness by preventing individual errant devices from causing the
>   system as a whole to lockup or crash, and thereby allowing a clean
>   shutdown and restart after the offending component is removed.

I have one overall comment: policy should be handled at the common
level and not in the lower ones.  If the policy on device hangups is
"try two bus device resets, then reset the board, and then reset the
bus", then it should be driven by calls down from the common code and
not decided upon in a single low-level driver.
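
A minimal sketch of that layering in C, under stated assumptions: every
name here (recover_state, adapter_ops, recover_escalate, the fake_*
stubs) is hypothetical and is not taken from the FreeBSD SCSI code.  The
low-level driver exports only mechanism; the common layer owns the
counters and decides which step comes next.

```c
#include <stddef.h>

/* Hypothetical recovery steps, mildest first. */
enum recover_step {
    STEP_DEVICE_RESET,
    STEP_ADAPTER_RESET,
    STEP_BUS_RESET,
    STEP_GIVE_UP
};

/* Mechanism only: entry points a low-level driver would export. */
struct adapter_ops {
    int (*device_reset)(void *softc, int target);
    int (*adapter_reset)(void *softc);
    int (*bus_reset)(void *softc);
};

/* Policy state owned by the common layer, one per target. */
struct recover_state {
    int device_resets;       /* device resets since the last good command */
    int adapter_reset_done;  /* board already reset in this episode */
    int bus_reset_done;      /* bus already reset in this episode */
};

#define MAX_DEVICE_RESETS 2  /* assumed threshold, from the quoted policy */

/* Policy: choose the next step.  Lives in the common code only. */
static enum recover_step
recover_next_step(const struct recover_state *st)
{
    if (st->device_resets < MAX_DEVICE_RESETS)
        return STEP_DEVICE_RESET;
    if (!st->adapter_reset_done)
        return STEP_ADAPTER_RESET;
    if (!st->bus_reset_done)
        return STEP_BUS_RESET;
    return STEP_GIVE_UP;
}

/* Carry out one step by calling down into the driver. */
static enum recover_step
recover_escalate(struct recover_state *st, const struct adapter_ops *ops,
    void *softc, int target)
{
    enum recover_step step = recover_next_step(st);

    switch (step) {
    case STEP_DEVICE_RESET:
        st->device_resets++;
        ops->device_reset(softc, target);
        break;
    case STEP_ADAPTER_RESET:
        st->adapter_reset_done = 1;
        ops->adapter_reset(softc);
        break;
    case STEP_BUS_RESET:
        st->bus_reset_done = 1;
        ops->bus_reset(softc);
        break;
    case STEP_GIVE_UP:
        break;               /* nothing left to try; caller fails the I/O */
    }
    return step;
}

/* A command completed: the next failure starts at the mildest step. */
static void
recover_success(struct recover_state *st)
{
    st->device_resets = 0;
    st->adapter_reset_done = 0;
    st->bus_reset_done = 0;
}

/* Stand-in driver for illustration: it only counts the calls it gets. */
static int fake_calls[3];
static int fake_device_reset(void *sc, int t)
{ (void)sc; (void)t; fake_calls[STEP_DEVICE_RESET]++; return 0; }
static int fake_adapter_reset(void *sc)
{ (void)sc; fake_calls[STEP_ADAPTER_RESET]++; return 0; }
static int fake_bus_reset(void *sc)
{ (void)sc; fake_calls[STEP_BUS_RESET]++; return 0; }

static const struct adapter_ops fake_ops = {
    fake_device_reset, fake_adapter_reset, fake_bus_reset
};
```

With this split, changing the escalation order or the threshold touches
recover_next_step() only, and every adapter driver picks up the change.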

Another observation is that there will be outstanding work on the
bus.  You'll have to resubmit those transactions after resetting the
SCSI bus.  Some transactions will not make sense to resubmit, such
as an aborted tape write.
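
The resubmission rule can be sketched as follows.  This is an
illustration only: struct xfer, its restartable flag, and
requeue_after_bus_reset() are hypothetical names, and deciding which
commands count as restartable is assumed to happen elsewhere.

```c
#include <stddef.h>

enum xfer_status { XFER_ACTIVE, XFER_QUEUED, XFER_FAILED, XFER_DONE };

/* Hypothetical record of one transaction known to the common layer. */
struct xfer {
    int target;               /* SCSI target ID */
    int restartable;          /* 0 for e.g. a write to a tape drive */
    enum xfer_status status;
};

/*
 * After a bus reset, everything that was active on the bus is gone.
 * Put restartable transfers back on the queue and fail the rest so
 * the error reaches the caller.  Returns the number requeued.
 */
static size_t
requeue_after_bus_reset(struct xfer *x, size_t n)
{
    size_t i, requeued = 0;

    for (i = 0; i < n; i++) {
        if (x[i].status != XFER_ACTIVE)
            continue;         /* never reached the bus, or already done */
        if (x[i].restartable) {
            x[i].status = XFER_QUEUED;
            requeued++;
        } else {
            x[i].status = XFER_FAILED;
        }
    }
    return requeued;
}
```

A disk read can simply be issued again, while a tape write that was cut
short has left the medium at an unknown position, so it is reported as
failed rather than retried.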

Of course this could be a proof of concept that will then be implemented
in a more uniform fashion, and the author may have addressed these
issues.

I sent out a summary of an error strategy a little while ago, and
the consensus was that it was 2.2 material because of the changes
involved.  It included suspending activity on the SCSI bus to let as
much I/O as possible drain before resetting the bus and trying to
pick things up again.
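
The suspend-and-drain idea might look roughly like this.  It is a
sketch under assumptions, not the strategy from that summary: struct
scsi_bus and the bus_* functions are hypothetical, and a real version
would also need a timeout, since a hung device never drains.

```c
/* Hypothetical per-bus bookkeeping kept by the common layer. */
struct scsi_bus {
    int frozen;   /* nonzero: new commands are held back */
    int active;   /* commands currently out on the bus */
    int held;     /* commands waiting for the thaw */
};

/* Start a command, or hold it if the bus is being quiesced. */
static int
bus_submit(struct scsi_bus *b)
{
    if (b->frozen) {
        b->held++;
        return 0;             /* held, not started */
    }
    b->active++;
    return 1;                 /* started */
}

/* One command finished, successfully or not. */
static void
bus_complete(struct scsi_bus *b)
{
    if (b->active > 0)
        b->active--;
}

/* Stop feeding the bus so outstanding I/O can drain. */
static void
bus_freeze(struct scsi_bus *b)
{
    b->frozen = 1;
}

/* True once nothing is left that a reset would destroy. */
static int
bus_ready_for_reset(const struct scsi_bus *b)
{
    return b->frozen && b->active == 0;
}

/* After the reset: release held commands.  Returns how many started. */
static int
bus_thaw(struct scsi_bus *b)
{
    int released = b->held;

    b->frozen = 0;
    b->active += released;
    b->held = 0;
    return released;
}
```

The intended order is bus_freeze(), wait until bus_ready_for_reset()
is true (or a timeout expires), reset the bus, then bus_thaw().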

-- 
Peter Dufault               Real Time Machine Control and Simulation
HD Associates, Inc.         Voice: 508 433 6936
dufault@hda.com             Fax:   508 433 5267
