From owner-freebsd-scsi  Fri Jan 10 13:39:47 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by freefall.freebsd.org (8.8.4/8.8.4) id NAA04979
          for freebsd-scsi-outgoing; Fri, 10 Jan 1997 13:39:47 -0800 (PST)
Received: from Sisyphos.MI.Uni-Koeln.DE (Sisyphos.MI.Uni-Koeln.DE [134.95.212.10])
          by freefall.freebsd.org (8.8.4/8.8.4) with SMTP id NAA04973;
          Fri, 10 Jan 1997 13:39:34 -0800 (PST)
Received: from x14.mi.uni-koeln.de (annexr2-43.slip.Uni-Koeln.DE) by Sisyphos.MI.Uni-Koeln.DE with SMTP id AA01723
  (5.67b/IDA-1.5); Fri, 10 Jan 1997 22:39:19 +0100
Received: (from se@localhost) by x14.mi.uni-koeln.de (8.8.4/8.6.9) id WAA06293; Fri, 10 Jan 1997 22:39:11 +0100 (CET)
Message-Id: <Mutt.19970110223911.se@x14.mi.uni-koeln.de>
Date: Fri, 10 Jan 1997 22:39:11 +0100
From: se@freebsd.org (Stefan Esser)
To: kelly@fsl.noaa.gov (Sean Kelly)
Cc: se@freebsd.org (Stefan Esser), scsi@freebsd.org
Subject: Re: Problem appears from  migration from bt0 to ncr0
References: <32D125EB.4E3F@fsl.noaa.gov> <Mutt.19970106205850.se@x14.mi.uni-koeln.de> <32D15EA6.47A0@fsl.noaa.gov> <Mutt.19970106212906.se@x14.mi.uni-koeln.de> <32D3C8B1.EDA@fsl.noaa.gov>
X-Mailer: Mutt 0.55-PL15
Mime-Version: 1.0
In-Reply-To: <32D3C8B1.EDA@fsl.noaa.gov>; from Sean Kelly on Jan 8, 1997 09:17:53 -0700
Sender: owner-freebsd-scsi@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

On Jan 8, kelly@fsl.noaa.gov (Sean Kelly) wrote:
> Stefan:
> 
> Here're the results so far with the DFRS drive of doom.  
> 
> First, with the system running 2.1.5R, I modified /sys/scsi/sd.c and
> changed xs->timeout from 10000 to 80000.  Later that evening, the DFRS
> went to sleep but the same problem occurred as before.  Perhaps I
> should've used a higher value?  (And if what I changed wasn't correct,
> do let me know.)

Guess you were right. You can try with SCSI debugging
enabled, see /sys/scsi/scsi_debug.h ...
Then you'll know which command is aborted. But this
will slow down your system significantly. The "scsi"
command lets you set the debug level, if the debug
code is compiled into the kernel. You'll need lots of
disk space for the log messages, though (and you
should write them to some other drive ...)

Sorry, I can't spend much time on this problem,
I just don't HAVE that time :(

> Next, I upgraded to 2.2-BETA, and ran the system with an unmodified
> sd.c.  This time, the system locked up (!) when the DFRS went to sleep. 
> The console displayed:
> 
> ncr0: SCSI phase error fixup: CCB already dequeued (...)

Yes, this is the result of the NCR being locked
by the unresponsive drive for too long. I should
make the recovery more robust, but I do not have
a drive that exhibits such a problem, and it is
not easy to reproduce and test on my system.
(I do not want to lose any of my work because of
such a test, and do not want to buy a DFRS just
to have a drive to play with ...)

> and the scrollback buffer worked, and I could enter my account name at
> the login prompt.  But nothing else; as if any activity requiring the
> disk just hung.  I hit the reset switch.

This is not an acceptable outcome, but there is not
much I can do about it at the moment. Sorry ...

> I still need to try changing the xs->timeout in the 2.2-BETA kernel and
> forcing the tape drives to be async in /etc/rc.local.

You may try with all devices async, since one of
the inconsistencies that might have been caused
by the controller reset is in the transfer rates
assumed by both sides (controller and device).
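
To illustrate what I mean by the two sides disagreeing,
here is a toy model in C. It is only a sketch, not code
from the ncr driver: the struct, the function names and
the example values (100 ns period, offset 8) are all
made up, and it simply assumes that a reset makes the
controller forget the negotiated rate while the device
keeps it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device transfer settings (not the driver's real layout). */
struct xfer_params {
    unsigned period_ns; /* synchronous transfer period; 0 means async */
    unsigned offset;    /* synchronous offset; 0 means async */
};

/* True if controller and device assume the same transfer settings. */
static bool params_agree(struct xfer_params host, struct xfer_params target)
{
    return host.period_ns == target.period_ns &&
           host.offset == target.offset;
}

/*
 * Assumption for this sketch: a controller reset clears only the
 * controller's idea of the negotiated rate; the device is untouched.
 */
static void controller_reset(struct xfer_params *host)
{
    assert(host != NULL);
    host->period_ns = 0;
    host->offset = 0;
}

/* Forcing async puts both sides back on the same (asynchronous) settings. */
static void force_async(struct xfer_params *host, struct xfer_params *target)
{
    assert(host != NULL && target != NULL);
    host->period_ns = 0;
    host->offset = 0;
    target->period_ns = 0;
    target->offset = 0;
}
```

With both sides starting at {100, 8}, params_agree() is
true; after controller_reset() on the host side it is
false, and after force_async() it is true again. That
is the reason for trying everything async first.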

Regards, STefan