Date:      Fri, 10 Jan 1997 22:39:11 +0100
From:      se@freebsd.org (Stefan Esser)
To:        kelly@fsl.noaa.gov (Sean Kelly)
Cc:        se@freebsd.org (Stefan Esser), scsi@freebsd.org
Subject:   Re: Problem appears from  migration from bt0 to ncr0
Message-ID:  <Mutt.19970110223911.se@x14.mi.uni-koeln.de>
In-Reply-To: <32D3C8B1.EDA@fsl.noaa.gov>; from Sean Kelly on Jan 8, 1997 09:17:53 -0700
References:  <32D125EB.4E3F@fsl.noaa.gov> <Mutt.19970106205850.se@x14.mi.uni-koeln.de> <32D15EA6.47A0@fsl.noaa.gov> <Mutt.19970106212906.se@x14.mi.uni-koeln.de> <32D3C8B1.EDA@fsl.noaa.gov>

On Jan 8, kelly@fsl.noaa.gov (Sean Kelly) wrote:
> Stefan:
> 
> Here're the results so far with the DFRS drive of doom.  
> 
> First, with the system running 2.1.5R, I modified /sys/scsi/sd.c and
> changed xs->timeout from 10000 to 80000.  Later that evening, the DFRS
> went to sleep but the same problem occurred as before.  Perhaps I
> should've used a higher value?  (And if what I changed wasn't correct,
> do let me know.)

Guess you were right. You can try with SCSI debug 
enabled, see /sys/scsi/scsi_debug.h ...
Then you'll know which command is aborted. But this
will significantly slow down your system. The "scsi"
command lets you set the debug level, if the debug
code is compiled into the kernel. You'll need lots
of disk space for the log messages, though (and you
should write them to some other drive ...)

Sorry, I can't spend much time on this problem,
I just don't HAVE that time :(

> Next, I upgraded to 2.2-BETA, and ran the system with an unmodified
> sd.c.  This time, the system locked up (!) when the DFRS went to sleep. 
> The console displayed:
> 
> ncr0: SCSI phase error fixup: CCB already dequeued (...)

Yes, this is the result of the NCR being locked 
by the unresponsive drive for too long. I should
make the recovery more robust, but I do not have
a drive that exhibits such a problem, and it is
not so easy to reproduce and test on my system.
(I do not want to lose any of my work because of
such a test, and do not want to buy a DFRS just 
to have a drive to play with ...)

> and the scrollback buffer worked, and I could enter my account name at
> the login prompt.  But nothing else; as if any activity requiring the
> disk just hung.  I hit the reset switch.

This is not an acceptable outcome. But I can't do
much about it, currently. Sorry ...

> I still need to try changing the xs->timeout in the 2.2-BETA kernel and
> forcing the tape drives to be async in /etc/rc.local.

You may try with all devices async, since one of 
the inconsistencies that might have been caused
by the controller reset is in the transfer rates
assumed by both sides.

Regards, Stefan


