Date: Fri, 10 Jan 1997 22:39:11 +0100
From: se@freebsd.org (Stefan Esser)
To: kelly@fsl.noaa.gov (Sean Kelly)
Cc: se@freebsd.org (Stefan Esser), scsi@freebsd.org
Subject: Re: Problem appears from migration from bt0 to ncr0
References: <32D125EB.4E3F@fsl.noaa.gov> <32D15EA6.47A0@fsl.noaa.gov> <32D3C8B1.EDA@fsl.noaa.gov>
In-Reply-To: <32D3C8B1.EDA@fsl.noaa.gov>; from Sean Kelly on Jan 8, 1997 09:17:53 -0700

On Jan 8, kelly@fsl.noaa.gov (Sean Kelly) wrote:
> Stefan:
>
> Here're the results so far with the DFRS drive of doom.
>
> First, with the system running 2.1.5R, I modified /sys/scsi/sd.c and
> changed xs->timeout from 10000 to 80000. Later that evening, the DFRS
> went to sleep but the same problem occurred as before. Perhaps I
> should've used a higher value? (And if what I changed wasn't correct,
> do let me know.)

I guess you were right. You can try with SCSI debugging enabled (see
/sys/scsi/scsi_debug.h ...); that way you'll see which command is being
aborted. But this will slow your system down significantly. The "scsi"
command lets you set the debug level, provided the debug code is compiled
into the kernel.
You'll need lots of disk space for the log messages, though (and you
should write them to a different drive ...)

Sorry, I can't spend much time on this problem; I just don't HAVE that
time :(

> Next, I upgraded to 2.2-BETA, and ran the system with an unmodified
> sd.c. This time, the system locked up (!) when the DFRS went to sleep.
> The console displayed:
>
> ncr0: SCSI phase error fixup: CCB already dequeued (...)

Yes, this is the result of the NCR being locked by the unresponsive drive
for too long. I should make the recovery more robust, but I do not have a
drive that exhibits such a problem, and it is not easy to reproduce and
test on my system. (I do not want to lose any of my work because of such
a test, and I do not want to buy a DFRS just to have a drive to play
with ...)

> and the scrollback buffer worked, and I could enter my account name at
> the login prompt. But nothing else; as if any activity requiring the
> disk just hung. I hit the reset switch.

This is not an acceptable outcome, but I can't do much about it
currently. Sorry ...

> I still need to try changing the xs->timeout in the 2.2-BETA kernel and
> forcing the tape drives to be async in /etc/rc.local.

You may try with all devices async, since one of the inconsistencies the
controller reset might have caused is a mismatch between the transfer
rates assumed by the two sides.

Regards,

Stefan