From owner-freebsd-bugs Mon Jan 22 03:43:32 1996
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id DAA04000 for bugs-outgoing; Mon, 22 Jan 1996 03:43:32 -0800 (PST)
Received: from proxy.siemens.at (proxy.siemens.at [192.138.228.19]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id DAA03984 for ; Mon, 22 Jan 1996 03:42:48 -0800 (PST)
Received: from zerberus.hai.siemens.co.at (zerberus.hai.siemens-austria) by proxy.siemens.at with SMTP id AA18086 (5.67a/IDA-1.5 for ); Mon, 22 Jan 1996 12:41:40 +0100
Received: from localhost by zerberus.hai.siemens.co.at (4.1/SMI-4.1) id AA09490; Mon, 22 Jan 96 12:41:38 +0100
Message-Id: <9601221141.AA09490@zerberus.hai.siemens.co.at>
To: freebsd-bugs@freebsd.org
Cc: wirth@zerberus.hai.siemens.co.at
Subject: Bug with NCR810 driver: Corrections, Additions and a Solution, see previous message
In-Reply-To: Your message of "Fri, 19 Jan 1996 11:24:28 +0100." <9601191024.AA01493@zerberus.hai.siemens.co.at>
Date: Mon, 22 Jan 1996 12:41:37 +0100
From: Helmut Wirth
Sender: owner-bugs@freebsd.org
Precedence: bulk

Hello!

On Friday I suspected a bug in the NCR driver. Over the weekend I took a closer look at my problem: dumping from 2 of my 3 SCSI disks seemed to trigger bugs in the NCR driver, and there was an 'assertion failed' error inside the NCR driver. With some hints from Stefan Esser I was able to track down the real problem, and it does NOT lie with the NCR driver!

The problem (please see my previous bug report too) was this: using 'dump' on one of the disks produced the error 'Device not ready' and aborted the dump. On the other disk the dump seemed to work, but in the process I got the 'assertion failed' message I mentioned above. Both of these disks are IBM SCSI-2 disks (1 GB each). The Quantum ATLAS does not have the problem.

SCSI traces from the kernel (built with SCSIDEBUG defined) showed that the problem lies within the SCSI subsystem.
Early in its execution the 'dump' utility forks, and the disk to be dumped is then opened several times (sd_open in sd.c) in quick succession. Both bugs showed up at this point. I could not trace the 'assertion failed' message, because it did not occur with the trace flags for SCSI or NCR enabled. The 'Unit not ready' error was traceable. Every time sd_open was called in quick succession I heard a head recalibration inside one of the drives. After some thinking I suspected the START_UNIT command at the beginning of the sd_open routine. So I commented it out and *both* errors went away!! Without this command both disks work very well and there are no more unusual head recalibration cycles. As I reported in my last post, the bug also went away when I disabled tagged commands for the offending drive.

If somebody is interested, I saved the trace files and will submit them if they are needed.

I suspect that the IBM drives don't like receiving two or three tagged START_UNIT commands in very quick succession. The drive was *really* not ready at the time of the error. The whole thing is *not* exactly a bug, but I think it is a weakness in the SCSI code.

I think we should change two things:

1) The traces from the SCSI driver show (with the log time entries) that its retries come far too fast for the unit to become ready again. The unit needs about 1 to 1.5 seconds for its recalibration, but the driver has exhausted its retries in under 1 second. Perhaps the retries should be slower in some cases, for example after a failed TEST_UNIT_READY.

2) The START_UNIT command is not necessary for most disks (at least not on every open, only on the first). It should not be issued indiscriminately, or there should be a minimum interval (some seconds) between these commands. As this is a more complicated change, I think issuing START_UNIT on every open should be an option and not the default.

What do you think?

Regards
Helmut Wirth
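The two proposed changes can be illustrated with a small simulation. This is a hypothetical model written in Python, not code from sd.c: the class and function names (SimDrive, wait_ready, StartUnitGate), the assumption that every START_UNIT triggers a recalibration, the 1.2 s recalibration time and the 5 s minimum interval between START_UNIT commands are all assumptions chosen for illustration. A simulated clock is advanced instead of really sleeping.

```python
"""Hypothetical simulation of the two proposals (not FreeBSD sd.c code)."""

RECAL_S = 1.2                # assumed recalibration time (report: about 1 to 1.5 s)
MIN_START_INTERVAL_S = 5.0   # assumed minimum gap between START_UNIT commands


class SimDrive:
    """Hypothetical model of a drive that reports 'not ready' while recalibrating."""

    def __init__(self):
        self.now = 0.0          # simulated clock in seconds
        self.busy_until = 0.0   # time at which the unit is ready again

    def start_unit(self):
        # Assumption: every START_UNIT triggers a recalibration in this model.
        self.busy_until = self.now + RECAL_S

    def test_unit_ready(self):
        return self.now >= self.busy_until


def wait_ready(drive, retries, delay_s):
    """Proposal 1: retry TEST_UNIT_READY with a delay between attempts.

    Returns the attempt number that succeeded, or None if all attempts failed.
    """
    for attempt in range(1, retries + 1):
        if drive.test_unit_ready():
            return attempt
        drive.now += delay_s    # stands in for sleeping between retries
    return None


class StartUnitGate:
    """Proposal 2: send START_UNIT on the first open, then rate-limit it."""

    def __init__(self, min_interval_s=MIN_START_INTERVAL_S):
        self.min_interval_s = min_interval_s
        self.last_start = None  # simulated time of the last START_UNIT sent

    def on_open(self, drive):
        if (self.last_start is not None
                and drive.now - self.last_start < self.min_interval_s):
            return False        # a START_UNIT was sent recently; skip it
        drive.start_unit()
        self.last_start = drive.now
        return True


def demo():
    drive, gate = SimDrive(), StartUnitGate()
    # Three opens in quick succession, as dump does after forking.
    starts = 0
    for _ in range(3):
        starts += gate.on_open(drive)
        drive.now += 0.01
    fast = wait_ready(drive, retries=4, delay_s=0.05)  # gives up after 0.2 s
    slow = wait_ready(drive, retries=4, delay_s=0.5)   # waits up to 2 s
    return starts, fast, slow


if __name__ == "__main__":
    print(demo())
```

In this model demo() returns (1, None, 3): the gate sends only one START_UNIT for the three opens, the fast retry loop gives up while the unit is still recalibrating, and the slower loop succeeds on its third attempt once the assumed 1.2 s have passed.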