Date:      Mon, 22 Jan 1996 12:41:37 +0100
From:      Helmut Wirth <wirth@zerberus.hai.siemens.co.at>
To:        freebsd-bugs@freebsd.org
Cc:        wirth@zerberus.hai.siemens.co.at
Subject:   Bug with NCR810 driver: Corrections, Additions and a Solution, see previous message 
Message-ID:  <9601221141.AA09490@zerberus.hai.siemens.co.at>
In-Reply-To: Your message of "Fri, 19 Jan 1996 11:24:28 +0100." <9601191024.AA01493@zerberus.hai.siemens.co.at>

Hello!

On Friday I suspected a bug in the NCR driver. Over the weekend I took a closer
look at my problem:

The problem was that dumping from 2 of my 3 SCSI disks seemed to trigger bugs
in the NCR driver; there was an 'assertion failed error' inside the NCR driver.

With some hints from Stefan Esser I was able to track down the real problem and 
it lies NOT with the NCR driver!

The problem (please see my previous bug report too) was: Using 'dump' to
dump one of the disks produced an error 'Device not ready' and aborted the dump.
On the other disk the dump seemed to work, but in the process I got the
'assertion failed' message I mentioned above.

Both of the disks are IBM SCSI-2 disks (1GB each). The Quantum ATLAS does not
have the problem.

SCSI traces from the kernel (built with SCSIDEBUG defined) showed that the
problem lies within the SCSI subsystem. Early in its execution the 'dump'
utility forks, and then the disk to be dumped is opened several times (sd_open
in sd.c) in quick succession. At this point both bugs showed up. I could not
track down the 'assertion failed' message, because the event did not happen with
the trace flags for SCSI or for NCR enabled. The 'Unit not ready' was traceable.

Every time the sd_open routine was called in quick succession I heard a
head recalibration inside one of the drives. After some thinking I suspected the
START_UNIT command at the beginning of the sd_open routine. So I commented it out
and *both* errors went away! Without this command both disks work very well and
there are no more unusual head recalibration cycles.

As I reported in my last post, the bug also went away when I disabled tags
for the offending drive.

If somebody is interested, I saved the trace files and will submit them if they 
are needed.

I suspect that the IBM drives don't like two or three START_UNIT commands issued
with tags in very quick succession. The drive was *really* not ready at the time
of the error.

The entire thing is *not* exactly a bug, but I think it is a weakness in the
SCSI code. I think we should change two things:

1) The traces from the SCSI driver show (with the log time entries) that the retries
   it does are far too fast for the unit to become ready again. The unit needs about
   1 to 1.5 seconds to do its recalibration, but the driver is through with its
   retries in under 1 second. Perhaps the retries should be slower for some things,
   such as a failed TEST_UNIT_READY.
2) The START_UNIT command is not necessary for most disks (at least not with
   each open, but only the first). It should not be issued arbitrarily often, or there
   should be a minimum time (some seconds) between these commands. As this is a
   more complicated change, I think the START_UNIT command with every open should be
   an option and not the default.
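
To make the two suggestions concrete, here is a rough sketch in plain C. It is
not taken from sd.c and is not a patch: the struct, the function names and all
the constants (3 seconds between START_UNIT commands, retry delays of 250 ms
doubling up to 2 seconds) are assumptions chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-unit bookkeeping; NOT the real softc from sd.c. */
struct unit_state {
    bool started;        /* has START_UNIT been sent at least once? */
    long last_start_ms;  /* time of the last START_UNIT, in milliseconds */
};

/* Assumed values, picked only to illustrate the idea. */
#define START_MIN_INTERVAL_MS 3000L /* "some seconds" between START_UNITs */
#define RETRY_BASE_DELAY_MS    250L /* delay before the first retry */
#define RETRY_MAX_DELAY_MS    2000L /* cap, above the 1 to 1.5 s recalibration */

/*
 * Suggestion 2: decide whether this open should send START_UNIT.
 * The first open always does. Later opens do so only if the
 * (hypothetical) "start on every open" option is set AND the
 * minimum interval since the last START_UNIT has passed.
 */
bool should_send_start_unit(const struct unit_state *u, long now_ms,
                            bool start_on_every_open)
{
    if (!u->started)
        return true;
    if (!start_on_every_open)
        return false;
    return (now_ms - u->last_start_ms) >= START_MIN_INTERVAL_MS;
}

/* Record that START_UNIT was just issued. */
void note_start_unit_sent(struct unit_state *u, long now_ms)
{
    u->started = true;
    u->last_start_ms = now_ms;
}

/*
 * Suggestion 1: delay before retry number `attempt` (0-based).
 * The delay doubles with each retry and is capped, so the retries
 * together span several seconds instead of finishing in under one.
 */
long retry_delay_ms(int attempt)
{
    long delay = RETRY_BASE_DELAY_MS;

    assert(attempt >= 0);
    while (attempt-- > 0 && delay < RETRY_MAX_DELAY_MS)
        delay *= 2;
    return delay < RETRY_MAX_DELAY_MS ? delay : RETRY_MAX_DELAY_MS;
}
```

With these assumed numbers, four retries wait 250 + 500 + 1000 + 2000 ms, which
is 3.75 seconds in total and comfortably longer than the recalibration time I
observed. The right values for a real driver would have to be worked out
against actual drives.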

What do you think?

Regards
Helmut Wirth


