Date:      Sun, 5 Oct 1997 02:10:01 -0700 (PDT)
From:      Stefan Esser <se@FreeBSD.ORG>
To:        freebsd-bugs
Subject:   Re: kern/4684: crash on very heavy disk activity.
Message-ID:  <199710050910.CAA08324@hub.freebsd.org>

The following reply was made to PR kern/4684; it has been noted by GNATS.

From: Stefan Esser <se@FreeBSD.ORG>
To: zach@gaffaneys.com
Cc: FreeBSD-gnats-submit@freebsd.org, Stefan Esser <se@freebsd.org>
Subject: Re: kern/4684: crash on very heavy disk activity.
Date: Sun, 5 Oct 1997 11:04:22 +0200

 On 1997-10-03 17:04 -0500, Zach Heilig <zach@gaffaneys.com> wrote:
 > >Synopsis:       crash on very heavy disk activity.
 
 > FreeBSD 2.2-STABLE #0: Thu Sep 18 09:44:26 CDT 1997
 > ncr0 <ncr 53c875 fast20 wide scsi> rev 1 int a irq 11 on pci0:10
 > (ncr0:0:0): "MICROP 4743NS S162" type 0 fixed SCSI 2
 > sd0(ncr0:0:0): Direct-Access 
 > sd0(ncr0:0:0): 20.0 MB/s (50 ns, offset 15)
 > 
 > sd0(ncr0:0:0): M_DISCONNECT received, but datapointer not saved:
 > 	data=701b4 save=e40016b0 goal=e40016d4.
 
 Hmm, the drive disconnected during the probe ...
 Does this happen on every boot?
 
 > 4100MB (8398656 512 byte sectors)
 > sd0(ncr0:0:0): with 6506 cyls, 7 heads, and an average 184 sectors/track
 > (ncr0:1:0): "QUANTUM FIREBALL_TM2110S 300X" type 0 fixed SCSI 2
 > sd1(ncr0:1:0): Direct-Access 
 > sd1(ncr0:1:0): 20.0 MB/s (50 ns, offset 15)
 > 2014MB (4124736 512 byte sectors)
 > sd1(ncr0:1:0): with 6810 cyls, 4 heads, and an average 151 sectors/track
 > (ncr0:2:0): "iomega jaz 1GB J.83" type 0 removable SCSI 2
 > sd2(ncr0:2:0): Direct-Access 
 > sd2(ncr0:2:0): 10.0 MB/s (100 ns, offset 15)
 > 
 > sd2(ncr0:2:0): ILLEGAL REQUEST asc:24,0 Invalid field in CDB
 > sd2 could not mode sense (4). Using ficticious geometry
 > 1021MB (2091050 512 byte sectors)
 > sd2(ncr0:2:0): with 1021 cyls, 64 heads, and an average 32 sectors/track
 > (ncr0:4:0): "SANYO CRD-254S 1.02" type 5 removable SCSI 2
 > cd0(ncr0:4:0): CD-ROM 
 > cd0(ncr0:4:0): asynchronous.
 
 > Here are the last few console messages before the reboot:
 > 
 > sd2(ncr0:2:0): extraneous data discarded.
 > sd2(ncr0:2:0): COMMAND FAILED (9 0) @f078bc00.
 > sd2(ncr0:2:0): extraneous data discarded.
 > sd2(ncr0:2:0); COMMAND FAILED (9 0) @f06b4000.
 > sd2(ncr0:2:0): extraneous data discarded.
 > sd2(ncr0:2:0); COMMAND FAILED (9 0) @f06b4000.
 > sd2(ncr0:2:0): extraneous data discarded.
 > sd2(ncr0:2:0); COMMAND FAILED (9 0) @f06b4000.
 > /src: bad dir ino 145920 at offset 0: mangled entry
 > panic: bad dir
 > syncing disks... 22 22 21 18 14 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 giving up
 > dumping to dev 30401, offset 131072
 > dump 64 63 ... 1 succeeded
 > 
 > This was during both an rm -rf of a large tree on sd2s1e and a cvs checkout
 > from the cvs repository I keep on that slice.
 
 The command failed because the controller and the drive disagreed
 on the amount of data to transfer. The drive stayed in a data phase
 when there was either no more data to deliver to it (on a write) or
 no more buffer space to store the data read (on a read).
 
 This (together with the disconnect of your UW drive) indicates 
 a SCSI bus problem: SCSI strobe pulses got lost or duplicated.
 
 What's the (total!) length of your SCSI bus?
 (Internal plus external, number of connectors, and terminators, if any?)
 
 Could you try a much reduced data rate (say 5MHz), just to
 check whether the bus cable is the cause?
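 
 As a rough sketch of how the period figures in the probe messages
 relate to the data rate: assuming the "ns" value is the synchronous
 transfer period, the rate in MHz is 1000 divided by the period in ns.
 That matches the "20.0 MB/s (50 ns" and "10.0 MB/s (100 ns" lines
 quoted above if one byte moves per transfer. The function names are
 made up for illustration and are not part of the ncr driver.

```python
def sync_rate_mhz(period_ns: float) -> float:
    """Transfers per microsecond (MHz) for a synchronous period in ns.

    Assumption: one transfer per period, so rate = 1000 / period_ns.
    """
    if period_ns <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ns


def period_for_rate_ns(rate_mhz: float) -> float:
    """Synchronous period in ns needed for a given rate in MHz."""
    if rate_mhz <= 0:
        raise ValueError("rate must be positive")
    return 1000.0 / rate_mhz


if __name__ == "__main__":
    # Periods reported in the boot messages, plus the 5MHz test setting.
    for period in (50, 100, period_for_rate_ns(5)):
        print(f"{period:g} ns -> {sync_rate_mhz(period):.1f} MHz")
```

 Under that assumption, the suggested 5MHz test rate corresponds to a
 period of 200 ns.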
 
 > I have more information, if you need to know more.  I did keep the core dump
 > around (and logged the fsck for /dev/sd2s1e).
 
 I don't expect this to be a software problem. The core won't help
 much in this case, since the crash happened some time after the 
 SCSI problem was detected by the NCR chip and driver.
 
 Please check your cables and terminators.
 
 There appear to be NCR cards with erroneous documentation. They
 got the terminator enable/disable labels exchanged. I do not have
 such a card, just heard about it ...
 
 You should also be aware that Fast-20 limits the cable length to
 half of what was allowed with Fast-10. In fact, I'd be reluctant
 to use more than 1m of cable between the controller and the last
 device, assuming you have no external devices.
 
 Don't trust the specified maximum bus length unless you know you have
 a first grade SCSI bus cable (as was discussed recently), since only
 such a cable will guarantee that the electrical parameters of the 
 bus meet the SCSI standard requirements. You may use a cheap cable
 if you limit the cable length to about half that allowed by the SCSI
 standard for the intended data rate, and if you don't connect too many 
 devices.
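 
 The rule of thumb above can be put into a small sketch. The limits
 in the table are assumptions: they are the commonly quoted
 single-ended figures (6m at 5MHz, 3m for Fast-10, 1.5m for Fast-20),
 so check them against the standard for your setup. The halving for
 a cheap cable is the rule from this message; the names are made up
 for illustration.

```python
# Assumed single-ended bus length limits in metres, keyed by rate in MHz.
# These are commonly quoted figures, not values taken from this message.
SPEC_LIMIT_M = {5: 6.0, 10: 3.0, 20: 1.5}


def conservative_length_m(rate_mhz: int, first_grade_cable: bool) -> float:
    """Cable length to stay under for a given synchronous rate.

    With a first grade cable the assumed standard limit applies;
    with a cheap cable, use about half of it.
    """
    limit = SPEC_LIMIT_M[rate_mhz]
    return limit if first_grade_cable else limit / 2


if __name__ == "__main__":
    for rate in sorted(SPEC_LIMIT_M):
        good = conservative_length_m(rate, first_grade_cable=True)
        cheap = conservative_length_m(rate, first_grade_cable=False)
        print(f"{rate:2d} MHz: {good:.2f} m (first grade), {cheap:.2f} m (cheap)")
```

 With these assumed figures, a cheap cable at Fast-20 comes out at
 0.75m, which is in line with staying below 1m.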
 
 Regards, STefan


