FreeBSD Mail Archives

Date:      27 Jul 1999 22:01:48 +0200
From:      Thierry.Besancon@lps.ens.fr
To:        se@FreeBSD.ORG
Cc:        Thierry.Besancon@lps.ens.fr, scsi@FreeBSD.ORG
Subject:   Re: tagged openings
Message-ID:  <wnnzp0hkgk3.fsf@excalibur.lps.ens.fr>
In-Reply-To: Stefan Esser's message of Tue, 27 Jul 1999 20:17:07 +0200
References:  <wnnn1woobqg.fsf@excalibur.lps.ens.fr> <19990727201707.A371@dialup124.zpr.uni-koeln.de>


        Hello

Many thanks for your explanation. I was getting desperate at not having
any answer...

Since this host is the fileserver for my whole lab, I had to find a
solution. So I did more or less what you suggested.

I swapped the old host -- a Pentium 90 based PC with only 4 PCI slots
(ethernet, video and 2 scsi cards consumed all the slots) -- for a new
one (K6-II 333 MHz) with more PCI slots (5 slots + AGP video). 

I still couldn't get an NCR810 working alongside the two Tekrams. They
complain about an EEPROM checksum error when the NCR810 is installed.
There are some notes about that in the manual, but the workaround they
suggest did not help. Sh*t. So I fell back on an old Adaptec 2940.

I changed the system disk too. I went for a spare 2 GB UW disk I had for
my DEC systems and dropped my old narrow 1 GB Seagate ;-)

Since this operation -- and long hours at night testing/switching
hardware -- everything is OK. My fileserver now runs FreeBSD 3.2, with
two UW Tekrams and one Adaptec 2940. There are only UW disks on the
Tekrams, all UW disks are in the UW towers, cable length is kept to the
minimum I could manage, and the DLT is alone on the narrow 2940.

The NFS fileserver has been up for 3 days with no freeze and no SCSI
error.

Best regards from Paris at night.

        Thierry






Stefan Esser <se@zpr.uni-koeln.de> wrote (on Tue, 27 Jul 1999 20:17:07 +0200):

>> 
>> On 1999-07-22 19:06 +0200, Thierry.Besancon@lps.ens.fr wrote:
>> > I'm running FreeBSD 3.1 and whenever my workstation reboots I get the
>> > message :
>> > 
>> >         (da0:ncr0:0:0:0): tagged openings now 15
>> > 
>> > What does it mean ?
>> 
>> There were more tagged commands issued to the drive than it is able
>> to queue. This is not really a problem, the driver will resend the
>> command that has not been accepted and will reduce the number of 
>> commands in progress on that drive to one less than the number that
>> caused the failure ...
>> 
>> There are quite a number of Quantum drives that had to be entered into
>> the "quirks" table in /sys/cam/cam_xpt.c. You may want to add an entry
>> for your drive, that limits "mintags" to 8 and "maxtags" to 15. See the
>> other Quantum entries for reference.
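
As a rough illustration of the behaviour described above, here is a minimal Python sketch: when a drive rejects a tagged command, the number of commands allowed in progress drops to one less than the count that caused the failure. The function name is hypothetical, and the assumption that the limit never drops below "mintags" is a guess from the parameter name, not something taken from the actual CAM code.

```python
def reduce_openings(outstanding_at_failure, mintags):
    """Return the new limit on tagged commands after a rejected command.

    Hypothetical helper: models "one less than the number that caused the
    failure" and ASSUMES the limit is never lowered below mintags.
    """
    return max(mintags, outstanding_at_failure - 1)


# A drive that rejects a command while 16 are outstanding ends up with a
# limit of 15, consistent with a message like "tagged openings now 15".
print(reduce_openings(16, mintags=8))
```

With a quirk entry of mintags 8 and maxtags 15, the driver would start at 15 and, under the assumption above, never go below 8.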
>> 
>> > I must say that I encounter scsi problems with this host but I can't
>> > find where they're coming from.
>> > Generally the machine freezes with messages saying ncr1 is on timeout.
>> > 
>> > For example :
>> > 
>> >         ncr1:5: ERROR (0:91) (9-ae-800) (8/13) @ (script 6dc:190001cb).
>> 
>> This message indicates a SCSI bus problem. SIST code 0x91 has the 
>> following error bits set:
>> 
>> 0x80 = phase mismatch
>> 0x10 = reselected by another device
>> 0x01 = parity error
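
The bit decoding above can be reproduced with a small Python sketch. Only the three bits named in this message are known here; the table and function names are hypothetical, and any other bits are simply reported as undecoded rather than guessed.

```python
# Bit meanings as given in the message; other SIST bits are not decoded here.
SIST_BITS = {
    0x80: "phase mismatch",
    0x10: "reselected by another device",
    0x01: "parity error",
}


def decode_sist(code):
    """List the known error bits set in a SIST status code."""
    names = [name for bit, name in SIST_BITS.items() if code & bit]
    unknown = code & ~sum(SIST_BITS)  # bits not covered by the table
    if unknown:
        names.append("undecoded bits 0x%02x" % unknown)
    return names


print(decode_sist(0x91))
```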
>> 
>> This happened during a read (DATA IN phase) after quite some data had
>> already been transferred. While it is not OK that the driver does not
>> recover from this situation, you may want to check your SCSI cables
>> and terminators to prevent the parity error, which most often is the
>> result of too long a SCSI bus or a bad cable.
>> 
>> > Another one :
>> > 
>> >         ncr1:4:ERROR (81:0) (f-aa-0) (0/3) @ (script 3f0: 48000000)
>> 
>> This one is different, but may well also be caused by spurious SCSI
>> bus pulses. The NCR chip reports an illegal instruction error, which
>> most often is caused by overly optimistic PCI performance options chosen
>> in the BIOS setup. Some chip-sets could not really support as many 
>> active bus-masters as claimed (often only a single bus-master was
>> allowed, and the ISA legacy DMA counted as one). Intel chip-sets
>> should be OK, but I'm not sure whether Ali or VIA Super-7 chip-sets
>> are as reliable.
>> 
>> > The precise configuration is 2 Tekram 390F cards + 2 towers of disks
>> > (4 disks each, IBM 9.1 Go), one DLT and one QUANTUM for the system :
>> > 
>> > ncr0: <ncr 53c875 fast20 wide scsi> rev 0x26 int a irq 9 on pci0.9.0
>> > ncr1: <ncr 53c875 fast20 wide scsi> rev 0x26 int a irq 12 on pci0.10.0
>> 
>> Do you know about safe cable length limits for ULTRA-SCSI ?
>> Most of your devices are operating at 20MHz synch. SCSI rate, which
>> means the maximum specified SCSI bus length is *at most* 3m.
>> This value does of course include the internal ribbon cable in your
>> drive boxes, which often is already 90cm in a 2-drive enclosure.
>> 
>> If you are not sure that your SCSI bus cable is specified for 20MHz
>> transfer rates, you better consider 1.5m to be the maximum total bus 
>> length, or you will see sporadic transfer failures (with a certain 
>> probability of undetected data corruption, since parity only detects 
>> single bit errors (or rather odd numbers of flipped bits)).
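
The point about parity can be checked with a short Python sketch. It assumes one odd-parity bit per data byte, as on a parallel SCSI bus; the helper names and the sample byte are made up for illustration.

```python
def odd_parity_bit(byte):
    """Parity bit that makes the total count of 1 bits (data + parity) odd."""
    return 1 - bin(byte & 0xFF).count("1") % 2


def parity_ok(byte, parity_bit):
    """True if the received byte and parity bit together have odd parity."""
    return (bin(byte & 0xFF).count("1") + parity_bit) % 2 == 1


sent = 0b10110010
p = odd_parity_bit(sent)

one_flip = sent ^ 0b00000100   # one bit corrupted on the bus
two_flips = sent ^ 0b00000110  # two bits corrupted on the bus

print(parity_ok(one_flip, p))   # False: the single-bit error is detected
print(parity_ok(two_flips, p))  # True: the double-bit error goes unnoticed
```

Any even number of flipped bits leaves the parity unchanged, which is why a marginal bus can corrupt data without the controller reporting anything.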
>> 
>> > The DLT is daisy chained with one UW tower and I don't use the narrow
>> > connector on the tekram 390F. If I do so, the workstation just freezes
>> > during the boot with an error like :
>> > 
>> >         ncr1:5: ERROR (0:91) (9-ae-800) (8/13) @ (script 6dc:190001cb).
>> 
>> This is again the same parity error as in the first message and it
>> points to the real source of your trouble: SCSI bus data corruption.
>> 
>> > I must say too that I had the same problems with the same PC in
>> > another configuration : the DLT was the same, the system disk was the
>> > same, all other disks were different and not UW, the scsi cards were
>> > NCR 810.
>> 
>> Since the 810 only supported 10MHz rates, you could have a 6m SCSI bus, 
>> but only if termination at both sides was OK. There have been a few 
>> cheap 810 based SCSI cards with only passive terminators (single in-line 
>> resistor packs), though the original NCR and all Symbios cards (as well 
>> as Tekram and other high quality cards) always used active terminators, 
>> AFAIK.
>> 
>> Again: If your cable quality is not up to the spec, you better stay below 
>> half the maximum specified for perfect cables and terminators.
>> 
>> > The scsi bus going into timeout is always the one with the DLT.
>> > Might it be faulty ?
>> 
>> No, I just think that you violate the Fast-20 specs by daisy chaining
>> the DLT with the UW tower, which may already be at its limits because
>> of the sum of external and internal cables between the SCSI card and
>> the last disk drive in the chain. (External cables are often in the
>> order of 1.5m and I guess that the internal cable will be at least 0.9m 
>> long ...)
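
The cable-length reasoning can be written out as a small Python sketch. The limits are the figures quoted in this message (6m at 10MHz, 3m at 20MHz, 12.5m for LVD), and the 1.5m and 0.9m segments are the estimates given above, not measurements. The names are hypothetical, and halving the limit for cables of uncertain quality follows the rule of thumb stated in the message.

```python
# Maximum total bus lengths in metres, as quoted in this message.
MAX_BUS_LENGTH_M = {"fast10": 6.0, "fast20": 3.0, "lvd": 12.5}


def bus_length_check(segments_m, mode, cable_in_spec=True):
    """Return (total length, applicable limit, within limit?) for a SCSI bus.

    If the cable is not known to meet the spec, the limit is halved.
    """
    limit = MAX_BUS_LENGTH_M[mode]
    if not cable_in_spec:
        limit /= 2
    total = sum(segments_m)
    return total, limit, total <= limit


# External cable (about 1.5m) plus internal ribbon cable (about 0.9m).
print(bus_length_check([1.5, 0.9], "fast20"))
print(bus_length_check([1.5, 0.9], "fast20", cable_in_spec=False))
```

With these estimates a single tower already uses most of the 3m budget, and exceeds the 1.5m safety figure, before a daisy-chained DLT adds its own cable.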
>> 
>> Isn't the DLT4000 a non-wide SCSI device ?
>> 
>> If you connect an 8bit SCSI cable to the end of the 16bit SCSI bus,
>> you need quite some extra safety margin (i.e. restrict the total cable 
>> length even further).
>> 
>> 
>> If you really need to connect that number of drives, you may be 
>> better off with an Ultra-2 (Fast-40) card with the bus operating in
>> LVD mode. Your IBM DDRS drives may already be U2W, I can't tell from
>> the probe message, at least over here they are the same price in either
>> UW or U2W versions. LVD supports a SCSI bus length of 12.5m, which should 
>> satisfy your requirements ;-)
>> 
>> But if you have another free PCI slot, you may instead just install
>> another 8bit SCSI card (Sym8100 or 8600) for the DLT (and the boot disk).
>> 
>> Regards, STefan
>> 


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?wnnzp0hkgk3.fsf>
