From owner-freebsd-hardware  Fri Aug 23 14:23:02 1996
Return-Path: owner-hardware
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id OAA14690
          for hardware-outgoing; Fri, 23 Aug 1996 14:23:02 -0700 (PDT)
Received: from FileServ1.MI.Uni-Koeln.DE (FileServ1.MI.Uni-Koeln.DE [134.95.212.1])
          by freefall.freebsd.org (8.7.5/8.7.3) with SMTP id OAA14679
          for <freebsd-hardware@FreeBSD.ORG>; Fri, 23 Aug 1996 14:22:54 -0700 (PDT)
Received: from x14.mi.uni-koeln.de (annexr3-6.slip.Uni-Koeln.DE) by FileServ1.MI.Uni-Koeln.DE with SMTP id AA08226
  (5.67b/IDA-1.5 for <freebsd-hardware@FreeBSD.ORG>); Fri, 23 Aug 1996 23:22:26 +0200
Received: (from se@localhost) by x14.mi.uni-koeln.de (8.7.5/8.6.9) id WAA22802; Fri, 23 Aug 1996 22:07:17 +0200 (MET DST)
Date: Fri, 23 Aug 1996 22:07:17 +0200 (MET DST)
Message-Id: <199608232007.WAA22802@x14.mi.uni-koeln.de>
From: Stefan Esser <se@zpr.uni-koeln.de>
To: Peter Childs <pjchilds@imforei.apana.org.au>
Cc: se@zpr.uni-koeln.de (Stefan Esser), msmith@atrad.adelaide.edu.au,
        freebsd-hardware@FreeBSD.ORG
Subject: Re: ASUS SC200 SCSI card?
In-Reply-To: <199608221728.CAA00592@al.imforei.apana.org.au>
References: <199608212019.WAA07179@x14.mi.uni-koeln.de>
	<199608221728.CAA00592@al.imforei.apana.org.au>
Sender: owner-hardware@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Peter Childs writes:
 > 
 > [ Discussion about hangs on 2.1.5-stable machine with dual ASUS SC200
 >   NCR810 PCI scsi controllers follows... may be dangerous to
 >   mental health ]
 > 
 > Stefan Esser wrote...
 > >  >  So this means that with multiple SC200 cards they can all be set on
 > >  >  INTA ??   If so are there any pros/cons to doing this?
 > > 
 > > In fact they all SHOULD be set to Int A !
 > 
 >  Ok.. the second NCR was set to INT B... i've put them both on A
 >  now.

Ok. The BIOS has set up a connection from Int A of the PCI
slot you put the card in to some interrupt line of the ISA
interrupt controller. It then put the IRQ chosen into some
NCR register (in PCI config space) for the driver init code.

The driver used that IRQ number to install an interrupt
handler. Now, if you had the card configured to Int B, then
the BIOS would still (correctly !) set up Int A, since the
NCR chip has its request for Int A hard-coded into some
other config space register.

In your previous setup the NCR was jumpered to the Int B line,
while the BIOS and the driver assumed it was wired to Int A,
so Int B interrupts would either be ignored or delivered as
some other IRQ, depending on implementation details of your
motherboard.

This might have led to strange effects (stray interrupts, for
example), but if Int B came out as some IRQ for which some other
device had registered a handler, then that device would receive
"spurious" interrupts, and most probably ignore them ...
But the second NCR was effectively limited to a low command rate
(It will be polled 100 times a second until the driver sees the
first interrupt occur).
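
To make the mismatch concrete, here is a toy Python model of the routing
described above. It is only a sketch: the class and function names are
made up for illustration, and the routing table is an assumed example,
not what any particular BIOS does. The register offsets and pin values
are the standard PCI ones (Interrupt Line at 0x3C, Interrupt Pin at 0x3D).

```python
# Toy model of the routing described above. All class and function names
# are hypothetical; this is not the ncr driver or any BIOS code.

INT_A, INT_B = 1, 2  # PCI Interrupt Pin register values (1 = INTA#, 2 = INTB#)


class Card:
    def __init__(self, jumper_pin):
        self.interrupt_pin = INT_A    # hard-coded by the chip (config offset 0x3D)
        self.interrupt_line = None    # filled in by the BIOS (config offset 0x3C)
        self.jumper_pin = jumper_pin  # line the board jumper really drives


def bios_setup(card, slot_routing):
    """Route the pin the chip claims to use and record the resulting IRQ."""
    card.interrupt_line = slot_routing[card.interrupt_pin]


def irq_raised(card, slot_routing):
    """IRQ the system sees when the card signals; None if the line is unrouted."""
    return slot_routing.get(card.jumper_pin)


def driver_sees_interrupts(card, slot_routing):
    """The driver's handler sits on interrupt_line and only fires for that IRQ."""
    return irq_raised(card, slot_routing) == card.interrupt_line


routing = {INT_A: 14}            # assumed: the BIOS only routed Int A of this slot
card = Card(jumper_pin=INT_B)    # the previous, mis-jumpered setup
bios_setup(card, routing)
print(card.interrupt_line, driver_sees_interrupts(card, routing))

card = Card(jumper_pin=INT_A)    # the corrected setup
bios_setup(card, routing)
print(card.interrupt_line, driver_sees_interrupts(card, routing))
```

In both cases the driver is told the same IRQ, but only the card jumpered
to Int A ever raises it.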

 > > Please send me some details (from /var/log/messages). I need
 > > at least the complete boot message log (preferably from a 
 > > boot with "-v" for more verbose probe output) and the error
 > > message when the SCSI command was aborted.
 > 
 >  Nothing gets into /var/log/message when it dies... I've taken the following
 >  action (one crash after the next)
 > 
 >  1. INT B -> INT A on the second card.
 >  2. PCI latency was set to 80... Michael Smith suggested it be lower than
 >     32 so i've moved it to 20.

PCI latency is a very nice concept, but, like interrupts, quite
different from what you'd expect in a PC compatible system ...

The latency timer has to be set to a value that permits long 
bursts of data to be sent, taking advantage of page mode and 
cache snoop optimizations (one snoop per cache line instead 
of per memory access).

But these bursts ought to be limited in such a way that no
device's input buffer overflows because a burst takes too long.
While ISA bus-master devices (for example the Adaptec 1542) did
short bursts (4 WORD transfers, IIRC) and then released the 
ISA bus for a few microseconds, PCI has a concept of an arbiter,
which assigns the bus to any bus-master in the system, generally
in a round-robin fashion.

If a device FIFO is 512 bytes and data arrives at 10MB/s (say
a 100baseT Ethernet chip), then it can give up the PCI bus for
some 50 microseconds.

At a burst transfer rate of 80MB/s it would take less than 10
microseconds to write the FIFO contents to memory, while the 
same chip might only be able to get 20MB/s using small (4 DWORD)
transfers.
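
The arithmetic behind these figures can be checked with a few lines of
Python. This is just a worked example using the numbers quoted above
(512 byte FIFO, 10MB/s fill rate, 80MB/s and 20MB/s drain rates), taking
1MB as 10**6 bytes; the function name is made up for illustration.

```python
def microseconds_to_move(n_bytes, rate_mb_per_s):
    """Time in microseconds to move n_bytes at the given rate.

    With 1 MB = 10**6 bytes, 1 MB/s is exactly 1 byte per microsecond.
    """
    return n_bytes / rate_mb_per_s


FIFO_BYTES = 512

fill_time = microseconds_to_move(FIFO_BYTES, 10)    # time until the FIFO is full
burst_drain = microseconds_to_move(FIFO_BYTES, 80)  # emptying it in long bursts
short_drain = microseconds_to_move(FIFO_BYTES, 20)  # emptying it 4 DWORDs at a time

print("FIFO fills in      %.1f us" % fill_time)
print("burst drain takes  %.1f us" % burst_drain)
print("short drain takes  %.1f us" % short_drain)

# Share of the time the chip must own the bus just to keep up:
print("bus share, bursts: %.1f%%" % (100 * burst_drain / fill_time))
print("bus share, short:  %.1f%%" % (100 * short_drain / fill_time))
```

The last two lines show why long bursts matter: the same device needs the
bus for a much smaller share of the time when it is allowed to burst.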

You want to guarantee that each device gets the PCI bus granted
before its buffer overflows. And the easiest way to achieve this
is to have a timer set to the maximum latency allowed divided by
the number of devices on the PCI bus. If a device starts a burst,
it is allowed to proceed, even if some other device requests the
PCI bus. But if there is a request from some other device and the
latency timer has expired, the first device is asked to stop its
burst ASAP, and the next device will become the bus owner.

This way a burst can be extended arbitrarily if there is no other
request for the bus, but there is a guarantee that after #devices
times the latency timer value each device has had access to the
PCI bus.
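
A small Python simulation may help to see the guarantee. It is a sketch
under simplifying assumptions: every device wants the bus all the time,
the arbiter is strictly round-robin, and a burst stops exactly when the
timer expires. The function names and the example numbers (4 devices,
50 microseconds allowed) are assumptions for illustration.

```python
def latency_timer_us(max_latency_us, n_devices):
    """Timer value: the maximum latency allowed divided by the device count."""
    return max_latency_us / n_devices


def worst_gap_between_grants(n_devices, timer_us, rounds=5):
    """Simulate a round-robin arbiter with all devices requesting constantly.

    Each burst is cut off when the latency timer expires. Returns the longest
    time any device had to wait from one grant to its next grant.
    """
    now = 0.0
    last_grant = [None] * n_devices
    worst_gap = 0.0
    for _ in range(rounds):
        for dev in range(n_devices):
            if last_grant[dev] is not None:
                worst_gap = max(worst_gap, now - last_grant[dev])
            last_grant[dev] = now
            now += timer_us   # the burst runs until the timer expires
    return worst_gap


timer = latency_timer_us(50.0, 4)
print("timer per device:", timer, "us")
print("worst gap between grants:", worst_gap_between_grants(4, timer), "us")
```

Even in this worst case no device waits longer than the allowed maximum
between two grants; with fewer requests pending the bursts simply run
longer.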

PCI defines registers to contain information about the required
burst length and the maximum latency, and a PCI BIOS might be 
able to calculate the optimum value of the latency timer from
these parameters of all PCI devices installed ...

(But I don't know of any PCI BIOS that actually does this.)
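
For illustration only, this is roughly what such a calculation could look
like in Python. The heuristic itself is hypothetical (as said, no known
BIOS does it); it only uses the MAX_LAT value of each device and ignores
the burst length register (MIN_GNT). What is taken from the PCI
specification is that MAX_LAT counts in units of 1/4 microsecond and that
the latency timer is an 8 bit count of PCI clocks. The 33MHz clock and
the example register values are assumptions.

```python
PCI_CLOCK_MHZ = 33.0   # assumed bus clock
QUARTER_US = 0.25      # MAX_LAT is expressed in units of 1/4 microsecond
TIMER_MAX = 255        # the Latency Timer register is 8 bits wide


def suggest_latency_timer(max_lat_values, clock_mhz=PCI_CLOCK_MHZ):
    """Hypothetical heuristic: split the tightest MAX_LAT among all devices.

    max_lat_values holds the raw MAX_LAT byte of each device; 0 means the
    device states no requirement. Returns a timer value in PCI clocks.
    """
    n_devices = len(max_lat_values)
    limits_us = [v * QUARTER_US for v in max_lat_values if v > 0]
    if not limits_us:
        return TIMER_MAX                      # nobody asked for a limit
    budget_us = min(limits_us) / n_devices    # max latency / number of devices
    return min(TIMER_MAX, int(budget_us * clock_mhz))


# Two made-up devices asking for at most 16 us and 8 us of latency:
print(suggest_latency_timer([64, 32]))
```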

 >  3. Grabbed a fresh 2.1.5-stable kernel (i follow -stable, but my kernel
 >     tree had ipfilter stuff in it...)
 >  4. Removed the
 >       options        OD_BOGUS_NOT_READY
 >     line from my config.
 >  
 >  It feels fine when i'm not accessing the MO drive.. but i did a 
 >  "make clean" on my -stable tree... which is on the second scsi bus,
 >  did a "bad144 -s /dev/rod0" on the MO disk (also second scsi bus),
 >  and started thrashing tin.. (newsspools on the first scsi - old
 >  disk)...   this all ran fine for a good 10 minutes... then suddenly..
 >  bang..   locked solid.

Hmmm, it proceeds for 10 minutes, then hangs without any error
messages ?

 >  I'll include the "-v" boot here.. and hope i don't annoy too many
 >  people with its size :)

Well, they don't have to read beyond this point :)

 >  I think when i get back after the weekend i'll pull one of the SCSI
 >  controllers out, and see how i go thrashing all the devices.  It's a
 >  real pain not being able to depend on this machine, esp. the MO
 >  disk (it always crashes before i can get a backup finished :)

Yes, I really understand that ...

 > FreeBSD 2.1.5-STABLE #0: Fri Aug 23 11:56:26 CST 1996
 >     root@:/disk2/kernel/sys/compile/AL_1.8
 > CPU: i486DX (486-class CPU)
 >   Origin = "AuthenticAMD"  Id = 0x494
 > real memory  = 67108864 (65536K bytes)
 > avail memory = 64106496 (62604K bytes)
 > pcibus_setup(1):        mode1res=0x80000000 (0x80000000), mode2res=0xff (0x0e)
 > pcibus_setup(2):        mode1res=0x80000000 (0x80000000)
 > pcibus_check:   device 0 1 2 3 4 5 is there (id=04961039)
 > Probing for devices on PCI bus 0:
 >         configuration mode 1 allows 32 devices.
 > chip0 <SiS 85c496> rev 49 on pci0:5

Hmm, a SiS chip set ...
Did you try to disable PCI performance options like burst
mode or write buffers ?

There are some PCI chip sets that don't work reliably with
competing bus-masters and those options enabled.

 > ncr0 <ncr 53c810 scsi> rev 17 int a irq 15 on pci0:11

Is this an NCR 53c810A ?
I don't have a data book about that particular chip, but
according to a numbering convention used for other NCR 
chips, the A devices get a rev. > 0x10.

 >         mapreg[10] type=1 addr=0000e800 size=0100.
 >         mapreg[14] type=0 addr=fbff0000 size=0100.
 >         reg20: virtual=0xf546f000 physical=0xfbff0000 size=0x100
 > ncr0: restart (scsi reset).
 > ncr0 scanning for targets 0..6 (V2 pl23 95/09/07)
 > Choosing drivers for scbus configured at 0
 > (ncr0:1:0): "QUANTUM FIREBALL1080S 1Q09" type 0 fixed SCSI 2
 > sd is configured at 0
 > sd0(ncr0:1:0): Direct-Access 
 > sd0(ncr0:1:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > 1042MB (2134305 512 byte sectors)
 > sd0(ncr0:1:0): with 3835 cyls, 4 heads, and an average 139 sectors/track
 > (ncr0:5:0): "MICROP 1684-07MB1057403 HSP4" type 0 fixed SCSI 1
 > sd is configured at 3
 > sd3(ncr0:5:0): Direct-Access 323MB (663476 512 byte sectors)
 > sd3(ncr0:5:0): with 1780 cyls, 7 heads, and an average 53 sectors/track
 > ncr1 <ncr 53c810 scsi> rev 1 int a irq 14 on pci0:12

This one is the same revision as the chip I have.

 >         mapreg[10] type=1 addr=0000e400 size=0100.
 >         mapreg[14] type=0 addr=fbfe0000 size=0100.
 >         reg20: virtual=0xf5472000 physical=0xfbfe0000 size=0x100
 > ncr1: restart (scsi reset).
 > ncr1 scanning for targets 0..6 (V2 pl23 95/09/07)
 > (ncr1:3:0): "SEAGATE ST51080N 0943" type 0 fixed SCSI 2
 > sd is configured at 4
 > sd4(ncr1:3:0): Direct-Access 
 > sd4(ncr1:3:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > 1030MB (2109840 512 byte sectors)
 > sd4(ncr1:3:0): with 4826 cyls, 4 heads, and an average 109 sectors/track
 > (ncr1:6:0): "FUJITSU M2512A 1507" type 7 removable SCSI 2
 > od is configured at 0
 > od0(ncr1:6:0): Optical 
 > od0(ncr1:6:0): 200ns (5 Mb/sec) offset 8.
 > 217MB (446325 512 byte sectors)
 > od0(ncr1:6:0): with approximate 217 cyls, 64 heads, and 32 sectors/track
 > pci0: uses 512 bytes of memory from fbfe0000 upto fbff00ff.
 > pci0: uses 512 bytes of I/O space from e400 upto e8ff.

[ probe of ISA devices removed ]

 >  Relevant(??) bits of my kernel config as follows...
 > 
 > controller      pci0
 > controller      ncr0
 > 
 > controller scbus0 at ncr0
 > controller scbus1
 > 
 > disk    sd0     at scbus0 target 1
 > disk    sd4     at scbus1 target 3
 > disk    sd3     at scbus0 target 5
 > device  od0     at scbus1 target 6
 > 
 > #options        OD_BOGUS_NOT_READY

Doesn't look wrong at all ...

So please try:

- with all devices connected to one NCR

- with PCI performance options disabled

- with the SCSI transfer rate reduced to 2MHz (-> ncrcontrol -s sync=2)
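
As a side note on the last item, the relation between the synchronous
period printed in the boot log and the transfer rate can be sketched in
Python. The function names are made up; the only assumption is an 8 bit
(one byte wide) SCSI bus, which matches the "100ns (10 Mb/sec)" and
"200ns (5 Mb/sec)" lines above, where "Mb" evidently means megabytes.

```python
def sync_rate_mb_per_s(period_ns, bus_width_bytes=1):
    """Transfer rate in MB/s for a given synchronous period in nanoseconds."""
    return 1000.0 / period_ns * bus_width_bytes


def period_ns_for_mhz(mhz):
    """Synchronous period in nanoseconds for a given transfer clock in MHz."""
    return 1000.0 / mhz


print(sync_rate_mb_per_s(100))   # sd0 and sd4 in the log above
print(sync_rate_mb_per_s(200))   # od0 in the log above
print(period_ns_for_mhz(2), sync_rate_mb_per_s(period_ns_for_mhz(2)))
```

So reducing the rate to 2MHz means a 500ns period and about 2MB/s per
device, slow enough to make marginal cabling or termination much less
critical.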

I guess you know about the limitation on the length of the 
SCSI cable, the requirements for correct termination and