Date:      Wed, 09 Jul 1997 20:14:12 -0700 (PDT)
From:      Simon Shapiro <Shimon@i-Connect.Net>
To:        Josh Tiefenbach <josh@ican.net>
Cc:        scsi@freebsd.org
Subject:   RE: Yet Another DPT Update
Message-ID:  <XFMail.970709201412.Shimon@i-Connect.Net>
In-Reply-To: <19970709220749.25037@ican.net>


Hi Josh Tiefenbach;  On 10-Jul-97 you wrote: 
> More with the updates. We've stuck the DPT back into the production box
> (Compaq PPro 200).
> 
> dmesg:
> 
> DPT:  PCI SCSI HBA Driver, version 1.1.6
> dpt0 <DPT Caching SCSI RAID Controller> rev 2 int a irq 11 on pci0:18
> dpt0: DPT type 3, model PM3334UW firmware 07L0, Protocol 0 on port 1410
> dpt0: Options: USE_SINTR, TRACK_CCB_STATES, MEASURE_PERFORMANCE,
> HANDLE_TIMEOUTS, SINTR_SPLHIGH
> dpt0 waiting for scsi devices to settle
> (dpt0:0:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
> sd0(dpt0:0:0): Direct-Access 2050MB (4199759 512 byte sectors)
> (dpt0:1:0): "Quantum XP32150W L915" type 0 fixed SCSI 2
> sd1(dpt0:1:0): Direct-Access 2050MB (4199759 512 byte sectors)
> (dpt0:2:0): "DPT RAID-5 07L0" type 0 fixed SCSI 2
> sd2(dpt0:2:0): Direct-Access 8201MB (16796928 512 byte sectors)
> 
> The following happened during a newfs of the RAID drive:
> 
> dpt0: BAD (0) CCB in SP (status = 0000 0000 ).

This is clearly what we see here on certain systems.  In this case BOTH 
the status register and the CCB are bogus.  This is not the data we expect,
nor can the DPT generate such values.  The PCI bus or some hardware along
the line is eating it.
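For readers following along, here is a rough sketch of the kind of sanity check that produces a "BAD CCB" message. The status packet names the CCB that completed. If that address is zero or matches no CCB currently owned by the HBA, the driver cannot tell which request finished and can only drop the packet. The struct and function names below are hypothetical simplifications, not the real dpt driver's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified CCB; the real driver's structure differs. */
struct sketch_ccb {
    uint32_t bus_addr;   /* physical address handed to the HBA */
    int      busy;       /* nonzero while the HBA owns this CCB */
};

/*
 * Look up the CCB named in a status packet.  Returns NULL when the
 * address matches no outstanding CCB (the "BAD CCB" case): the caller
 * can only log the event and discard the status packet.
 */
static struct sketch_ccb *
sketch_find_ccb(struct sketch_ccb *ccbs, size_t nccbs, uint32_t sp_ccb_addr)
{
    assert(ccbs != NULL || nccbs == 0);

    if (sp_ccb_addr == 0)           /* e.g. "BAD (0) CCB in SP" */
        return NULL;

    for (size_t i = 0; i < nccbs; i++) {
        if (ccbs[i].busy && ccbs[i].bus_addr == sp_ccb_addr)
            return &ccbs[i];
    }
    return NULL;                    /* unknown or not outstanding */
}
```

Because the bogus packet is thrown away, the request that really completed is never matched, which is why it later shows up as late and then stale.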

> dpt0: Marking 27627 (Write (10) [6.1.18]) on c0b0t2l0 as late after
> 10042353usec

Since we threw away the corrupt CCB (not knowing which one it is), the
real command simply times out.

> dpt0: Destroying stale 27627 (Write (10) [6.1.18]) on c0b0t2l0 (20042335)

Now we lost patience with this I/O request.  We are going to do it in.
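The two messages above amount to a two-stage timeout: first warn that a request is late, later give up on it entirely. A minimal sketch follows. The thresholds are only inferred from the log (marked late at about 10 seconds, destroyed at about 20 seconds) and are assumptions, not constants taken from the driver source; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Inferred from the log above (late at ~10 s, stale at ~20 s);
 * assumed values, not read from the driver. */
#define SKETCH_LATE_USEC   10000000ULL
#define SKETCH_STALE_USEC  20000000ULL

enum sketch_ccb_age { CCB_ON_TIME, CCB_LATE, CCB_STALE };

/* Classify an outstanding request by how long it has been queued. */
static enum sketch_ccb_age
sketch_classify(uint64_t elapsed_usec)
{
    if (elapsed_usec >= SKETCH_STALE_USEC)
        return CCB_STALE;   /* give up: abort it and fail the I/O */
    if (elapsed_usec >= SKETCH_LATE_USEC)
        return CCB_LATE;    /* log a warning but keep waiting */
    return CCB_ON_TIME;
}
```

Keeping a "late" stage before the "stale" one gives a slow but healthy command a chance to finish before it is killed.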

> dpt0: Request 99041 recieved with clear EOC.  Marking as LOST.

This one is probably noise on the bus.  If the EOC bit is off, it means no 
command completed.  We treat it as a loss, since we know (hope) what the
command was but have no confidence in its integrity.
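A sketch of the EOC (End Of Command) test described above. The bit position (0x80 in the status packet's HBA status byte) is an assumption for illustration; check the EATA documentation or the driver headers for the real definition. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the End-Of-Command flag in the status packet's
 * HBA status byte; verify against the EATA spec / driver headers. */
#define SKETCH_SP_EOC 0x80u

enum sketch_sp_verdict { SP_COMPLETE, SP_LOST };

static enum sketch_sp_verdict
sketch_check_eoc(uint8_t hba_stat)
{
    /* With EOC clear the HBA is not claiming any completion, so nothing
     * else in the packet can be trusted; the request is written off. */
    return (hba_stat & SKETCH_SP_EOC) ? SP_COMPLETE : SP_LOST;
}
```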

...  more of the same ...

> And this while running diablo ( a news feeder program, *not* the game :)

... and yet more ...

> Again. I should point out that the above errors *did not happen* when using
> the card, v1.1.6 of the driver w/same options, in a Pentium-100 box.

Sort of proves the point...  :-(

> Shimon: a) Any other data that you need? b) any ETA on v1.1.7 of the
> driver?

I forwarded your message to my DPT contact.  The certification people 
there want specific hardware setups.  I think the FreeBSD driver may be 
a bit faster than the drivers on other platforms, and that is why this
problem is not so visible there.  We can always put some delays in
dpt_intr() and see if things improve.  You can add ``DELAY(xx);'' at the
very top of dpt_intr() and see if it makes any difference.
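In the kernel the experiment is literally one line, ``DELAY(xx);'', as the first statement of dpt_intr(); DELAY() busy-waits for roughly that many microseconds. Since kernel code cannot run standalone, the sketch below mimics the idea in userspace: sketch_delay() is a hypothetical stand-in for DELAY(), sketch_dpt_intr() is a hypothetical handler skeleton, and the 20 usec value is only a guess for "xx".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Userspace stand-in for the kernel's DELAY(): spin for about usec
 * microseconds without sleeping, as DELAY() does. */
static void
sketch_delay(long usec)
{
    struct timespec start, now;

    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while ((now.tv_sec - start.tv_sec) * 1000000L +
             (now.tv_nsec - start.tv_nsec) / 1000L < usec);
}

static int sketch_intr_count;

/* Hypothetical handler skeleton: spin first so the bus has time to
 * settle, then read the status register and status packet. */
static void
sketch_dpt_intr(void *arg)
{
    (void)arg;
    sketch_delay(20);           /* the "xx"; value is a guess */
    sketch_intr_count++;
    /* ... read auxiliary status register and status packet here ... */
}
```

Keep the delay small: a busy-wait inside an interrupt handler stalls the CPU for its whole duration, so this is a diagnostic experiment rather than a fix.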

Let me know if that helps.  Version 1.1.7 is a merge of Justin's code
review.  It makes the code cleaner, somewhat leaner and (hopefully) much
more acceptable.  I also reversed the (reversed) priorities for the SCSI
software interrupts, putting them in line with bio, rather than net.

I will release 1.1.7 either tonight or tomorrow.

Simon


