From owner-freebsd-scsi  Mon Oct 13 12:40:37 1997
Return-Path: <owner-freebsd-scsi>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id MAA17941
          for freebsd-scsi-outgoing; Mon, 13 Oct 1997 12:40:37 -0700 (PDT)
          (envelope-from owner-freebsd-scsi)
Received: from mail.kcwc.com (h1.kcwc.com [206.139.252.2])
          by hub.freebsd.org (8.8.7/8.8.7) with SMTP id MAA17932
          for <scsi@FreeBSD.ORG>; Mon, 13 Oct 1997 12:40:28 -0700 (PDT)
          (envelope-from curt@kcwc.com)
Received: by mail.kcwc.com (NX5.67c/NeXT-2.0-KCWC-1.0)
	id AA12601; Mon, 13 Oct 97 15:40:09 -0400
Date: Mon, 13 Oct 97 15:40:09 -0400
From: curt@kcwc.com (Curt Welch)
Message-Id: <9710131940.AA12601@mail.kcwc.com>
Received: by NeXT.Mailer (1.87.1)
Received: by NeXT Mailer (1.87.1)
To: Jaye Mathisen <mrcpu@cdsnet.net>
Subject: Re: Still having some amusing DPT problems.
Cc: scsi@FreeBSD.ORG
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

>  Invariably, after 20-30 hours of getting hammered, the
>  box dies with "DPT: Undocumented error", and drops into
>  DDB.  If I take out DDB, I get a kazillion messages about
>  stale transactions that aren't really dead, and a bunch
>  of other errors.

I had similar symptoms.  I'm running the DPT on a busy
usenet news server.  It would never run for more than a day
or two without crashing.  Messages about stale transactions
were one of the problems I saw.  I never used DDB.  I've
never seen the "Undocumented error" diagnostic.

However, once I upgraded to version 1.2.3 of the DPT driver,
my problems went away (two months without a problem).  But that
release also improved disk performance enough for my news
server to keep up with the news for the first time.  This
means that every 5 minutes or so, the incoming feed completes
and the disks get to "catch up", i.e. flush their cache.  Before
version 1.2.3, the news server could never keep up and was
therefore busy non-stop, 24 hours a day.

I've always wondered if the new version of the driver really fixed
all the previous problems I had seen or if it had just gotten around
them by improving the performance.  Your report makes me wonder.

What DPT options do you have set, and what version of the
driver are you using?

Are you using the DPT_HANDLE_TIMEOUTS option?  If you are getting
messages about stale transactions, I think you must have it on.  Try
turning it off.  As I understand it, this option makes the DPT driver
check for transactions that take too long to complete.  But "too long"
is just a calculated value that could well be too short for a very
heavily loaded system.  This was one of the problems I was
having.  The DPT driver was aborting transactions that took too
long (more than 20 seconds or so) when it shouldn't have.  Had it
waited a bit longer, the transactions would have finished fine on
their own.  These long waits are a side effect of a large cache on
the controller.  I had the full 64 MB cache on mine.

If you are having real problems with lost transactions, then you
might have to leave the option on, and instead try changing
how it calculates the timeout values.

The only options I'm using are:

   DPT_MEASURE_PERFORMANCE
   DPT_TIMEOUT_FACTOR=4
   

Simon can, of course, correct anything I might have gotten
wrong above...

-- 

Curt Welch
http://CurtWelch.Com/

(Just trying to take some of the load off of Simon after all
the time he spent helping get my system stable...)