From owner-freebsd-scsi Mon Oct 13 12:40:37 1997
Return-Path:
Received: (from root@localhost)
        by hub.freebsd.org (8.8.7/8.8.7) id MAA17941
        for freebsd-scsi-outgoing; Mon, 13 Oct 1997 12:40:37 -0700 (PDT)
        (envelope-from owner-freebsd-scsi)
Received: from mail.kcwc.com (h1.kcwc.com [206.139.252.2])
        by hub.freebsd.org (8.8.7/8.8.7) with SMTP id MAA17932
        for ; Mon, 13 Oct 1997 12:40:28 -0700 (PDT)
        (envelope-from curt@kcwc.com)
Received: by mail.kcwc.com (NX5.67c/NeXT-2.0-KCWC-1.0)
        id AA12601; Mon, 13 Oct 97 15:40:09 -0400
Date: Mon, 13 Oct 97 15:40:09 -0400
From: curt@kcwc.com (Curt Welch)
Message-Id: <9710131940.AA12601@mail.kcwc.com>
Received: by NeXT.Mailer (1.87.1)
Received: by NeXT Mailer (1.87.1)
To: Jaye Mathisen
Subject: Re: Still having some amusing DPT problems.
Cc: scsi@FreeBSD.ORG
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> Invariably, after 20-30 hours of getting hammered, the
> box dies with "DPT: Undocumented error", and drops into
> DDB. If I take out DDB, I get a kazillion messages about
> stale transactions that aren't really dead, and a bunch
> of other errors.

I had similar symptoms. I'm running the DPT on a busy usenet news
server. It would never run for more than a day or two without
crashing. Messages about stale transactions were one of the problems
I saw. I never used DDB, and I've never seen the "Undocumented error"
diagnostic.

However, once I upgraded to version 1.2.3 of the DPT driver, my
problems went away (2 months of no problems). But what happened with
that release is that the disk performance improved enough to allow my
news server to keep up with the news for the first time. This means
that every 5 minutes or so, the incoming feeds complete and the disks
get to "catch up" - i.e. flush their cache. Before version 1.2.3 the
news server could never keep up and was therefore busy non-stop 24
hours a day.
I've always wondered whether the new version of the driver really
fixed all the problems I had seen, or whether it just got around them
by improving the performance. Your report makes me wonder.

What DPT options do you have set? (And what version of the driver are
you using?)

Are you using DPT_HANDLE_TIMEOUTS? If you are getting messages about
stale transactions, I think you must have it on. Try turning it off.
As I understand it, this option makes the DPT driver check for
transactions that take too long to complete. But "too long" is just a
calculated value that could well be too short for a very heavily
loaded system.

This was one of the problems I was having. The DPT driver was aborting
transactions that took too long (>20 seconds or so) when it shouldn't
have. If it had waited a bit longer, the transactions would have
finished fine on their own. These long waits are a side effect of a
large cache on the controller. I had the full 64 Meg cache on mine.

If you are having real problems with lost transactions, then you might
have to leave the option on, and instead try changing how it
calculates the timeout values.

The only options I'm using are:

    DPT_MEASURE_PERFORMANCE
    DPT_TIMEOUT_FACTOR=4

Simon of course can correct anything wrong I might have said above....

--
Curt Welch                                  http://CurtWelch.Com/
(Just trying to take some of the load off of Simon after all the time
he spent helping get my system stable...)