From owner-freebsd-scsi  Mon Oct 13 15:01:06 1997
Return-Path:
Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7) id PAA27506 for freebsd-scsi-outgoing; Mon, 13 Oct 1997 15:01:06 -0700 (PDT) (envelope-from owner-freebsd-scsi)
Received: from sendero-ppp.i-connect.net (sendero-ppp.i-Connect.Net [206.190.143.100]) by hub.freebsd.org (8.8.7/8.8.7) with SMTP id PAA27497 for ; Mon, 13 Oct 1997 15:00:54 -0700 (PDT) (envelope-from shimon@sendero-ppp.i-connect.net)
Received: (qmail 16412 invoked by uid 1000); 13 Oct 1997 22:01:04 -0000
Message-ID:
X-Mailer: XFMail 1.2-beta-100797 [p0] on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <9710131940.AA12601@mail.kcwc.com>
Date: Mon, 13 Oct 1997 15:01:04 -0700 (PDT)
Organization: Atlas Telecom
From: Simon Shapiro
To: (Curt Welch)
Subject: Re: Still having some amusing DPT problems.
Cc: scsi@FreeBSD.ORG, Jaye Mathisen
Sender: owner-freebsd-scsi@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

Hi Curt Welch;

On 13-Oct-97 you wrote:
> > Invariably, after 20-30 hours of getting hammered, the
> > box dies with "DPT: Undocumented error", and drops into
> > DDB.  If I take out DDB, I get a kazillion messages about
> > stale transactions that aren't really dead, and a bunch
> > of other errors.

If you remove a species you upset the delicate balance of nature :-)

Instead, change the lines:

    default:
        Debugger("DPT: Undocumented Error");

to:

    default:
        printf("DPT: Undocumented Error %x\n",
               ccb->status_packet.hba_stat);
        Debugger("DPT: Undocumented Error");

at the end of dpt_process_completion().  This will give us a clue
about what the DPT is moaning about :-)

> I had similar symptoms.  I'm running the DPT on a busy
> usenet news server.  It would never run for more than a day
> or two without crashing.  Messages about stale transactions
> were one of the problems I saw.  I never used DDB.  I've
> never seen the "Undocumented error" diagnostic.
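The point of the suggested Debugger() change is to get the raw hba_stat value reported when the completion switch falls into its default case, instead of a bare "Undocumented Error". A minimal user-space sketch of that pattern follows; the status names and values are hypothetical placeholders, since the real DPT codes are not listed in this thread, and describe_hba_stat() is an illustrative helper, not a driver function.

```c
#include <stdio.h>

/* Hypothetical status codes for illustration only; the real DPT
 * hba_stat values are not listed in this thread. */
#define HBA_STAT_OK           0x00u
#define HBA_STAT_SEL_TIMEOUT  0x01u

/*
 * Writes a human-readable description of a completion status into buf
 * and returns the snprintf() result.  Any value without a case of its
 * own falls through to default, which includes the raw code in hex
 * rather than a bare "Undocumented Error".
 */
static int describe_hba_stat(unsigned int stat, char *buf, size_t len)
{
    switch (stat) {
    case HBA_STAT_OK:
        return snprintf(buf, len, "ok");
    case HBA_STAT_SEL_TIMEOUT:
        return snprintf(buf, len, "selection timeout");
    default:
        return snprintf(buf, len, "DPT: Undocumented Error %x", stat);
    }
}
```

Keeping the hex code in the message means a single crash report is enough to look the value up in the controller documentation.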
>
> However, once I upgraded to version 1.2.3 of the DPT driver,
> my problems went away (2 months of no problems).  But what
> happened with that release is that the disk performance
> improved enough to allow my news server to keep up with the
> news for the first time.  This means that every 5 minutes
> or so, the incoming feeds complete and the disks get to
> "catch up", i.e. flush their cache.  Before version 1.2.3
> the news server could never keep up and was therefore busy
> non-stop 24 hours a day.

1.2.4 made some performance improvements and fixed some bugs.
It is a (proud) fact that FreeBSD allows very heavy loads to be
imposed on the system.  Let's debug these...

> Are you using DPT_HANDLE_TIMEOUTS?  If you are getting messages
> about stale transactions I think you must have it on.  Try turning
> it off.  As I understand it, this makes the DPT driver check
> for transactions that take too long to complete.  But "too long"
> is just a calculated value that could well be too short for a very
> heavily loaded system.  This was one of the problems I was
> having.  The DPT driver was aborting transactions that took too
> long (>20 seconds or so), when it shouldn't have.  If it had
> waited a bit longer, the transactions would have finished fine
> on their own.  These long waits are a side effect of a large
> cache on the controller.  I had the full 64Meg cache on mine.

The ``Undocumented Error'' should not crop up, regardless of load.

The long delays are not exactly a result of the large cache.  They
occur despite the large cache.  There is a starvation condition
with certain disks.  The only time a large cache slows things down
is in flushing: the bigger the cache, the more there is to flush,
and the longer the wait.

> If you are having real problems with lost transactions, then you
> might have to leave the option on, and instead try changing
> how it calculates the timeout values.
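Curt's description of the timeout check (a computed per-transaction limit that a long cache flush can exceed) can be modelled roughly as below. This is a minimal sketch under stated assumptions: struct xfer, is_stale() and the "estimated time x factor" formula are hypothetical and not taken from the DPT driver source; the factor merely stands in for something like DPT_TIMEOUT_FACTOR.

```c
/*
 * Hypothetical model of a "stale transaction" check of the kind that
 * DPT_HANDLE_TIMEOUTS enables.  The names and the formula
 * (estimated service time x factor) are assumptions for illustration;
 * they are not taken from the DPT driver source.
 */
struct xfer {
    unsigned long start_ms;     /* time the command was queued */
    unsigned long expected_ms;  /* estimated service time */
};

/* Returns 1 if the command has been outstanding longer than
 * expected_ms * factor, 0 otherwise. */
static int is_stale(const struct xfer *x, unsigned long now_ms,
                    unsigned long factor)
{
    unsigned long limit = x->expected_ms * factor;

    return (now_ms - x->start_ms) > limit;
}
```

With an assumed 5 s estimate and a command outstanding for 22 s (for example, stuck behind a cache flush), is_stale() returns 1 with a factor of 4 (limit 20 s) but 0 with a factor of 8 (limit 40 s), which is why raising the factor stops healthy but slow transactions from being aborted.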
>
> The only options I'm using are:
>
> DPT_MEASURE_PERFORMANCE
> DPT_TIMEOUT_FACTOR=4

Change this factor to increase the timeout.

>
> Simon of course can correct anything wrong I might have
> said above....
>
> --
>
> Curt Welch
> http://CurtWelch.Com/
>
> (Just trying to take some of the load off of Simon after all
> the time he spent helping get my system stable...)

---

Sincerely Yours,

Simon Shapiro
Atlas Telecom
Senior Architect
14355 SW Allen Blvd., Suite 130
Beaverton OR 97005
Shimon@i-Connect.Net
Voice: 503.799.2313