Date: Wed, 5 Jan 2000 22:05:11 -0700
From: "Kenneth D. Merry"
To: Mike Smith
Cc: Bill Swingle, scsi@FreeBSD.ORG, gibbs@FreeBSD.ORG
Subject: Re: Invalidating pack messages

On Tue, Jan 04, 2000 at 12:04:42 -0800, Mike Smith wrote:
> > > Then about 10 minutes later with no other errors in between, it barfed
> > > this several times:
> > >
> > > (da0:ahc0:0:0:0): Invalidating pack
> >
> > I'm not sure why it would spit that out multiple times.  The "Invalidating
> > pack" message is issued by the da driver when it gets an ENXIO error from
> > the error recovery code.  This can happen if the retry count is exhausted
> > on one of several sense codes (you can search through scsi_all.c for ENXIO)
> > or if the retry count is exhausted on selection timeouts.
>
> Perhaps we might get one message per SCB that's failed recovery?

That's probably it, since you've probably got multiple transactions
outstanding when the problem occurs.
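To make the one-message-per-transaction theory concrete, here is a rough
standalone model in C.  It is only a sketch under assumptions, not the actual
CAM code: the struct, the function names, and the use of 0 to mean "retry" are
all made up for illustration.  The only things taken from the thread are the
ENXIO result once the retry count runs out and the message texts.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for one queued transaction; not the real CAM CCB. */
struct sketch_txn {
	int id;            /* illustrative transaction number */
	int retries_left;  /* retries remaining before giving up */
};

/*
 * Model of error recovery for a selection timeout: consume a retry while
 * any are left, otherwise fail with ENXIO.  Returning 0 means "retry";
 * that convention is an assumption of this sketch.
 */
static int
sketch_selto_error(struct sketch_txn *txn)
{
	if (txn->retries_left > 0) {
		txn->retries_left--;
		return (0);
	}
	return (ENXIO);
}

/*
 * Assume every attempt on every outstanding transaction times out.
 * Each transaction fails recovery on its own, so each one produces
 * its own message.  Returns the number of messages printed.
 */
static int
sketch_fail_outstanding(struct sketch_txn *txns, int ntxns)
{
	int messages = 0;

	for (int i = 0; i < ntxns; i++) {
		int error;

		/* Keep retrying until the retry count is exhausted. */
		while ((error = sketch_selto_error(&txns[i])) == 0)
			;

		if (error == ENXIO) {
			printf("txn %d: got selection timeout\n", txns[i].id);
			printf("txn %d: Invalidating pack\n", txns[i].id);
			messages++;
		}
	}
	return (messages);
}
```

Calling sketch_fail_outstanding() on an array of three transactions returns 3,
which matches the idea that several outstanding transactions would each report
the failure separately.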

> > > Here are the boot messages from the drives/controller:
> > >
> > > ahc0: irq 23 at device 9.0 on pci1
> > > ahc0: aic7896/97 Wide Channel A, SCSI Id=7, 16/255 SCBs
> > > ahc1: irq 23 at device 9.1 on pci1
> > > ahc1: aic7896/97 Wide Channel B, SCSI Id=7, 16/255 SCBs
> >
> > I take it this is an onboard controller?  What kind of motherboard is it?
> > (Intel or AMI?  I can't recall any other quad Xeon boards.)
>
> It's actually an Acer X5; the onboard part is an aic7896N.
>
> > Second, you might want to put your LVD devices on one bus, and your single
> > ended devices on another, so you can get LVD speeds out of the LVD devices.
> > That is, unless you've got a 3860 bridge on there, so you can run LVD and
> > SE on the same bus.  (Unlikely, since you've got a 7896.)
>
> There is indeed a 3860 in there.  For some odd reason, the low-speed bus
> is bridged off bus A, not bus B on the 7896.
>
> > Third, make sure you check your cabling and termination.  Remember that LVD
> > drives don't have terminators, so you have to use a SE device to terminate
> > the chain on a SE bus, or use the twisty LVD cables with terminator blocks
> > on the end.
>
> The box is as-built by Acer; the internal LVD cabling is all OK, the
> hotswap bay units are properly terminated and the SE cable has a crimp-on
> terminator at the end.

My guess here is that you're running into selection timeouts.  (Especially
since Bill's later mail didn't show any additional error messages.)  If you
apply the attached patch to cam_periph.c, it should print out an error
message when you get a fatal selection timeout.

This could be related to the sync rate problems you're having.  It could be
the device just going out to lunch.  It could be cabling or termination
somehow, if the cabling problem would cause intermittent selection
problems.

I think the best course to take on this problem is for Justin to try to
tackle your sync rate problem first.
The sync rate problem is most likely an Adaptec driver interaction problem
with your particular motherboard and 7896 implementation.  Once he gets
that fixed, we can see if you're still having the problems.  If so, they'll
be narrowed to a bogus drive, bad cabling/termination, or some other
problem I haven't thought of. :)

(What good is an answer if you don't hedge it? :)

Ken
-- 
Kenneth Merry
ken@kdm.org

Attachment: cam_periph.c.selto.010500

==== //depot/FreeBSD-ken3/src/sys/cam/cam_periph.c#1 - /a/ken/perforce/FreeBSD-ken3/src/sys/cam/cam_periph.c ====
*** /tmp/tmp.31211.0	Wed Jan  5 22:02:29 2000
--- /a/ken/perforce/FreeBSD-ken3/src/sys/cam/cam_periph.c	Wed Jan  5 22:02:12 2000
***************
*** 1607,1612 ****
--- 1607,1617 ----
  			} else {
  				error = ENXIO;
  			}
+ 
+ 			if (error == ENXIO) {
+ 				xpt_print_path(periph->path);
+ 				printf("got selection timeout\n");
+ 			}
  			break;
  		}
  		case CAM_REQ_INVALID: