Date: Wed, 5 Jan 2000 22:05:11 -0700
From: "Kenneth D. Merry" <ken@kdm.org>
To: Mike Smith <msmith@FreeBSD.ORG>
Cc: Bill Swingle <unfurl@dub.net>, scsi@FreeBSD.ORG, gibbs@FreeBSD.ORG
Subject: Re: Invalidating pack messages
Message-ID: <20000105220511.A31109@panzer.kdm.org>
In-Reply-To: <200001042004.MAA01667@mass.cdrom.com>; from msmith@FreeBSD.ORG on Tue, Jan 04, 2000 at 12:04:42PM -0800
References: <20000103225224.B10024@panzer.kdm.org> <200001042004.MAA01667@mass.cdrom.com>
[-- Attachment #1 --]
On Tue, Jan 04, 2000 at 12:04:42 -0800, Mike Smith wrote:
> > > Then about 10 minutes later, with no other errors in between, it barfed
> > > this several times:
> > >
> > > (da0:ahc0:0:0:0): Invalidating pack
> >
> > I'm not sure why it would spit that out multiple times. The "Invalidating
> > pack" message is issued by the da driver when it gets an ENXIO error from
> > the error recovery code. This can happen if the retry count is exhausted
> > on one of several sense codes (you can search through scsi_all.c for ENXIO)
> > or if the retry count is exhausted on selection timeouts.
>
> Perhaps we might get one message per SCB that's failed recovery?
That's probably it, since you've likely got multiple transactions
outstanding when the problem occurs.
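To illustrate the retry-exhaustion path described above, here is a toy model in C. It is only a sketch under stated assumptions, not the actual CAM or da(4) code: every name in it (`struct xfer`, `recover()`, `count_invalidations()`, `XFER_RETRY`) is hypothetical, it models only selection timeouts (not the sense-code cases in scsi_all.c), and it assumes each outstanding transaction carries its own retry count and that the da driver prints the message for every command that comes back with ENXIO.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical, simplified model: these names are illustrative only and
 * are not the real CAM/da(4) structures, status codes or functions.
 */
enum xfer_status { XFER_OK, XFER_SEL_TIMEOUT };

#define XFER_RETRY (-1)    /* hypothetical "requeue and try again" code */

struct xfer {
    int id;                /* which outstanding transaction this is */
    int retries_left;      /* each transaction has its own retry budget */
};

/*
 * Simplified error recovery: a selection timeout is retried while the
 * transaction still has retries left; once they run out, the error is
 * reported to the peripheral driver as ENXIO.
 */
static int
recover(struct xfer *x, enum xfer_status status)
{
    if (status == XFER_OK)
        return 0;
    if (x->retries_left > 0) {
        x->retries_left--;
        return XFER_RETRY;
    }
    return ENXIO;
}

/*
 * Model a device that stops answering selection while `outstanding`
 * transactions are queued.  Every attempt times out, so each transaction
 * eventually exhausts its own retries and gets ENXIO.
 * Returns the number of "Invalidating pack" messages printed.
 */
static int
count_invalidations(int outstanding, int retries)
{
    int messages = 0;

    for (int i = 0; i < outstanding; i++) {
        struct xfer x = { i, retries };
        int error;

        /* Keep retrying until recovery gives up on this transaction. */
        while ((error = recover(&x, XFER_SEL_TIMEOUT)) == XFER_RETRY)
            ;
        if (error == ENXIO) {
            printf("(da0:ahc0:0:0:0): Invalidating pack\n");
            messages++;
        }
    }
    return messages;
}
```

Calling `count_invalidations(4, 3)` from a small test `main()` prints the "Invalidating pack" line four times and returns 4, i.e. one message per transaction that was in flight when the device stopped answering selection.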
> > > Here are the boot messages from the drives/controller:
> > >
> > > ahc0: <Adaptec aic7896/97 Ultra2 SCSI adapter> irq 23 at device 9.0 on pci1
> > > ahc0: aic7896/97 Wide Channel A, SCSI Id=7, 16/255 SCBs
> > > ahc1: <Adaptec aic7896/97 Ultra2 SCSI adapter> irq 23 at device 9.1 on pci1
> > > ahc1: aic7896/97 Wide Channel B, SCSI Id=7, 16/255 SCBs
> >
> > I take it this is an onboard controller? What kind of motherboard is it?
> > (Intel or AMI? I don't recall any other quad Xeon boards.)
>
> It's actually an Acer X5; the onboard part is an aic7896N.
>
> > Second, you might want to put your LVD devices on one bus, and your single
> > ended devices on another, so you can get LVD speeds out of the LVD devices.
> > That is, unless you've got a 3860 bridge on there, so you can run LVD and
> > SE on the same bus. (Unlikely, since you've got a 7896.)
>
> There is indeed a 3860 in there. For some odd reason, the low-speed bus
> is bridged off bus A, not bus B on the 7896.
>
> > Third, make sure you check your cabling and termination. Remember that LVD
> > drives don't have terminators, so you have to use a SE device to terminate
> > the chain on a SE bus, or use the twisty LVD cables with terminator blocks
> > on the end.
>
> The box is as-built by Acer; the internal LVD cabling is all OK, the
> hotswap bay units are properly terminated and the SE cable has a crimp-on
> terminator at the end.
My guess here is that you're running into selection timeouts. (Especially
since Bill's later mail didn't show any additional error messages.) If you
apply the attached patch to cam_periph.c, it should print out an error
message when you get a fatal selection timeout.
This could be related to the sync rate problems you're having. It could be
the device just going out to lunch. It could also be cabling or
termination, if the cabling problem is one that causes intermittent
selection failures.
I think the best course to take on this problem is for Justin to try to
tackle your sync rate problem first. The sync rate problem is most likely
an Adaptec driver interaction problem with your particular motherboard and
7896 implementation.
Once he gets that fixed, we can see if you're still having the problems.
If so, they'll be narrowed down to a bogus drive, bad cabling/termination,
or some other problem I haven't thought of. :) (What good is an answer if
you don't hedge it? :)
Ken
--
Kenneth Merry
ken@kdm.org
[-- Attachment #2 --]
==== //depot/FreeBSD-ken3/src/sys/cam/cam_periph.c#1 - /a/ken/perforce/FreeBSD-ken3/src/sys/cam/cam_periph.c ====
*** /tmp/tmp.31211.0 Wed Jan 5 22:02:29 2000
--- /a/ken/perforce/FreeBSD-ken3/src/sys/cam/cam_periph.c Wed Jan 5 22:02:12 2000
***************
*** 1607,1612 ****
--- 1607,1617 ----
} else {
error = ENXIO;
}
+
+ if (error == ENXIO) {
+ xpt_print_path(periph->path);
+ printf("got selection timeout\n");
+ }
break;
}
case CAM_REQ_INVALID:
