Date: Mon, 12 Jun 2000 17:20:02 -0700 (PDT)
From: "Kenneth D. Merry" <ken@kdm.org>
To: freebsd-bugs@FreeBSD.org
Subject: Re: i386/19226: SCSI timeouts during heavy load
Message-ID: <200006130020.RAA39847@freefall.freebsd.org>
The following reply was made to PR i386/19226; it has been noted by GNATS.

From: "Kenneth D. Merry" <ken@kdm.org>
To: Geir Inge Jensen <gij@jk.priv.no>
Cc: FreeBSD-gnats-submit@FreeBSD.ORG, gibbs@FreeBSD.ORG
Subject: Re: i386/19226: SCSI timeouts during heavy load
Date: Mon, 12 Jun 2000 18:12:30 -0600

On Mon, Jun 12, 2000 at 16:42:57 -0700, Geir Inge Jensen wrote:
> "Kenneth D. Merry" wrote:
> >
> > [ Please make sure to CC any response to freebsd-gnats-submit@FreeBSD.ORG
> > so your response makes it into the gnats database. ]
> >
> > On Mon, Jun 12, 2000 at 21:37:17 +0000, gij@jk.priv.no wrote:
> > [ ... ]
> >
> > It would have probably been helpful to include the dmesg output from the
> > disks as well, to get a better idea of the configuration.
>
> da0 at ahc2 bus 0 target 0 lun 0
> da0: <IBM DMVS18M 0220> Fixed Direct Access SCSI-3 device
> da0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
> da0: 17366MB (35566500 512 byte sectors: 255H 63S/T 2213C)

[ more of the same ]

Thanks!

> >
> > You've got two SCSI busses connected to the *same* array? Is this
> > controller a CMD OEM controller by any chance?
>
> The array will automatically terminate the bus in the middle, so that you
> get 4 disks on each bus. That's why I tried using only one bus against it
> to check for a malfunction in that autosplitter. But we have a lot of
> these PowerVaults running fine with Dell PowerEdge 4350 and FreeBSD 3.3.
> Some of the components in the PowerVault are made by Eurologic.

Ahh, I see. It looks like I was barking up the wrong tree about it being a CMD controller. Oh well, so much for that theory.

> > That (the timeout messages) indicates that from the system's perspective,
> > the array hasn't returned a read or write request in 60 seconds. So we
> > reset it in an attempt to wake it up.
>
> Yes, I have tried issuing a camcontrol reset/rescan without luck. We have
> to reboot the machine to get contact with the disks. Most of the time
> we have contact with the system after the error has occurred. But once in
> a while the system completely locks up (probably a deadlock or something).
>
> I briefly browsed through some patches for Linux (up until the point where
> it works on these systems). There are a lot of changes in the AIC7xxx driver.
> The sequencer code has many changes, and they now issue a dummy read to
> flush write requests. They apparently had a problem with scanning the same
> PCI bus twice (both as a peer and as a child), so they have a fix for that
> too. But I am not knowledgeable enough to really tell what's going on.

Another difference in the Linux driver is that it doesn't do tagged queueing, FWIW.

> > > We also have some success stories:
> > >
> > > - Run the PowerVault from a single PCI card (i.e., remove the other).
> > > - Run the PowerVault only from the internal 7899, channel B.
> >
> > In this configuration, did you have any other SCSI bus connected to the
> > PowerVault?
>
> No, only one bus, i.e., the autosplitter is not in action. However, due to
> the fact that this works fine under 3.3 on another system, I don't think
> it could be the PowerVault's fault. That splitter works as it's supposed
> to (also under Linux). We also get the error if we put in an extra
> PCI SCSI card in the above setup (two cards, with only one connected to
> the PowerVault). Since that also fails, it can't be a defect in the
> PowerVault. The only difference between success and failure is that single
> idle PCI SCSI card! (which suggests a PCI or interrupt problem).

Indeed, it could be a PCI, interrupt or chipset type problem.

> > > - linux-2.2.14-6.1.1 kernel (provided by Dell) with original HW setup.
> > > - linux-2.2.15 kernel with original HW setup.
> > >
> > > To me, it sounds like a PCI problem (or maybe in the RCC LE chip). It
> > > could also be a problem in the AIC7xxx driver, but it even failed with
> > > the AHA2940U2W cards (which work fine in our 3.3 systems). But I am
> > > only guessing here. However, Linux has obviously found a fix.
> >
> > I kinda wonder if this RAID array may be a CMD OEM or something.
>
> There is no RAID controller in it. It has components from Eurologic, but I
> don't know if the whole thing is made by them. Have a look at
>
> http://www.dell.com/us/en/biz/products/spec_scsis_200_storage.htm
>
> for further information.
>
> > CMD controllers have trouble when you have multiple luns on the same
> > controller in use. The symptoms are very similar to what you're
> > describing.
>
> It's not a CMD. And they don't share the bus. It's being split in two
> parts (as you can see in the added dmesg output).

Ahh, I assumed the "PowerVault" was a RAID array, when it is only a chassis. So you're right, it can't be the same problem.

> > Except for the idle part, this sounds kinda like the CMD problem.
> >
> > One thing to try is disabling tagged queueing on both ports of the array.
> > For example, to disable tagged queueing for the disk da20:
> >
> > camcontrol negotiate da20 -v -T disable -a
> >
> > Then try running your tests again, and see if the problem happens again.
> > If so, it may be that the array has problems with tagged queueing on
> > multiple luns, like the CMD array controllers.
>
> I can try this, but I doubt it will help. Or has something changed from
> 3.3 to 4.0 that requires this? Keep in mind that we use exactly the same
> PowerVault with this setup on a lot of 4350's running FreeBSD 3.3.

There have been some changes from 3.3 to 4.0, but I don't think there have been changes in the tagged queueing arena that would make any difference.

I'm really not sure what your problem is, so I've handed it off to Justin Gibbs <gibbs@FreeBSD.ORG>, who is the author of the Adaptec driver. Hopefully he can help you get to the bottom of it. Sorry I can't help more.
Ken
-- 
Kenneth Merry
ken@kdm.org
