Date: Wed, 20 Aug 1997 14:44:13 +0930
From: Greg Lehey <grog@lemis.com>
To: "Justin T. Gibbs" <gibbs@plutotech.com>
Cc: FreeBSD SCSI Mailing List <freebsd-scsi@freebsd.org>
Subject: Re: Bus resets. Grrrr.
Message-ID: <19970820144413.04972@lemis.com>
In-Reply-To: <199708200320.VAA08483@pluto.plutotech.com>; from Justin T. Gibbs on Tue, Aug 19, 1997 at 09:19:59PM -0600
References: <19970820090810.54774@lemis.com> <199708200320.VAA08483@pluto.plutotech.com>

On Tue, Aug 19, 1997 at 09:19:59PM -0600, Justin T. Gibbs wrote:
>> On Tue, Aug 19, 1997 at 10:53:54AM -0600, Justin T. Gibbs wrote:
>>>>> What version of the kernel are you using
>>>>
>>>> Recent versions of -current. The ones I reported it against were some
>>>> time last week. I've just rebuilt with a version supped this morning.
>>>
>>> And it is still reproducible?
>>
>> I changed the configuration file and added (inter alia)
>> AHC_SCBPAGING_ENABLE. The resultant kernel hung solid three times in
>> the course of a couple of hours, once with a disk activity light on
>> solid, and the other two without. I removed AHC_SCBPAGING_ENABLE, and
>> last night the backup went through for the first time in a week. It
>> ran fine until last Wednesday, however, so this could be a
>> coincidence.
>
> The system hung solid with no kernel messages or were you in X so
> you couldn't see them?

That's right.

> There is no guarantee that driver messages will make it into the log
> file if the SCSI bus is wedged.

Well, in this case the system disk is IDE, so I could put /var there
and try again (currently, I note, it's on /dev/sd1h).

> I wasn't aware of any problems with SCB paging, so I'd be very
> interested in any information you can provide on this problem. In
> most cases, BTW, SCB paging isn't a win unless you are also using
> tagged queuing (option AHC_TAGENABLE).

Well, it looks like that wasn't the problem. It has happened again,
and I wasn't able to get a dump (despite the kernel debugger; I must
investigate why), so I don't suppose it was that option after all.

>> No, at the moment the chain only has four devices connected, but I
>> notice it finds two LUNs for the tape changer:
>
> Ahh. I thought you said you had a 2940A, not a 2940.

Oops. So I did. I thought it was one.

> This pretty much rules out the QOUTFIFO overflow problem (assuming
> you are not using tagged queueing, which your dmesg output seems to
> confirm) since the aic7870 has 16 slots meaning 9 active devices
> would be necessary.

OK.

>> ahc0: <Adaptec 2940 SCSI host adapter> rev 0x03 int a irq 12 on pci0.18.0
>> ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
>> ahc0: waiting for scsi devices to settle
>> scbus0 at ahc0 bus 0
>> scbus0 target 0 lun 0: <MICROP 2112-15MQ1094802 HQ48> type 0 fixed SCSI 2
>> sd0 at scbus0 target 0 lun 0
>> sd0: Direct-Access 1001MB (2051615 512 byte sectors)
>> sd0: with 1760 cyls, 15 heads, and an average 77 sectors/track
>
> It looks like there is newer firmware available for this drive from
> Micropolis:
>
> ftp://techsupport.micropolis.com/pub/files/firmware/Aquaris/2105-2108-2112/4930010f.bin
> ftp://techsupport.micropolis.com/pub/files/Utils/ASPIUTIL.EXE

OK, I'll take a look.

>>> Could it be that you don't have disconnections enabled for your tape drive?
>>> You should check both SCSI-Select for the 2940 and any relevant jumpers
>>> on the tape drive itself. If disconnections are disabled, a tape write that
>>> required multiple retries could easily tie up the SCSI bus for the 10s
>>> needed to make a disk command time out.
>>
>> You'd see that on the activity light, right? In any case, the host
>> adapter is set correctly, and the tape doesn't seem to have any such
>> config switch. Would there be another way to test that?
>
> Not really. Since the timeout was "while idle", chances are that
> disconnection is enabled and working.

Yes, that's reasonable.

>>> The first one probably fails because the device isn't ready.
>>
>> That's what I thought, too, so I put a sleep 30 into the script. It
>> still works the second time.
>
> Then it probably fails because there is a unit attention that needs to
> be cleared. The console error message would be enough to determine
> what is really happening.

OK, I'll try to capture it next time.

Greg
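
For readers following the QOUTFIFO argument quoted above: the figure of 9 active devices against 16 slots can be reproduced with a little arithmetic. The sketch below is only an illustration, not driver code. It assumes that each device has at most 2 commands outstanding when tagged queueing is off; that number is not stated in the message and is an assumption here. The function name is hypothetical.

```python
def min_devices_to_overflow(fifo_slots: int, cmds_per_device: int) -> int:
    """Return the smallest device count whose outstanding commands
    exceed the number of FIFO slots."""
    # floor(slots / cmds) devices can at most fill the FIFO exactly;
    # one more device is needed before it can overflow.
    return fifo_slots // cmds_per_device + 1


# 16 slots for the aic7870 is taken from the message.
AIC7870_SLOTS = 16
# ASSUMPTION: 2 outstanding commands per device without tagged queueing.
UNTAGGED_CMDS_PER_DEVICE = 2

print(min_devices_to_overflow(AIC7870_SLOTS, UNTAGGED_CMDS_PER_DEVICE))
```

Under that assumption the result matches the 9 devices mentioned in the message, which is well above the four devices on the chain described here.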