Date:      Wed, 20 Aug 1997 14:44:13 +0930
From:      Greg Lehey <grog@lemis.com>
To:        "Justin T. Gibbs" <gibbs@plutotech.com>
Cc:        FreeBSD SCSI Mailing List <freebsd-scsi@freebsd.org>
Subject:   Re: Bus resets. Grrrr.
Message-ID:  <19970820144413.04972@lemis.com>
In-Reply-To: <199708200320.VAA08483@pluto.plutotech.com>; from Justin T. Gibbs on Tue, Aug 19, 1997 at 09:19:59PM -0600
References:  <19970820090810.54774@lemis.com> <199708200320.VAA08483@pluto.plutotech.com>

On Tue, Aug 19, 1997 at 09:19:59PM -0600, Justin T. Gibbs wrote:
>> On Tue, Aug 19, 1997 at 10:53:54AM -0600, Justin T. Gibbs wrote:
>>>>> What version of the kernel are you using
>>>>
>>>> Recent versions of -current.  The ones I reported it against were some
>>>> time last week.  I've just rebuilt with a version supped this morning.
>>>
>>> And it is still reproducible?
>>
>> I changed the configuration file and added (inter alia)
>> AHC_SCBPAGING_ENABLE.  The resultant kernel hung solid three times in
>> the course of a couple of hours, once with a disk activity light on
>> solid, and the other two without.  I removed AHC_SCBPAGING_ENABLE, and
>> last night the backup went through for the first time in a week.  It
>> ran fine until last Wednesday, however, so this could be a
>> coincidence.
>
> Did the system hang solid with no kernel messages, or were you in X
> so you couldn't see them?

That's right.

> There is no guarantee that driver messages will make it into the log
> file if the SCSI bus is wedged.

Well, in this case the system disk is IDE, so I could put /var there
and try again (currently, I note, it's on /dev/sd1h).

> I wasn't aware of any problems with SCB paging, so I'd be very
> interested in any information you can provide on this problem.  In
> most cases, BTW, SCB paging isn't a win unless you are also using
> tagged queuing (option AHC_TAGENABLE).

Well, it looks like that wasn't the problem.  The hang has happened
again with AHC_SCBPAGING_ENABLE removed, so I don't suppose it was that
option after all.  I wasn't able to get a dump (despite the kernel
debugger; I must investigate why).

>> No, at the moment the chain only has four devices connected, but I
>> notice it finds two LUNs for the tape changer:
>
> Ahh.  I thought you said you had a 2940A, not a 2940.

Oops.  So I did.  I thought it was one.

> This pretty much rules out the QOUTFIFO overflow problem (assuming
> you are not using tagged queueing, which your dmesg output seems to
> confirm) since the aic7870 has 16 slots meaning 9 active devices
> would be necessary.

OK.
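
The slot arithmetic in the quoted paragraph can be sketched roughly as
follows.  The message gives only the two figures (16 slots, 9 active
devices); the per-device limit of two entries without tagged queueing is
an assumption inferred from those figures, not something stated in the
thread, and the function and constant names are hypothetical.

```python
def devices_needed_to_overflow(slots: int, max_per_device: int) -> int:
    """Smallest device count whose combined entries exceed `slots`."""
    return slots // max_per_device + 1


QOUTFIFO_SLOTS = 16      # from the message: the aic7870 has 16 slots
ASSUMED_PER_DEVICE = 2   # ASSUMPTION: inferred from "16 slots ... 9 devices"

needed = devices_needed_to_overflow(QOUTFIFO_SLOTS, ASSUMED_PER_DEVICE)
print(needed)

# The chain described in this thread has four devices connected.
connected = 4
print(connected * ASSUMED_PER_DEVICE <= QOUTFIFO_SLOTS)
```

Under that assumption, eight devices would exactly fill the queue and a
ninth would be needed to overflow it, which matches the figure quoted
above; four devices stay well inside the limit.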

>> ahc0: <Adaptec 2940 SCSI host adapter> rev 0x03 int a irq 12 on pci0.18.0
>> ahc0: aic7870 Single Channel, SCSI Id=7, 16 SCBs
>> ahc0: waiting for scsi devices to settle
>> scbus0 at ahc0 bus 0
>> scbus0 target 0 lun 0: <MICROP 2112-15MQ1094802 HQ48> type 0 fixed SCSI 2
>> sd0 at scbus0 target 0 lun 0
>> sd0: Direct-Access 1001MB (2051615 512 byte sectors)
>> sd0: with 1760 cyls, 15 heads, and an average 77 sectors/track
>
> It looks like there is newer firmware available for this drive from
> Micropolis:
>
> ftp://techsupport.micropolis.com/pub/files/firmware/Aquaris/2105-2108-2112/4930010f.bin
> ftp://techsupport.micropolis.com/pub/files/Utils/ASPIUTIL.EXE

OK, I'll take a look.

>>> Could it be that you don't have disconnections enabled for your tape drive?
>>> You should check both SCSI-Select for the 2940 and any relevant jumpers
>>> on the tape drive itself.  If disconnections are disabled, a tape write that
>>> required multiple retries could easily tie up the SCSI bus for the 10s
>>> needed to make a disk command time out.
>>
>> You'd see that on the activity light, right?  In any case, the host
>> adapter is set correctly, and the tape doesn't seem to have any such
>> config switch.  Would there be another way to test that?
>
> Not really.  Since the timeout was "while idle", chances are that
> disconnection is enabled and working.

Yes, that's reasonable.

>>> The first one probably fails because the device isn't ready.
>>
>> That's what I thought, too, so I put a sleep 30 into the script.  It
>> still works the second time.
>
> Then it probably fails because there is a unit attention that needs to
> be cleared.  The console error message would be enough to determine
> what is really happening.

OK, I'll try to capture it next time.
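
One way to capture what the first attempt reports is to have the script
retry the command and keep the error text from each failure.  This is
only a minimal sketch: the actual script and command are not shown in
this thread, so `run_with_retry` and its parameters are hypothetical,
and it records only what the command itself writes to stderr.  The
kernel's console message would still have to be read from the console
or the system log.

```python
import subprocess
import time


def run_with_retry(argv, attempts=2, delay=0.0):
    """Run argv up to `attempts` times.

    Returns (succeeded, errors), where errors holds the stderr text of
    every failed attempt, in order.
    """
    errors = []
    for attempt in range(attempts):
        result = subprocess.run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return True, errors
        errors.append(result.stderr.strip())
        if attempt + 1 < attempts:
            time.sleep(delay)  # e.g. 30 to mirror the "sleep 30" above
    return False, errors
```

If the first attempt fails and the second succeeds, the returned list
holds the first attempt's error text, which can then be compared with
the console message.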

Greg


