Date:      Sun, 20 Dec 1998 19:47:51 -0700 (MST)
From:      "Kenneth D. Merry" <ken@plutotech.com>
To:        skynyrd@opus.cts.cwu.edu (Chris Timmons)
Cc:        asmodai@wxs.nl, freebsd-scsi@FreeBSD.ORG
Subject:   Re: Problem with SCSI-bus and high diskaccess?
Message-ID:  <199812210247.TAA94760@panzer.plutotech.com>
In-Reply-To: <Pine.BSF.3.96.981220122334.9871A-100000@opus.cts.cwu.edu> from Chris Timmons at "Dec 20, 98 12:34:22 pm"

Chris Timmons wrote...
> 
> [moved to -scsi]
> 
> I am assuming that you posted to -current because you just cvsupped the
> latest 3.0-current bits and are running that.  Else freebsd-scsi would be
> a good list for this kind of problem (be sure to say what version of
> FreeBSD you are running.) 

Yep, it's best to post to -scsi with SCSI problems.

> Although I haven't seen this sort of thing with a fireball, it reeks of
> quantum firmware.  I have had similar problems with atlas-I and atlas-II
> drives.  So you will want to check your firmware revision and see if there
> is something newer out at ftp.quantum.com (upgrading firmware is a fun way
> to waste a day.)  You might try www.dejanews.com and search for your drive
> name and firmware revision.  Chances are somebody else has already seen
> and documented a similar problem if it is indeed the drives. 

Yes, it looks like a firmware issue, most likely.

> There is also a slight possibility that you are using an old enough
> version of FreeBSD and perhaps an integrated aha-2940UW on your mb (or
> perhaps the dreaded rev e(?) of the pci card).  In that case, Justin
> committed fixes to the drivers months ago.  Update your system.

True, that could be it.  From his later message, though, we know that
that isn't the problem.

> I find that IBM and Seagate drives work very well and come with firmware

Yep, they definitely seem to write better firmware.

> On Sun, 20 Dec 1998, Jeroen Ruigrok/Asmodai wrote:
> 
> > Hi,
> > 
> > I just want some thoughts on this:
> > 
> > In the last 24 hours my workstation has been going nuts, whereas it had been
> > running along nicely for weeks before.
> > 
> > The things that kept bugging me today were these happy messages:
> > 
> > Dec 20 10:51:14 chronias /kernel: Unexpected busfree.  LASTPHASE == 0xa0
> > Dec 20 10:51:15 chronias /kernel: SEQADDR == 0x157
> > Dec 20 10:51:15 chronias /kernel: (da1:ahc0:0:1:0): SCB 0x5 - timed out while
> > idle, LASTPHASE == 0x1, SEQADDR == 0xc
> > Dec 20 10:51:15 chronias /kernel: (da1:ahc0:0:1:0): Queuing a BDR SCB
> > Dec 20 10:51:15 chronias /kernel: (da1:ahc0:0:1:0): Bus Device Reset Message Sent
> > Dec 20 10:51:15 chronias /kernel: (da1:ahc0:0:1:0): no longer in timeout, status
> > = 353

This indicates that the drive is having trouble.  The "timed out while idle"
message generally appears when the drive doesn't respond to a command
within the allotted time.  The default read/write timeout in the da
driver is 60 seconds, which should be far more than enough time for the
drive to respond.

When the drive timed out, we hit it over the head with a BDR (Bus Device
Reset) to wake it up.  Usually, that will wake the drive up and get things
going again.
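
To see which device is involved in messages like the ones quoted above, it
helps to pick apart the parenthesized prefix.  Here is a minimal sketch,
assuming the prefix has the form (periph:sim:bus:target:lun), as in
(da1:ahc0:0:1:0).  The function name and the regular expression are
illustrative only and are not part of FreeBSD.

```python
import re

# Matches a CAM-style path prefix such as "(da1:ahc0:0:1:0)".
# Assumed layout: peripheral+unit, controller+unit, bus, target, LUN.
PATH_RE = re.compile(
    r"\((?P<periph>[a-z]+\d+):(?P<sim>[a-z]+\d+):"
    r"(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\)"
)


def parse_cam_path(line):
    """Return the device path fields found in a kernel log line, or None."""
    match = PATH_RE.search(line)
    if match is None:
        return None
    return {
        "periph": match.group("periph"),
        "sim": match.group("sim"),
        "bus": int(match.group("bus")),
        "target": int(match.group("target")),
        "lun": int(match.group("lun")),
    }
```

Fed the "Queuing a BDR SCB" line from the log, this would report da1 on
ahc0, bus 0, target 1, LUN 0, i.e. the second disk is the one being reset.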

> > After which page and swap process were running wild.
> > 
> > The weird thing is, these HD's (two SCSI Quantum Fireballs) have been checked, the
> > HA (AHA 2940UW) is likewise in good shape and the memory chips have been tested as
> > well... The mainboard is also in good shape and is cooled by a few fans (3) which
> > all work, including the one on the CPU.
> > 
> > This happened while I was running locate.updatedb and downloading a PDF
> > file to the HD at the same time; when I then opened a mailbox, the whole
> > thing went haywire...
> > 
> > Unfortunately I wasn't able to write down the pager messages since they went by
> > with warpspeed 9 and they weren't logged in /var/log/messages =\
> > 
> > Does anybody have any ideas what to check to narrow down the problem?

The problem is probably just: Quantum Firmware + High Load == Drive goes
out to lunch.

With Quantum disks, the problem generally happens under high load.  Chris
is right, the Atlas I and Atlas II had problems like this as well.  The
latest firmware for the Atlas II at least has solved some of the problems,
but not all.

I believe the LYK8 Atlas II firmware has mostly solved the "drive goes out
to lunch" problem.  It hasn't solved the problem that causes it to
continually return queue full until we have reduced the number of tags to
the minimum.  That's why we have Atlas II quirk entries setting the minimum
number of tags to 24.
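
The effect of a minimum-tags floor can be sketched roughly as follows.  This
is not the actual CAM code: the one-tag-per-event back-off policy, the
function names, and the starting tag count are assumptions made for
illustration.  Only the floor of 24 comes from the quirk entries mentioned
above.

```python
ATLAS_II_MIN_TAGS = 24  # floor taken from the quirk entry described above


def tags_after_queue_full(current_tags, min_tags=ATLAS_II_MIN_TAGS):
    """Return the tag count to use after one QUEUE FULL status.

    Assumed policy: back off by one tag per event, never below min_tags.
    """
    return max(min_tags, current_tags - 1)


def simulate(start_tags, queue_full_events, min_tags=ATLAS_II_MIN_TAGS):
    """Return the tag count after each of a series of QUEUE FULL events."""
    tags = start_tags
    history = [tags]
    for _ in range(queue_full_events):
        tags = tags_after_queue_full(tags, min_tags)
        history.append(tags)
    return history
```

With a floor in place, a drive that keeps returning queue full can only push
the count down to 24; without one, the same loop would keep shrinking it
toward the minimum possible.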

I know that we (Pluto) have had trouble with the Fireball ST drives in the
past in certain situations (I can't remember exactly what those situations
were).  I believe, though, that the '0F0J' firmware worked reasonably well.

Ken
-- 
Kenneth Merry
ken@plutotech.com

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message


