Date:      Mon, 28 Aug 2000 23:32:57 -0600
From:      "Kenneth D. Merry" <ken@kdm.org>
To:        David Gilbert <dgilbert@velocet.ca>
Cc:        freebsd-SCSI@FreeBSD.ORG
Subject:   Re: SCSI disconnect with quantum Atlas IV disks.
Message-ID:  <20000828233257.A35815@panzer.kdm.org>
In-Reply-To: <14762.31347.647187.677745@trooper.velocet.net>; from dgilbert@velocet.ca on Mon, Aug 28, 2000 at 10:42:59AM -0400
References:  <14762.31347.647187.677745@trooper.velocet.net>

On Mon, Aug 28, 2000 at 10:42:59 -0400, David Gilbert wrote:
> OK... round two.  As I mentioned in my posting about RAID... I'm
> having trouble with my system disconnecting disks during intense usage 
> (like the nightly finds that run across the disk).

Hmm.

> The system is running two 29160 controllers (couldn't buy 2940's) with 
> LVD cables and terminators (all rated for 160 operation).  With 4.1,
> we're running at 40MHz (80 MB/s) ... which is fine, but I'm wondering
> if this could be causing part of the problem.
> 
> When might the 160 patch be MFC'd?

Justin would know.  You may have to mail him directly for a guess.

> I've also fired off a query to Quantum to see if there are any firmware 
> updates... but a search of their knowledge base doesn't indicate any.
> 
> One strange note is that I've found that I have to turn on the drives
> slightly after turning on the computer lest they fail to reset
> properly.  I don't know what causes this, but changing controllers,
> cables and terminators hasn't helped (I also have a set of TechRAM
> 390F's that exhibit largely the same symptoms).

I kinda wonder if your enclosure is underpowered.  One thing that can cause
drives to behave strangely is if they aren't getting enough power.

When drives spin up, and when they're under high seek load, they use
a fair bit more power than they do when they're just spinning idle.

Are your drives alone on that power supply?  If so, you should be able to
look at the drive specs for peak current/power usage, and compare that with
what your power supply is spec'ed for.
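
As a rough illustration of that comparison, here is a minimal Python sketch.
Every number in it is a made-up placeholder, not an Atlas IV or enclosure
specification, and the 80% headroom factor is only an assumed safety margin.
Substitute the peak figures from the drive's product manual and the ratings
printed on the power supply.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    """Current in amps on the two rails a disk draws from."""
    amps_12v: float
    amps_5v: float


def peak_load(per_drive_peak: Rail, drive_count: int) -> Rail:
    """Worst case: every drive hits its peak (spin-up or heavy seek) at once."""
    return Rail(per_drive_peak.amps_12v * drive_count,
                per_drive_peak.amps_5v * drive_count)


def is_underpowered(load: Rail, supply: Rail, headroom: float = 0.8) -> bool:
    """True if the load exceeds `headroom` (assumed margin) of either rail."""
    return (load.amps_12v > supply.amps_12v * headroom or
            load.amps_5v > supply.amps_5v * headroom)


# Hypothetical numbers, for illustration only.
drive_peak = Rail(amps_12v=2.0, amps_5v=1.0)
supply = Rail(amps_12v=10.0, amps_5v=20.0)

load = peak_load(drive_peak, drive_count=6)
print(load, is_underpowered(load, supply))
```

If the check comes out true for the real numbers, staggering drive spin-up or
moving some drives to another supply would be the things to try first.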

> Would it be possible to code some really BIG bus resets and reprobes
> into the SCSI layer instead of disconnecting the drive --- maybe even
> some tunable parameter --- It would seem sensible that you could do
> with some gaps in performance once you've gotten to this level of
> problem.

The types of errors you're getting, timed out in data-out phase, indicate
that a signal is stuck on the SCSI bus.  It's not just stuck momentarily,
but has been stuck for 60 seconds.

I suppose that could be caused by power problems, although it is more often
a cabling or termination problem.  From your previous message, it sounds
like you've already looked over the cabling and power a good bit.

Still, if a signal is getting stuck on the bus, there's not a whole lot we
can do about it.

From your previous message, it looks like we may not be retrying things
after sending a bus reset.  It looks like Vinum bails out immediately after
the bus reset, which likely indicates that it got an I/O error of some
sort.

The error recovery code is supposed to unconditionally retry a command that
comes back up because a bus reset or a BDR (Bus Device Reset) was sent.  So,
in theory, a bus reset, in and of itself, shouldn't cause an error to get
propagated back up far enough that Vinum could detect it.  (Unless Vinum has
some sort of timeout mechanism, and reports that as a fatal I/O error.)
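
To make the intended behavior concrete, here is a small Python sketch of that
retry policy.  It is only a model of the idea, not the actual CAM error
recovery code: the status names are invented for the example, and the
max_resets cap is an assumption added so the sketch cannot loop forever.

```python
from enum import Enum, auto


class Status(Enum):
    """Illustrative completion statuses; not the real CAM status codes."""
    OK = auto()
    BUS_RESET = auto()   # command was aborted by a bus reset
    BDR_SENT = auto()    # command was aborted by a bus device reset
    IO_ERROR = auto()    # a genuine failure


# Statuses meaning "recovery killed the command", not "the command failed".
RETRY_ALWAYS = {Status.BUS_RESET, Status.BDR_SENT}


def run_command(issue, max_resets=5):
    """Call issue() again whenever it comes back with a reset status.

    Only the final status is returned, so the caller (Vinum, in this
    thread) never sees the reset itself.  max_resets is an assumed cap.
    """
    status = issue()
    resets = 0
    while status in RETRY_ALWAYS and resets < max_resets:
        resets += 1
        status = issue()
    return status


# A command that is hit by one bus reset and then completes.
results = iter([Status.BUS_RESET, Status.OK])
final = run_command(lambda: next(results))
print(final)
```

Under this model the upper layer would only see an error if the retried
command itself failed, or if it gave up on its own before the retry finished.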

Anyway, it looks a little strange to me, but I'm not sure what's going on.
I would suggest running this by Justin (directly), and seeing what he has to
say about it.

Ken
-- 
Kenneth Merry
ken@kdm.org





