Date: Mon, 28 Aug 2000 23:32:57 -0600
From: "Kenneth D. Merry" <ken@kdm.org>
To: David Gilbert <dgilbert@velocet.ca>
Cc: freebsd-SCSI@FreeBSD.ORG
Subject: Re: SCSI disconnect with quantum Atlas IV disks.
Message-ID: <20000828233257.A35815@panzer.kdm.org>
In-Reply-To: <14762.31347.647187.677745@trooper.velocet.net>; from dgilbert@velocet.ca on Mon, Aug 28, 2000 at 10:42:59AM -0400
References: <14762.31347.647187.677745@trooper.velocet.net>
On Mon, Aug 28, 2000 at 10:42:59 -0400, David Gilbert wrote:
> OK... round two. As I mentioned in my posting about RAID... I'm
> having trouble with my system disconnecting disks during intense usage
> (like the nightly finds that run across the disk).

Hmm.

> The system is running two 29160 controllers (couldn't buy 2940's) with
> LVD cables and terminators (all rated for 160 operation). With 4.1,
> we're running at 40MHz (80 MB/s) ... which is fine, but I'm wondering
> if this could be causing part of the problem.
>
> When might the 160 patch be MFC'd?

Justin would know; you may have to mail him directly for a guess.

> I've also fired off a query to Quantum to see if there are any firmware
> updates... but a search of their knowledge base doesn't indicate any.
>
> One strange note is that I've found that I have to turn on the drives
> slightly after turning on the computer lest they fail to reset
> properly. I don't know what causes this, but changing controllers,
> cables and terminators hasn't helped (I also have a set of TechRAM
> 390F's that exhibit largely the same symptoms).

I kinda wonder if your enclosure is underpowered.

One thing that can cause drives to behave strangely is if they aren't
getting enough power. When drives spin up, and when they're under high
seek load, they use a fair bit more power than they do when they're just
spinning idle.

Are your drives alone on that power supply? If so, you should be able to
look at the drive specs for peak current/power usage, and compare that
with what your power supply is spec'ed for.

> Would it be possible to code some really BIG bus resets and reprobes
> into the SCSI layer instead of disconnecting the drive --- maybe even
> some tunable parameter --- It would seem sensible that you could do
> with some gaps in performance once you've gotten to this level of
> problem.

The types of errors you're getting, timed out in data-out phase,
indicate that a signal is stuck on the SCSI bus.
It's not just stuck momentarily, but has been stuck for 60 seconds.

I suppose that could be caused by power problems, although it is more
often a cabling and termination problem. From your previous message, it
sounds like you've already looked over the cabling and power a good bit.

If the signal is getting stuck on the bus, there's not a whole lot we
can do about it.

From your previous message, it also looks like we may not be retrying
things after sending a bus reset. Vinum appears to bail out immediately
after the bus reset, which likely indicates that it got an I/O error of
some sort.

The error recovery code is supposed to unconditionally retry a command
that comes back up because a bus reset or a BDR was sent. So, in theory,
a bus reset, in and of itself, shouldn't cause an error to get
propagated back up far enough that Vinum could detect it. (Unless Vinum
has some sort of timeout mechanism, and reports that as a fatal I/O
error.)

Anyway, it looks a little strange to me, but I'm not sure what's going
on. I would suggest running this by Justin (directly) and seeing what he
has to say about it.

Ken
-- 
Kenneth Merry
ken@kdm.org
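The power-budget comparison suggested earlier in this message (drive peak current versus what the supply is spec'ed for) can be sketched roughly as below. This is only an illustration: the `Drive` class, the function names, and every number are made up for the example and are not Atlas IV or enclosure specifications. It also assumes the worst case, with all drives hitting their peak at the same time and nothing else sharing the supply.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Peak current draw per rail, as read off a drive's spec sheet.

    The field names are hypothetical; use whatever your datasheet lists
    for spin-up or heavy-seek current.
    """
    name: str
    peak_amps_12v: float
    peak_amps_5v: float


def rail_headroom(drives, supply_amps_12v, supply_amps_5v):
    """Return (12 V headroom, 5 V headroom) in amps.

    A negative value means the summed peak draw exceeds the supply rating
    on that rail, assuming every drive peaks simultaneously.
    """
    need_12v = sum(d.peak_amps_12v for d in drives)
    need_5v = sum(d.peak_amps_5v for d in drives)
    return supply_amps_12v - need_12v, supply_amps_5v - need_5v


def is_underpowered(drives, supply_amps_12v, supply_amps_5v):
    """True if either rail cannot cover the summed peak draw."""
    h12, h5 = rail_headroom(drives, supply_amps_12v, supply_amps_5v)
    return h12 < 0 or h5 < 0


# Placeholder figures only: four drives at 2.0 A (12 V) and 1.0 A (5 V) peak,
# on a supply rated for 6.0 A at 12 V and 10.0 A at 5 V.
drives = [Drive(f"da{i}", peak_amps_12v=2.0, peak_amps_5v=1.0) for i in range(4)]
print(rail_headroom(drives, 6.0, 10.0))
print(is_underpowered(drives, 6.0, 10.0))
```

With these placeholder figures the 12 V rail comes up short, which is the kind of result that would point at the enclosure. Staggering spin-up lowers the simultaneous peak, so a worst-case sum like this one errs on the pessimistic side.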