From owner-freebsd-scsi Fri Aug 25 7:42:52 2000
From: David Gilbert
Date: Fri, 25 Aug 2000 10:42:44 -0400 (EDT)
To: Greg Lehey
Cc: David Gilbert, freebsd-scsi@FreeBSD.ORG
Subject: Re: Vinum 29160 detaches drives, invalidates RAID.
Message-ID: <14758.34276.167320.197675@trooper.velocet.net>
In-Reply-To: <20000825113638.D39208@wantadilla.lemis.com>
References: <14757.14569.732766.367692@trooper.velocet.net> <20000825113638.D39208@wantadilla.lemis.com>

>>>>> "Greg" == Greg Lehey writes:

>> First of all, I'm very pleased with the speed. The system easily
>> beats the AMI MegaRAID 1500 (same drives) with a whopping 35Mbyte/s
>> in RAID-5 (vs. the 1500's 14Mbyte/s) for read. (They both score a
>> dead heat of 4Mbyte/s write.)

Greg> Nice to hear :-)

In general, I'm an advocate of the vinum system. I've been hammering
it for months now on the test RAID-5 system. Besides this
disconnecting problem, the system is performing very well.

>> Now... if I reboot, and "vinum setstate up" all these drives,

Greg> They all go down, do they?

Not all... Sometimes 2 sometimes 4. I suppose I should have said that
I setstate up all the drives that are down.

>> fsck completes without any complaint.
>> I then generally have to
>> "vinum rebuild parity" ... but I suppose that I'd expect that.

Greg> Hmm. rebuildparity is a dangerous command. Basically, a parity
Greg> error means that *one* (or more) of the drives has incorrect
Greg> data. rebuildparity simply assumes that the error is in the
Greg> data block and "corrects" it. It's a serious problem, one that
Greg> is very difficult to solve.

Well... at the point of failure, we're doing the nightly finds on the
disk. I do the fsck (usually) before I do the rebuildparity. I
suspect that the only information being written to the disk at this
point is the access time updates. I would expect, then, that corrupt
data is likely limited to an update of this nature.

>> The problem I'm having here (and I've had it before) is that the
>> FreeBSD SCSI system seems to "give up" under conditions that others
>> would keep retrying or resetting/retrying.

>> It seems really, really, really important to me that we try harder
>> to get a drive back online. This seems as if it could affect the
>> long-term viability of a vinum-based raid server... not because
>> vinum is bad, but because the SCSI subsystem is too fragile.

Greg> Hmm. I can't really comment on that, but it would be nice if
Greg> the SCSI system could recover from these problems.

I think this is a critical thing. I can accept that it may be hard to
discern if the device has been yanked from the bus or had gone into
some other bad state --- but this is definitely not the case. The
FreeBSD SCSI subsystem as-it-stands is very fragile. I realize that
cabling must be 100% for many different reasons; ... But by the same
token, we need things to keep retrying and resetting far longer before
losing all hope.

Dave.

-- 
============================================================================
|David Gilbert, Velocet Communications.       | Two things can only be     |
|Mail: dgilbert@velocet.net                   | equal if and only if they  |
|http://www.velocet.net/~dgilbert             | are precisely opposite.    |
=========================================================GLO================
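
The quoted point about parity errors can be illustrated with a small sketch. It is not vinum's code: the block contents, the helper names (xor_blocks, stripe_is_consistent, rewrite_block) and the two-byte block size are all made up for illustration, and it assumes plain XOR parity across one stripe. It shows that a parity check only says the stripe is inconsistent, not which block is wrong, so rewriting one block from the others always makes the check pass, whether or not that block was the damaged one.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_is_consistent(data_blocks, parity):
    """A stripe is consistent when the parity equals the XOR of the data."""
    return xor_blocks(data_blocks) == parity

def rewrite_block(stripe, index):
    """Recompute stripe[index] as the XOR of every other block in the stripe."""
    others = [block for i, block in enumerate(stripe) if i != index]
    repaired = list(stripe)
    repaired[index] = xor_blocks(others)
    return repaired

# Hypothetical stripe: three data blocks followed by one parity block.
good_data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
stripe = good_data + [xor_blocks(good_data)]

# Silently corrupt data block 1; the check fails but cannot say where.
damaged = list(stripe)
damaged[1] = b"\x10\x21"
detected = not stripe_is_consistent(damaged[:-1], damaged[-1])

# Rewriting the parity block (index 3) makes the stripe consistent again,
# yet data block 1 still holds the corrupt contents.
guess_parity = rewrite_block(damaged, 3)

# Rewriting the block that was actually damaged restores the original stripe.
guess_data = rewrite_block(damaged, 1)
```

Both repairs leave a stripe that passes the check; only the second one recovers the original data, and nothing in the stripe itself says which guess is right.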