Date: Fri, 25 Aug 2000 10:42:44 -0400 (EDT)
From: David Gilbert <dgilbert@velocet.ca>
To: Greg Lehey <grog@lemis.com>
Cc: David Gilbert <dgilbert@velocet.ca>, freebsd-scsi@FreeBSD.ORG
Subject: Re: Vinum 29160 detaches drives, invalidates RAID.
Message-ID: <14758.34276.167320.197675@trooper.velocet.net>
In-Reply-To: <20000825113638.D39208@wantadilla.lemis.com>
References: <14757.14569.732766.367692@trooper.velocet.net> <20000825113638.D39208@wantadilla.lemis.com>
>>>>> "Greg" == Greg Lehey <grog@lemis.com> writes:

>> First of all, I'm very pleased with the speed.  The system easily
>> beats the AMI MegaRAID 1500 (same drives) with a whopping 35Mbyte/s
>> in RAID-5 (vs. the 1500's 14Mbyte/s) for read.  (They both score a
>> dead heat of 4Mbyte/s write.)

Greg> Nice to hear :-)

In general, I'm an advocate of the vinum system.  I've been hammering
it for months now on the test RAID-5 system.  Besides this
disconnecting problem, the system is performing very well.

>> Now... if I reboot, and "vinum setstate up" all these drives,

Greg> They all go down, do they?

Not all... Sometimes 2, sometimes 4.  I suppose I should have said
that I setstate up all the drives that are down.

>> fsck completes without any complaint.  I then generally have to
>> "vinum rebuild parity" ... but I suppose that I'd expect that.

Greg> Hmm.  rebuildparity is a dangerous command.  Basically, a parity
Greg> error means that *one* (or more) of the drives has incorrect
Greg> data.  rebuildparity simply assumes that the error is in the
Greg> data block and "corrects" it.  It's a serious problem, one that
Greg> is very difficult to solve.

Well... at the point of failure, we're doing the nightly finds on the
disk.  I do the fsck (usually) before I do the rebuildparity.  I
suspect that the only information being written to the disk at this
point is the access time updates.  I would expect, then, that any
corrupt data is likely limited to an update of this nature.

>> The problem I'm having here (and I've had it before) is that the
>> FreeBSD SCSI system seems to "give up" under conditions that others
>> would keep retrying or resetting/retrying.

>> It seems really, really, really important to me that we try harder
>> to get a drive back online.  This seems as if it could affect the
>> long-term viability of a vinum-based raid server... not because
>> vinum is bad, but because the SCSI subsystem is too fragile.

Greg> Hmm.  I can't really comment on that, but it would be nice if
Greg> the SCSI system could recover from these problems.

I think this is a critical thing.  I can accept that it may be hard to
discern if the device has been yanked from the bus or has gone into
some other bad state --- but this is definitely not the case.  The
FreeBSD SCSI subsystem as it stands is very fragile.  I realize that
cabling must be 100% for many different reasons; ...  But by the same
token, we need things to keep retrying and resetting far longer before
losing all hope.

Dave.

-- 
============================================================================
|David Gilbert, Velocet Communications.       | Two things can only be     |
|Mail:       dgilbert@velocet.net             |  equal if and only if they |
|http://www.velocet.net/~dgilbert             |   are precisely opposite.  |
=========================================================GLO================

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message