Date: Tue, 02 Jan 2001 22:03:58 -0600
From: David Kelly <dkelly@hiwaay.net>
To: Greg Lehey <grog@lemis.com>
Cc: Josef Karthauser <joe@tao.org.uk>, Matraquilla@cs.com, Roman Shterenzon <roman@harmonic.co.il>, freebsd-stable@FreeBSD.ORG
Subject: Re: RAID-5 reliability (was: vinum malfunction!)
Message-ID: <200101030403.f0343wp03856@grumpy.dyndns.org>
In-Reply-To: Message from Greg Lehey <grog@lemis.com> of "Wed, 03 Jan 2001 10:36:21 +1030." <20010103103621.G40453@wantadilla.lemis.com>

Greg Lehey writes:
> On Tuesday,  2 January 2001 at 14:06:16 +0000, Josef Karthauser wrote:
[...]
>
> > I don't know what it's going to take for Greg to get enough
> > information to fix the problem.  I spent a week trying to extract a
> > set of debug information for him that was useful enough that he
> > could work from it, but it seems that my week was just wasted
> > because it looks like I didn't capture the bug that he was expecting
> > :(.
>
> Well, it's not wasted, but it didn't help enough.  As I said, it's
> elusive.

A point I'd like to make is that efforts to hunt down a problem (you
don't know it's a bug until you find it) are never wasted, even if you
don't find the problem. The only time such an effort is wasted is when
all the effort is directed at (say) software, and you find the cleaning
lady was unplugging your disc array every night.

One day Greg may be looking at yet another stack trace without enough
evidence to locate the problem, but something about the stack traces
from Josef and others may click (light bulb appears over Greg's head)
and provide the final clue.

May be related, may not, but last month I found my new Promise ATA-100
card did not like to share interrupts with my other toys. It had to be
stressed to fail, and it failed in the strangest ways: xterms
disappeared, and sometimes tcsh dumped core. About 15 minutes later the
whole system rolled over on its back. How robust is PCI IRQ sharing? I
would never have expected such a problem. But I moved that card to an
IRQ of its own and was never able to repeat the problem.

Another point I'd like to make is that it's a myth that "hardware" is
immune to such problems. "Hardware RAID" isn't really hardware so much
as it's a dedicated embedded CPU, with limited tasks to perform and
total control of its environment. By comparison the FreeBSD kernel has
a lot of things going on.
Anyway, if "hardware" were immune to such problems then there would be
little need for FLASH memory, BIOS updates, HD firmware updates, or
even RAID firmware updates. Then again, the advantage of the embedded
system is its limited exposure to outside influences, which should make
a stable and reliable system easier to achieve than on a system with
broader scope.

--
David Kelly N4HHE, dkelly@hiwaay.net
=====================================================================
The human mind ordinarily operates at only ten percent of its
capacity -- the rest is overhead for the operating system.


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message