Date:      Tue, 02 Jan 2001 22:03:58 -0600
From:      David Kelly <dkelly@hiwaay.net>
To:        Greg Lehey <grog@lemis.com>
Cc:        Josef Karthauser <joe@tao.org.uk>, Matraquilla@cs.com, Roman Shterenzon <roman@harmonic.co.il>, freebsd-stable@FreeBSD.ORG
Subject:   Re: RAID-5 reliability (was: vinum malfunction!) 
Message-ID:  <200101030403.f0343wp03856@grumpy.dyndns.org>
In-Reply-To: Message from Greg Lehey <grog@lemis.com>  of "Wed, 03 Jan 2001 10:36:21 +1030." <20010103103621.G40453@wantadilla.lemis.com>

Greg Lehey writes:
> On Tuesday,  2 January 2001 at 14:06:16 +0000, Josef Karthauser wrote:
[...]
> 
> > I don't know what it's going to take for Greg to get enough
> > information to fix the problem.  I spent a week trying to extract a
> > set of debug information for him that was useful enough that he
> > could work from it, but it seems that my week was just wasted
> > because it looks like I didn't capture the bug that he was expecting
> > :(.
> 
> Well, it's not wasted, but it didn't help enough.  As I said, it's
> elusive.

A point I'd like to make is that efforts to hunt down a problem (you
don't know it's a bug until you find it) are never wasted, even if you
don't find the problem. The only time such an effort is wasted is when
all the effort is directed at (say) software, and you find the cleaning
lady was unplugging your disc array every night.

One day Greg may be looking at yet another stack trace without enough
evidence to locate the problem, but something about the stack traces
from Josef and others may click (light bulb appears over Greg's head)
and provide the final clue.

May be related, may not, but last month I found my new Promise ATA-100
card did not like to share interrupts with my other toys. It had to be
stressed to fail, and it failed in the strangest ways: xterms
disappeared, and sometimes tcsh dumped core. About 15 minutes later the
whole system rolled over on its back. How robust is PCI IRQ sharing? I
would never have expected such a problem. But I moved that card to an
IRQ of its own and was never able to repeat the problem.

Another point I'd like to make is that it's a myth that "hardware" is
immune to such problems. "Hardware RAID" isn't really hardware so much
as it's a dedicated embedded CPU, with limited tasks to perform and total
control of its environment. By comparison the FreeBSD kernel has a lot
of things going on.

Anyway, if "hardware" were immune to such problems then there would be
little need for FLASH memory, BIOS updates, HD firmware updates, or even
RAID firmware updates.

Then again, the advantage of the embedded system is its limited exposure
to outside influences, which should make a stable and reliable system
easier to achieve than on one with more going on.


--
David Kelly N4HHE, dkelly@hiwaay.net
=====================================================================
The human mind ordinarily operates at only ten percent of its
capacity -- the rest is overhead for the operating system.







