From owner-freebsd-stable Tue Jan 2 20: 4:26 2001
From owner-freebsd-stable@FreeBSD.ORG Tue Jan 2 20:04:23 2001
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from grumpy.dyndns.org (user-24-214-56-41.knology.net [24.214.56.41])
        by hub.freebsd.org (Postfix) with ESMTP id 68A3237B400
        for ; Tue, 2 Jan 2001 20:04:22 -0800 (PST)
Received: from localhost (localhost [127.0.0.1])
        by grumpy.dyndns.org (8.11.1/8.11.1) with ESMTP id f0343wp03856;
        Tue, 2 Jan 2001 22:03:59 -0600 (CST)
        (envelope-from dkelly@grumpy.dyndns.org)
Message-Id: <200101030403.f0343wp03856@grumpy.dyndns.org>
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
To: Greg Lehey
Cc: Josef Karthauser, Matraquilla@cs.com, Roman Shterenzon,
        freebsd-stable@FreeBSD.ORG
From: David Kelly
Subject: Re: RAID-5 reliability (was: vinum malfunction!)
In-reply-to: Message from Greg Lehey of "Wed, 03 Jan 2001 10:36:21 +1030."
        <20010103103621.G40453@wantadilla.lemis.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Tue, 02 Jan 2001 22:03:58 -0600
Sender: dkelly@grumpy.dyndns.org
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Greg Lehey writes:
> On Tuesday, 2 January 2001 at 14:06:16 +0000, Josef Karthauser wrote:
[...]
> >
> > I don't know what it's going to take for Greg to get enough
> > information to fix the problem. I spent a week trying to extract a
> > set of debug information for him that was useful enough that he
> > could work from it, but it seems that my week was just wasted
> > because it looks like I didn't capture the bug that he was expecting
> > :(.
>
> Well, it's not wasted, but it didn't help enough. As I said, it's
> elusive.

A point I'd like to make is that efforts to hunt down a problem (you
don't know it's a bug until you find it) are never wasted, even if you
don't find the problem.
The only time such an effort is wasted is when all the effort is
directed at (say) software, and you find the cleaning lady was
unplugging your disc array every night.

One day Greg may be looking at yet another stack trace without enough
evidence to locate the problem, but something about the stack traces
from Josef and others may click (light bulb appears over Greg's head)
and provide the final clue.

May be related, may not, but last month I found my new Promise ATA-100
card did not like to share interrupts with my other toys. It had to be
stressed to fail, and it failed in the strangest ways: xterms
disappearing, sometimes tcsh dumping core. About 15 minutes later the
whole system rolled over on its back. How robust is PCI IRQ sharing? I
would never have expected such a problem. But I moved that card to an
IRQ of its own and was never able to repeat the problem.

Another point I'd like to make is that it's a myth that "hardware" is
immune to such problems. "Hardware RAID" isn't really hardware so much
as it's a dedicated embedded CPU, with limited tasks to perform and
total control of its environment. By comparison the FreeBSD kernel has
a lot of things going on.

Anyway, if "hardware" were immune to such problems then there would be
little need for FLASH memory, BIOS updates, HD firmware updates, or
even RAID firmware updates.

Then again, the advantage of the embedded system is the limited scope
of outside influences, which should make a stable and reliable system
easier to achieve than on a general-purpose system exposed to far more
of them.

--
David Kelly N4HHE, dkelly@hiwaay.net
=====================================================================
The human mind ordinarily operates at only ten percent of its
capacity -- the rest is overhead for the operating system.