From: Terry Lambert
Message-Id: <199803101926.MAA06848@usr01.primenet.com>
Subject: Re: Fault tolerance issues
To: shimon@simon-shapiro.org
Date: Tue, 10 Mar 1998 19:26:38 +0000 (GMT)
Cc: tlambert@primenet.com, hackers@FreeBSD.ORG
In-Reply-To: from "Simon Shapiro" at Mar 9, 98 10:52:10 pm

> > If you shutdown normally, or panic, the uncommitted writes get flushed
> > from the disk cache by the disk (because it doesn't know from reset
> > in the time it takes to panic or crash).
>
> There are hardware features in place today to take care of this problem
> during normal shutdown.  Panics are no-man's land.

Ugh.  "FreeBSD is tolerant of faults, as long as they never happen"?

A panic is exactly the type of fault you want to CYA against.

[ ...soft read-only... ]

> I always wondered why this is not so.  Not even after sync(2).

With the old sync process (updated, not syncer), it wasn't very
cost effective.  It would happen on every sync.
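For the application side of the "uncommitted writes" problem, here is a minimal sketch in C of forcing one file's data out with fsync(2) instead of waiting for the periodic sync. The name durable_write() is made up for illustration; it is not part of FreeBSD or of anything proposed in this thread. Note the limit: fsync(2) only covers the kernel's buffers for that file. Whether the drive's own cache has reached the platters is a separate question, and that is the case the quoted text is about.

```c
/* Sketch only: durable_write() is a hypothetical helper, not a FreeBSD API. */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path, then ask the kernel to push the
 * file's dirty data to the device before returning.
 * Returns 0 on success, -1 on any failure.
 */
int
durable_write(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	size_t done = 0;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	/* write(2) may be short; loop until the kernel has taken everything. */
	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);
		if (n <= 0) {		/* error, or no progress: give up */
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}

	/* Block until the kernel has issued this file's data to the device. */
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

A caller would use it as durable_write("/var/db/state", rec, sizeof(rec)) and treat -1 as "not safely on disk". The path and record here are placeholders.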
> I do not want to use hardware at all :-)  The best computer is a cucumber in
> a shoe box;  Simple, no moving parts, easy to replace, easy to tell if bad
> and good for the diet.
>
> Seriously, hardware can simplify life a lot.  It is a matter of cost.  Most
> people will accept that their $3,000 computer has to be protected by a $75
> UPS.

That's true.  But if they don't have a UPS, then you shouldn't sell
them your "Fault Tolerant FreeBSD".

The difference is SFT (*Software* Fault Tolerance); that's why Novell
is still making money in the server market (or at least one of the
reasons).

> You can advance this idea only so far.  And for what cost?  Software has
> bugs.  What is the uptime ratio between these software modifications
> described here and a UPS?  Even if YOU write ALL this code :-)

If I have to write all the code myself, it'll be a long time before
it gets done.  But if I'm serious, I'll write it in vanilla K&R so
I can run the C++ branch path analysis tool from the comp.unix.sources
archives on it.

One component that's being overlooked here is QA as opposed to QC.
It may be that true SFT can't ever happen in a free software project,
due to the project management constraints needed to produce really
reliable code.

> I think that software solutions to problems like this are great, as long as
> they still allow for hardware solutions to augment/complement them.

I would definitely like to see someone produce a PrestoServ card for
FreeBSD (for example).  This would get it into a hell of a lot of
traditionally "big iron" shops.

The whole fault tolerance issue is "how do I make small iron look
like big iron without getting Tony Overfield to redesign the PC?".  8-).

					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.