Date:      Tue, 10 Mar 1998 12:06:48 -0800 (PST)
From:      Simon Shapiro <shimon@simon-shapiro.org>
To:        Terry Lambert <tlambert@primenet.com>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: Fault tolerance issues
Message-ID:  <XFMail.980310120648.shimon@simon-shapiro.org>
In-Reply-To: <199803101926.MAA06848@usr01.primenet.com>


On 10-Mar-98 Terry Lambert wrote:
 
...

> Ugh.  "FreeBSD is tolerant of faults, as long as they never happen"?
> 
> A panic is exactly the type of fault you want to CYA against.

I think we are talking about two classes of panic:

*  Predictable/Manageable:  Where a specific piece of code knowingly decides
   to shut down.  Here, you are right.

*  Unpredictable: Such things as NULL pointers, invalid addresses, etc.  I
   have never seen a Unix (FreeBSD included) that can get out from underneath
   this class of panics.  Typically, the stack goes with the panic.  Any
   attempt to continue and run ANY code is doomed to fail.

In an ideal world, you are correct.  In my world, I consider all panics
catastrophic failures and switch over to an alternate compute engine.

...

>> I always wondered why this is not so.  Not even after sync(2).
> 
> With the old sync process (updated, not syncer), it wasn't very
> cost effective.  It would happen on every sync.

``Cost Effective'' in what way?  Losing a critical file, or corrupting an
on-line database is a lot less effective than n% loss of speed.  N can be
pretty large here, if you ask users who are in the know.

Again, a switch will be the best solution.  Dial in the level of security or
reliability you desire.

 ...

> That's true.  But if they don't have a UPS, then you shouldn't sell
> them you "Fault Tolerant FreeBSD".
> 
> The difference is SFT (*Software* Fault Tolerance); that's why Novell
> is still making money in the server market (or at least one of the
> reasons).

How many Novell servers have you seen without a UPS behind them?
Again, MHO is that software should protect against abrupt termination as
well as it can.  But, it is OK to clearly define the constraints, and say
``For this I need at least n seconds of continued processing time''.

>> You can advance this idea only so far.  And for what cost?  Software has
>> bugs.  What is the uptime ratio between these software modifications
>> described here and a UPS?  Even if YOU write ALL this code :-)
> 
> If I have to write all the code myself, it'll be a long time before
> it gets done.  But If I'm serious, I'll write it in vanilla K&R so
> I can run the C++ branch path analysis tool from the comp.unix.sources
> archives on it.

I may be blind, and behind the times, but, aside from formal prototypes, I
fail to see what really improved in the C language since K&R.

> One component that's being overlooked here is QA as opposed to QC.

QC is management measurable (almost).  QA is more of a moral issue.

> It may be that true SFT can't ever happen in a free software project,
> due to the project management constraints needed to produce really
> reliable code.

I disagree.  I have seen this project (and others) produce high quality
code.  And very reliable code.

The weakness of this environment is that, at times, we bite off more than we
can chew, and that, in FreeBSD in particular, our efforts are diffused:
we work on a lot of different big things.  Instead, we should try to form
task forces which work on specific things, broadening our expertise level,
and ensuring maturity of features rather than their count.

...

> I would definitely like to see someone produce a PrestoServ card for
> FreeBSD (for example).  This would get it into a hell of a lot of
> traditionally "big iron" shops.

And what is a PrestoServ card?

> The whole fault tolerance issue is "how do I make small iron look like
> big iron without getting Tony Overfield to redesign the PC?".
> 
> 8-).

And who is Tony Overfield?  You are talking to an ignorant audience here
:-)

BTW, you cannot re-design the PC.  You have to discard it.


----------


Sincerely Yours, 

Simon Shapiro
Shimon@Simon-Shapiro.ORG                      Voice:   503.799.2313
