From owner-freebsd-database Thu Mar 12 16:06:37 1998
Return-Path:
Received: (from majordom@localhost) by hub.freebsd.org (8.8.8/8.8.8) id QAA03623 for freebsd-database-outgoing; Thu, 12 Mar 1998 16:06:37 -0800 (PST) (envelope-from owner-freebsd-database@FreeBSD.ORG)
Received: from sendero.simon-shapiro.org (sendero-fddi.Simon-Shapiro.ORG [206.190.148.2]) by hub.freebsd.org (8.8.8/8.8.8) with SMTP id QAA03616 for ; Thu, 12 Mar 1998 16:06:27 -0800 (PST) (envelope-from shimon@sendero-fxp0.simon-shapiro.org)
Received: (qmail 19261 invoked by uid 1000); 12 Mar 1998 20:14:09 -0000
Message-ID:
X-Mailer: XFMail 1.3-alpha-030698 [p0] on FreeBSD
X-Priority: 3 (Normal)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
In-Reply-To: <19980312174747.57249@follo.net>
Date: Thu, 12 Mar 1998 12:14:09 -0800 (PST)
Reply-To: shimon@simon-shapiro.org
Organization: The Simon Shapiro Foundation
From: Simon Shapiro
To: Eivind Eklund
Subject: Re: Fault tolerance issues
Cc: freebsd-database@FreeBSD.ORG, "Robert A.Bruce"
Sender: owner-freebsd-database@FreeBSD.ORG
Precedence: bulk

On 12-Mar-98 Eivind Eklund wrote:
> On Thu, Mar 12, 1998 at 08:16:47AM -0800, Simon Shapiro wrote:
>> 2. High Availability. AKA HAS, High Availability Server. A set of
>>    features that allow a computer to continue to provide service with
>>    no loss of data and only a brief interruption of service, in the
>>    face of a single failure.
>>
>> HAS are typically said to be SPOF (Single Point Of Failure) free. They
>> are designed to have the ability to tolerate any single component
>> failure.
>
> I think that definition is fairly useless. Define city as single
> component, hit city with atomic bomb - booom, failure.

Not at all. An atom bomb tends to damage more than one component in the
system :-)  You have to separate operational availability from disaster
recovery. For most FreeBSD users, a nuclear bomb will signal the end of
their interest in the system, or its data.
A phone company may want to survive a conventional weapons attack by having
a second database mirrored in another facility. Regardless of that issue,
each facility still wants resistance to single component failure. It is
pretty well agreed in the industry what non-SPOF means.

> OTOH, cvsup is to some degree SPOF-free - if any single continent is
> wiped out, my changes to FreeBSD will still persist :-) Higher
> Availability than that is probably only of academic interest.

Cvsup is not a general purpose computer system. It is not even a general
purpose database. It is a concept implemented as an application. This
leads to the first ``Y'' in the road: Application Level vs. System Level
HAS. Application level HAS is a very viable solution, no doubt, but it
suffers from some drawbacks:

a. The mechanisms are not directly usable by other applications; the ways
   and means by which cvsup distributes and protects the data are of no
   use to an airline ticket reservation system. Only the human
   understanding of some of the issues may be transferable.

b. Typically, poor atomicity/resolution plagues such solutions. Cvsup is
   an exception, but the cvsup model is totally unacceptable for OLTP-type
   work.

c. Very long and unreliable checkpoint/restart delays. Again, for cvsup it
   matters not. For an on-line ordering/credit-card processing system, it
   may not be acceptable.

> I actually think it would be better to talk about a system having
> features for High Availability than talking about it 'being HA'.

I tend to disagree here too. These are two separate, but equally valid
discussions:

a. What characterizes a High Availability Server (as opposed to a Fault
   Tolerant one)?

b. What features are there to implement a HAS? Some of these features can
   be adopted by non-HAS systems to various degrees.
> Some features:
>  Redundancy - anything that can fail with an interesting
>  probability (and remember, you often have many deployed
>  systems) should have solutions that can automatically take
>  over functionality.

This directly contradicts your atomic bomb statement from above. Besides,
this is an implementation feature. Redundancy is not part of the high
availability definition. It is one part of an implementation solution.
Maybe the only one we can think of at this moment.

>  Non-interference - anything that is temporarily disconnected
>  will not interfere with the part that take over.

This is part of SPOF. If a disk drive fails and takes the SCSI bus with
it, we now have TWO failed components: a disk and a bus. But you are
right in this requirement.

>  Quick switchover - switchover to backup solutions should
>  happen automatically and quickly once an error is detected.

Not necessarily. What if I do not need to switch over at all? If our
solution to HA is to switch over, then it needs to be quick, automatic,
and transparent. But what (how long) is quick?

>  Error detection - errors should be detected quickly,
>  automatically, and consistently.

We need this one above the other one. First we detect the error, then we
decide to switch over :-)

>  Quick restart - if something goes down, it should not need a
>  long time to restart. fsck is right out.

We do not need quick re-start if we have quick (or no) switchover. If a
filesystem can continue to be accessed in the face of a system crash,
then who cares how long it takes the system to re-boot? We do care, as in
most HAS we will be in a degraded mode until repairs are performed.
Degraded mode means that we continue processing (probably at a reduced
rate), but the next failure will cause disruption of service.

Fsck assumes Unix filesystems. I am still at a lower and perhaps broader
level. Think about raw disks, tapes, networking, etc.

> At a theoretical level, I think these are probably the main points.
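The detect-then-switch ordering argued for above (first detect the error,
then decide whether to switch over) can be sketched as follows. This is a
minimal illustration, not anything from the original discussion; all class
and function names here are hypothetical:

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat seen per node; a node whose heartbeat
    is older than the timeout is declared failed (error detection)."""

    def __init__(self, timeout_s=3.0):
        self.timeout_s = timeout_s
        self.last_seen = {}  # node name -> timestamp of last heartbeat

    def heartbeat(self, node, now=None):
        # Record a heartbeat; `now` may be injected for testing.
        self.last_seen[node] = time.monotonic() if now is None else now

    def failed_nodes(self, now=None):
        now = time.monotonic() if now is None else now
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout_s]


def maybe_switch_over(monitor, primary, backup, now=None):
    """First detect the error, THEN decide to switch over.
    If the primary is healthy there is no switchover at all."""
    if primary in monitor.failed_nodes(now):
        return backup   # promote the backup: quick, automatic switchover
    return primary      # no failure detected: keep serving from primary
```

The timeout here is exactly the "how long is quick?" question: a short
timeout gives fast switchover but risks false failure declarations on a
slow network.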
> We can then start discussing what interesting subsystems FreeBSD has,
> and what can be done to provide these features for each of the
> subsystems.
>
>> If there is interest, we can start a discussion on what such a computer
>> looks like.
>
> That could be interesting, though if we really want this to be
> fruitful we should (at some point not too far into the future) start
> focusing on making at least TODO-list, and probably a few designs.
>
> Eivind.

Absolutely agree. The only exception I take here is that we may want to
define the service levels, the interruption modes, etc., before we think
solutions. For example, I do not want to assume a Unix filesystem of any
kind when I think data storage. You may believe that UFS is just fine. I
may think continuous service, you may think restart. We have to define
the services we want to support, their level of ``reliability'', failure
modes, etc. Then we come up with a TODO list, then we do it.

My bias is to push the HA as far down the stack as I can reasonably get
away with. I want to be able to drop as much ``off the shelf'' stuff on
top of it. To give you an example: it took the ufs filesystem almost 1/4
of a century to stabilize. It still goes through gyrations (soft updates)
and is still incapable of surviving a software crash (panic) with
absolute certainty of 100% instant recovery. Veritas can pretty much
deliver that, but not UFS. UFS is totally incapable of recovering from
ANY hardware failure. Veritas offers a facility that can somewhat survive
hardware failures.

My point? If I can ``give'' FreeBSD a reliable ``disk'' that looks,
tastes, smells, and sounds like a ``normal'' disk, and this ``disk''
guarantees: a) no data loss with a single component failure, and b)
transparent and continual availability through the loss of one Unix
instance, then I can put ANY disk access method (not only a filesystem)
on it, and this method will automagically be reliable, non-lossy,
resilient, and highly available.
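The ``reliable disk'' idea above (mirror every write so that a single
component failure loses no data, and service continues in degraded mode)
can be sketched in a few lines. This is a toy in-memory model under my own
assumptions, not the actual mechanism being proposed; the class and its
methods are hypothetical:

```python
class MirroredDisk:
    """Toy model of a non-SPOF 'disk': every block is written to two
    replicas, so any single replica failure loses no data. After one
    failure the device keeps serving in degraded mode -- the NEXT
    failure disrupts service, exactly as described for a HAS."""

    def __init__(self):
        self.replicas = [{}, {}]   # block number -> data, per replica
        self.alive = [True, True]

    def write(self, block, data):
        wrote = 0
        for i, rep in enumerate(self.replicas):
            if self.alive[i]:
                rep[block] = data
                wrote += 1
        if wrote == 0:
            # Second failure: the single-failure guarantee is exhausted.
            raise IOError("both replicas failed")

    def read(self, block):
        for i, rep in enumerate(self.replicas):
            if self.alive[i] and block in rep:
                return rep[block]
        raise IOError("block unavailable")

    def fail_replica(self, i):
        self.alive[i] = False   # enter degraded mode; repair is pending
```

Anything layered on top (a filesystem, a raw-device database) sees an
ordinary block read/write interface and inherits the availability
guarantee for free, which is the point of pushing HA down the stack.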
----------

Sincerely Yours,

Simon Shapiro                            Shimon@Simon-Shapiro.ORG
Voice: 503.799.2313

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-database" in the body of the message