Date:      Fri, 20 Feb 1998 08:27:21 -0500 (EST)
From:      "John T. Farmer" <jfarmer@goldsword.com>
To:        agdolla@datanet.hu, freebsd-isp@FreeBSD.ORG
Cc:        jfarmer@goldsword.com
Subject:   Re: fault tolerant :)) setup
Message-ID:  <199802201327.IAA21923@sabre.goldsword.com>

On Thu, 19 Feb 1998 19:02:53 +0100 (NFT) Gabor Dolla said:
>I'd like to hear opinions on fault-tolerant setups....
>
>Say, you have two identical machines, one is a mail server the other is
>the www server, and when one of them is down the other does both jobs.
>
>A few years back I worked for a company which had some Digital Alpha
>servers. Digital had a nice disk tower with an Y cable so both servers
>were able to access the same disks. Are there such products available for
>PCs ?

I used to design real-time factory data-acquisition & control systems,
so I'll give this a go.

There are at least 5 issues to deal with:

1.	Keeping data in sync between two machines where only one machine 
	is active at a time.  For simplicity, think of it as one machine 
	running & the 2nd machine in "hot-standby."

2.	Handling incoming network traffic that is associated with a
	specific IP address.

3.	Dealing with the transaction that is disrupted when the primary
	machine fails.

4.	Monitoring the primary machine for failure, determining 
	that it has failed, and locking it out to allow the 
	secondary machine to complete or restart the transaction.

5.	Restoring the primary machine to service (a subset of startup
	conditions).
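
As a rough illustration of issues 4 and 5, here is a minimal sketch of the
decision logic a standby machine might run. The class name, the timeout and
the missed-check threshold are all hypothetical, and the actual lock-out and
IP takeover steps are left out because they depend on the hardware and the
application involved.

```python
from dataclasses import dataclass


@dataclass
class StandbyMonitor:
    """Failover decision logic for a hot-standby machine (illustrative only)."""
    timeout: float = 3.0   # seconds of silence that count as one missed check (assumed)
    max_missed: int = 3    # consecutive missed checks before taking over (assumed)
    last_seen: float = 0.0
    missed: int = 0
    active: bool = False   # True once the standby has taken over

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat from the primary; ignored once it is locked out."""
        if self.active:
            return
        self.last_seen = now
        self.missed = 0

    def check(self, now: float) -> bool:
        """Periodic check; returns True when the standby should be serving."""
        if self.active:
            return True
        if now - self.last_seen > self.timeout:
            self.missed += 1
        else:
            self.missed = 0
        if self.missed >= self.max_missed:
            # Primary is declared failed. It stays locked out until restore().
            self.active = True
        return self.active

    def restore(self, now: float) -> None:
        """Operator action: return the primary to service (issue 5)."""
        self.active = False
        self.missed = 0
        self.last_seen = now
```

Requiring several consecutive missed checks, rather than one, is a design
choice to avoid a takeover on a single dropped packet. Keeping the primary
locked out until an explicit restore() avoids both machines believing they
are active at once.
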

How these issues are dealt with is _highly_ dependent on the 
application(s) involved & the "transparency" required for fault 
handling.  There are also commercial systems available that use
techniques like these (and others) to provide fault recovery or
prevention (two big players in this are Tandem Computers, now part of
Compaq, and Stratus Computer).

On the smaller side, several things can be done to improve the
typical server setup.  Examples are:

  -	RAID disk subsystems, possibly with dual control ports.

  -	"Round-robin" DNS entries to spread the load over several
	machines.  In addition, dynamic DNS updates with _very_ short
	expire times could be used to "disable" access to a faulting
	server (at the cost of defeating DNS caching).

  -	With an SNMP controllable hub/switch, the port used by a faulting
	machine can be locked out of service.

  -	Of course, any single point of failure should be avoided or 
	removed if possible.  For example, everybody thinks about 
	redundant "hot-swap" power supplies in servers.  What about the
	power circuit feeding them?  Are both supplies fed from the same
	UPS?  The same breaker?
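
The round-robin idea with a faulting server taken out of rotation can be
sketched as follows. This is only an illustration of the selection logic,
not a DNS server: the function name, the health-check callback and the
addresses (from the 192.0.2.0/24 documentation range) are all made up.

```python
from typing import Callable, List


def pick_servers(servers: List[str],
                 is_up: Callable[[str], bool],
                 start: int = 0) -> List[str]:
    """Return the server list rotated by `start`, with faulting servers dropped.

    Advancing `start` on each query gives the round-robin effect; the
    `is_up` callback (hypothetical) plays the role of the fault monitor.
    """
    n = len(servers)
    rotated = [servers[(start + i) % n] for i in range(n)]
    return [s for s in rotated if is_up(s)]


# Hypothetical pool of three web servers.
POOL = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
```

For example, calling pick_servers(POOL, lambda s: s != "192.0.2.2", start=1)
yields the list rotated by one position with 192.0.2.2 left out.  As noted
above, clients only see the change quickly if the records carry very short
expire times.
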

These are reasonable steps that are easy to overlook, but that anybody
serious about reliable service can implement without a lot of expense.
However, the costs escalate rapidly, with the improvement in availability
diminishing just as rapidly.  You might move availability from 90%
to 95% by spending $XXXus.  To go from 95% to 96% might require spending
10 * $XXXus.
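
To put the example percentages in terms of downtime, here is a small
worked calculation.  It assumes a year of 8760 hours and treats $XXXus as
one cost unit; the function names are only for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours, ignoring leap years


def downtime_hours(availability_pct: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (100.0 - availability_pct) / 100.0 * HOURS_PER_YEAR


def cost_per_hour_saved(before_pct: float, after_pct: float, cost: float) -> float:
    """Cost paid for each hour of yearly downtime removed by an upgrade."""
    saved = downtime_hours(before_pct) - downtime_hours(after_pct)
    return cost / saved


# Costs are in units of $XXXus, following the example figures in the text.
first_step = cost_per_hour_saved(90, 95, 1.0)    # 90% -> 95% for 1 unit
second_step = cost_per_hour_saved(95, 96, 10.0)  # 95% -> 96% for 10 units
```

At 90% availability the machine is down roughly 876 hours a year; at 95%
roughly 438; at 96% roughly 350.  With these figures the second step
removes far fewer hours for ten times the money, so each hour of downtime
removed costs about 50 times as much as in the first step.
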

John

-------------------------------------------------------------------------
John T. Farmer			Proprietor, GoldSword Systems
jfarmer@goldsword.com		Public Internet Access in East Tennessee
Office: (423)691-6498		for info, e-mail to info@goldsword.com
	Network Design, Internet Services & Servers, Consulting

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-isp" in the body of the message


