From owner-freebsd-stable  Fri Apr  6 16:17:49 2001
Date: Fri, 6 Apr 2001 16:17:19 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
Message-Id: <200104062317.f36NHJW48955@earth.backplane.com>
To: Benjamin Flom <benf@nexgen.com>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: Cluster Solution for FreeBSD
References:  <3ACE4173.6020708@nexgen.com>

:We are looking to set up a failover option and a load balancing option 
:using the servers that are in place. At this point we have loaded 
:identical configurations on 3 machines. If one of the machines gets 
:overloaded, we would like to offload some of the processing or connection 
:handling to one of the others. If one of the machines goes down we would 
:like to have the broken box fail over, and have the other machines pick 
:up the load. Ideally, we could add machines to and remove machines from 
:the cluster as needed. The ability to maintain this configuration over a 
:WAN would be of value as well. Is there any known way to do this, any 
:direction we should follow, or is it just a pipe dream?

    The quick and dirty solution is simply to set up a DNS round robin
    for the domain name used to access the servers.  For example, if
    you are serving a web site called www.flubber.com, you would
    set up the DNS for www.flubber.com to return several IP addresses
    (multiple IN A records) instead of just one.  There isn't much point
    having several servers available if all the traffic is only going to
    one of them, and the roughly random spread that round robin gives you
    is usually good enough that you don't really need sophisticated
    load balancing software.
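
    To make the round robin idea concrete, here is a toy Python model
    (not real DNS server code) of a name server that rotates its list of
    A records on every query.  The addresses are made-up examples from the
    192.0.2.0/24 documentation range, and the model assumes each client
    asks the server directly and connects to the first address it gets
    back.  Real resolvers cache answers, which is why the spread in
    practice is only roughly even.

```python
from collections import Counter

# Hypothetical addresses (192.0.2.0/24 is reserved for documentation),
# standing in for the multiple IN A records of www.flubber.com.
A_RECORDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


class RoundRobinZone:
    """Toy name server that rotates its A-record list by one on every query."""

    def __init__(self, records):
        self.records = list(records)
        self._offset = 0

    def query(self):
        # Return every record, starting one position later than last time.
        n = len(self.records)
        answer = [self.records[(self._offset + i) % n] for i in range(n)]
        self._offset = (self._offset + 1) % n
        return answer


def simulate(num_clients, records=A_RECORDS):
    """Assume each client does one uncached lookup and uses the first address."""
    zone = RoundRobinZone(records)
    hits = Counter()
    for _ in range(num_clients):
        hits[zone.query()[0]] += 1
    return hits


if __name__ == "__main__":
    for addr, count in sorted(simulate(3000).items()):
        print(addr, count)
```

    With 3000 uncached lookups, each address is handed out first exactly
    1000 times.  Caching resolvers and clients that pick addresses in a
    different order blur this into the rough, random-ish spread described
    above, which is usually all you need.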

    That leaves just dealing with downed servers.  There are several
    solutions, but what it comes down to is that no matter what you do
    something is going to glitch when a server goes down and the real
    question is "how long" before that glitch clears.  My take on the
    situation is that, since there is no way to avoid the glitch (even with
    something like a Cisco redirector), a DNS-based solution with short
    record timeouts (low TTLs) is the least intrusive.  The site might glitch
    for a few minutes when something goes down, but it will still correct
    itself quickly enough that in the day-to-day running of most businesses
    (anything short of, say, a brokerage site), nobody is going to care.
    It depends on what you are doing, of course.  Some sites require much
    more stringent controls.
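
    One way to reason about "how long" for the DNS approach: the glitch
    lasts roughly as long as it takes something to notice the dead server
    and pull its A record, plus the record TTL, since clients that already
    resolved the name keep using the cached address until it expires.  The
    Python sketch below assumes a hypothetical health monitor that probes
    each server at a fixed interval and drops it after a set number of
    consecutive failed probes.  It ignores zone transfer delay to secondary
    name servers and resolvers that do not honour TTLs.

```python
def published_records(records, is_up):
    """
    A records to publish after a round of health checks.
    is_up is a hypothetical probe callback (e.g. an HTTP check).
    """
    healthy = [r for r in records if is_up(r)]
    # Design choice (an assumption): if every probe fails, the monitor itself
    # is more likely broken, so keep publishing the full set.
    return healthy or list(records)


def worst_case_glitch(ttl_s, check_interval_s, failures_to_mark_down):
    """
    Upper bound, in seconds, on how long a client may still be sent to a
    dead server: the crash lands just after a successful probe, so it takes
    failures_to_mark_down more probes to notice, and a client that resolved
    the name just before the record was pulled caches it for ttl_s seconds.
    """
    detection_s = check_interval_s * failures_to_mark_down
    return detection_s + ttl_s


if __name__ == "__main__":
    up = {"192.0.2.10": True, "192.0.2.11": False, "192.0.2.12": True}
    print(published_records(list(up), lambda addr: up[addr]))
    # Example settings: 5-minute TTL, probe once a minute, 3 strikes.
    print(worst_case_glitch(ttl_s=300, check_interval_s=60, failures_to_mark_down=3))
```

    With a 5-minute TTL, one probe a minute and three strikes, the bound
    is 480 seconds, i.e. the "few minutes" glitch described above.
    Shortening the TTL shrinks the window at the cost of more DNS lookups.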

    Run the numbers and decide whether you care.  For example, say you
    have 3 servers and one of them crashes on average once every 60 days,
    glitching the network for 10 minutes.  Then once every 60 days, 1/3
    of your *active* users at that moment will be inconvenienced for 10
    minutes.  For most businesses, that isn't a problem.  I did something
    similar at BEST Internet, though in that case the user base was split
    across the shell machines without any redundancy.  A shell machine
    would occasionally crash, inconveniencing 1/20 of the active users for
    however long it took us to fix it (usually it rebooted and was back up
    5 minutes later).  Tech support calls dropped to zero.  Problem solved.
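
    The arithmetic above can be checked in a few lines of Python.  The
    sketch assumes load is spread evenly, so one crash affects 1/servers
    of the active users, and reads "once every 60 days" as one crash
    across the whole cluster, as in the example.

```python
from fractions import Fraction

MINUTES_PER_DAY = 24 * 60


def user_weighted_downtime(servers, days_between_crashes, glitch_minutes):
    """
    Fraction of time an average active user is affected: each crash hits
    1/servers of the active users for glitch_minutes, once per
    days_between_crashes days.
    """
    affected_share = Fraction(1, servers)
    period_minutes = days_between_crashes * MINUTES_PER_DAY
    return affected_share * Fraction(glitch_minutes, period_minutes)


if __name__ == "__main__":
    d = user_weighted_downtime(servers=3, days_between_crashes=60, glitch_minutes=10)
    print(d, f"{float(1 - d):.4%} availability per active user")
```

    For the 3-server example this comes to 1/25920, or roughly 99.996%
    availability from the point of view of an average active user.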

						-Matt

