From owner-freebsd-stable Fri Apr 6 16:17:49 2001
Delivered-To: freebsd-stable@freebsd.org
Received: from earth.backplane.com (earth-nat-cw.backplane.com [208.161.114.67]) by hub.freebsd.org (Postfix) with ESMTP id BF26537B423 for ; Fri, 6 Apr 2001 16:17:46 -0700 (PDT) (envelope-from dillon@earth.backplane.com)
Received: (from dillon@localhost) by earth.backplane.com (8.11.2/8.9.3) id f36NHJW48955; Fri, 6 Apr 2001 16:17:19 -0700 (PDT) (envelope-from dillon)
Date: Fri, 6 Apr 2001 16:17:19 -0700 (PDT)
From: Matt Dillon
Message-Id: <200104062317.f36NHJW48955@earth.backplane.com>
To: Benjamin Flom
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: Cluster Solution for FreeBSD
References: <3ACE4173.6020708@nexgen.com>
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

:We are looking to set up a failover option and a load balancing option
:using the servers that are in place. At this point we have loaded
:identical configurations on 3 machines. If one of the machines gets
:overloaded, we would like to offload some of the processing or connection
:handling to one of the others. If one of the machines goes down we would
:like to have the broken box fail over, and have the other machines pick
:up the load. Ideally, we could add machines to and remove machines from
:the cluster as needed. The ability to maintain this configuration over a
:WAN would be of value as well. Is there any known way to do this, any
:direction we should follow, or is it just a pipe dream?

The quick and dirty thing to do is simply to set up a DNS round robin for the domain name used to access the servers. For example, if you are serving a web site called www.flubber.com, you would set up the DNS for www.flubber.com to return several IP addresses (multiple IN A records) instead of just one.
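To make the round robin idea concrete, here is a minimal Python sketch, not a real DNS server. It assumes three hypothetical A records for www.flubber.com (the 192.0.2.x addresses and the 60-second TTL in the comment are placeholders, not values from the original question), assumes the name server rotates the record order on each query, and assumes every client connects to the first address it is handed. Real resolvers cache answers, so the actual spread is only roughly even.

```python
from collections import Counter
from itertools import islice

# Hypothetical zone entries for www.flubber.com (placeholder addresses
# from the documentation-only 192.0.2.0/24 range, placeholder TTL):
#   www    60  IN  A  192.0.2.10
#   www    60  IN  A  192.0.2.11
#   www    60  IN  A  192.0.2.12
A_RECORDS = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]


def round_robin_answers(records):
    """Yield the record list rotated by one more position on each query."""
    n = len(records)
    query = 0
    while True:
        offset = query % n
        yield records[offset:] + records[:offset]
        query += 1


def simulate_clients(records, num_queries):
    """Count hits per server, assuming each client uses the first address
    in the answer it receives (an idealization that ignores caching)."""
    hits = Counter()
    for answer in islice(round_robin_answers(records), num_queries):
        hits[answer[0]] += 1
    return hits


if __name__ == "__main__":
    for ip, count in sorted(simulate_clients(A_RECORDS, 300).items()):
        print(ip, count)
```

Under those assumptions, 300 queries split evenly across the three addresses, with no dedicated load balancer involved.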
There isn't much point having several servers available if all the traffic is only going to one of them, and the random distribution the round robin gives you is usually sufficient to distribute the load enough that you don't really need sophisticated load balancing software.

That leaves just dealing with downed servers. There are several solutions, but what it comes down to is that no matter what you do, something is going to glitch when a server goes down, and the real question is "how long" before that glitch clears. My take on the situation is that since there is no way to avoid the glitch (even with something like a Cisco redirector), using a DNS-based solution with short record timeouts is the least intrusive. The site might glitch for a few minutes when something goes down, but it will still correct itself quickly enough that in the day-to-day running of most businesses (anything except a brokerage site, say), nobody is going to care.

It depends on what you are doing, of course. Some sites require much more stringent controls. Run the numbers and determine if you care. For example, say you have 3 servers and a server crashes on average once every 60 days, glitching the network for 10 minutes. So once every 60 days, 1/3 of your *active* users at that moment will be inconvenienced for 10 minutes. For most businesses, that isn't a problem.

I did something similar at BEST Internet, though in that case the user base was split across the shell machines without any redundancy. A shell machine would occasionally crash, inconveniencing 1/20 of the active users for however long it took us to fix it (usually it rebooted and was up 5 minutes later). Tech support calls dropped to zero. Problem solved.

-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
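The "run the numbers" estimate in the message above can be sketched in a few lines of Python. This is only a rough model: the function name and parameters are hypothetical, it assumes users are spread evenly across the servers, and it assumes a single crash somewhere in the pool per period, as in the 3-server, 60-day, 10-minute example.

```python
def outage_impact(num_servers, days_between_crashes, glitch_minutes):
    """Return (fraction of active users hit per crash,
               fraction of time a glitch is in progress,
               chance a given user-moment falls inside a glitch)."""
    # With an even spread, one downed server affects 1/N of active users.
    fraction_affected = 1 / num_servers
    # Share of the whole period during which a glitch is in progress.
    minutes_per_period = days_between_crashes * 24 * 60
    fraction_of_time = glitch_minutes / minutes_per_period
    # Probability that a random user at a random moment is inconvenienced.
    per_user = fraction_affected * fraction_of_time
    return fraction_affected, fraction_of_time, per_user


if __name__ == "__main__":
    affected, downtime, per_user = outage_impact(3, 60, 10)
    print(f"users hit per crash: {affected:.1%}")
    print(f"time spent glitching: {downtime:.4%}")
    print(f"chance a given user-moment is affected: {per_user:.4%}")
```

Changing the three inputs (for instance, 20 machines and a 5-minute reboot) shows how quickly the figure moves, which is the point of checking whether the glitch matters for a particular site.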