Date: Fri, 6 Apr 2001 16:17:19 -0700 (PDT)
From: Matt Dillon <dillon@earth.backplane.com>
To: Benjamin Flom <benf@nexgen.com>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: Cluster Solution for FreeBSD
Message-ID: <200104062317.f36NHJW48955@earth.backplane.com>
References: <3ACE4173.6020708@nexgen.com>
:We are looking to set up a failover option and a load balancing option
:using the servers that are in place. At this point we have loaded
:identical configurations on 3 machines. If one of the machines gets
:overloaded, we would like to offload some of the processing or connection
:handling to one of the others. If one of the machines goes down we would
:like to have the broken box fail over, and have the other machines pick
:up the load. Ideally, we could add machines to and remove machines from
:the cluster as needed. The ability to maintain this configuration over a
:WAN would be of value as well. Is there any known way to do this, any
:direction we should follow, or is it just a pipe dream?

The quick and dirty thing to do is simply to set up a DNS round robin for the domain name used to access the servers. For example, if you are serving a web site called www.flubber.com you would set up the DNS for www.flubber.com to return several IP addresses (multiple IN A records) instead of just one. There isn't much point having several servers available if all the traffic is only going to one of them, and the random distribution the round robin gives you is usually sufficient to distribute the load well enough that you don't really need sophisticated load balancing software.

That leaves just dealing with downed servers. There are several solutions, but what it comes down to is that no matter what you do, something is going to glitch when a server goes down, and the real question is "how long" before that glitch clears. My take on the situation is that since there is no way to avoid the glitch (even with something like a Cisco redirector), using a DNS-based solution with short record timeouts is the least intrusive. The site might glitch for a few minutes when something goes down, but it will still correct itself quickly enough that in the day-to-day running of most businesses (i.e. anything except, say, a brokerage site), nobody is going to care.
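As a rough illustration of the round robin described above, here is a minimal Python sketch that renders several IN A records for one name and shows how the answer order might rotate between queries. The addresses (taken from the 192.0.2.0/24 documentation range), the 300-second TTL, and the helper names are all assumptions made for the example; the strictly cyclic rotation is a simplification, since the actual ordering depends on the name server in use.

```python
def zone_records(name, addresses, ttl=300):
    """Render one IN A record per address, all under the same name.

    ttl is in seconds; 300 is an assumed "short record timeout".
    """
    return [f"{name}. {ttl} IN A {addr}" for addr in addresses]


def rotate(addresses, query_number):
    """Return the address list rotated so successive queries lead with a
    different server (a simplified stand-in for round-robin ordering)."""
    if not addresses:
        return []
    k = query_number % len(addresses)
    return addresses[k:] + addresses[:k]


# Hypothetical addresses for the three identically configured machines.
servers = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

records = zone_records("www.flubber.com", servers)
for line in records:
    print(line)

# Dealing with a downed server: drop its record. Clients holding the old
# answer recover once their cached copy expires (at most ttl seconds).
healthy = [addr for addr in servers if addr != "192.0.2.11"]
healthy_records = zone_records("www.flubber.com", healthy)
```

The short TTL is what bounds the "glitch": the lower it is, the sooner clients stop being handed the dead address, at the cost of more DNS queries.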
It depends on what you are doing, of course. Some sites require much more stringent controls. Run the numbers and determine if you care. For example, say you have 3 servers and a server crashes on average once every 60 days, glitching the network for 10 minutes. So once every 60 days, 1/3 of your *active* users at that moment will be inconvenienced for 10 minutes. For most businesses, that isn't a problem.

I did something similar at BEST Internet, though in that case the user base was split across the shell machines without any redundancy. A shell machine would occasionally crash, inconveniencing 1/20 of the active users for however long it took us to fix it (usually it rebooted and was up 5 minutes later). Tech support calls dropped to zero. Problem solved.

					-Matt

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-stable" in the body of the message
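The "run the numbers" estimate in the reply (3 servers, one crash every 60 days, a 10-minute glitch) can be worked through with a few lines of Python. This is only a back-of-the-envelope sketch: the function name is made up for the example, and it assumes one crash per 60 days across the whole cluster, users spread evenly over the servers, and everyone on the failed server affected for the full glitch.

```python
def affected_share(servers, days_between_crashes, glitch_minutes):
    """Fraction of total user-time lost to crashes, under the assumptions
    stated above (even user spread, one crash per interval cluster-wide)."""
    minutes_between_crashes = days_between_crashes * 24 * 60
    time_fraction = glitch_minutes / minutes_between_crashes  # how often it glitches
    user_fraction = 1 / servers                               # who is hit when it does
    return time_fraction * user_fraction


# Figures from the message: 3 servers, a crash every 60 days, 10-minute glitch.
share = affected_share(servers=3, days_between_crashes=60, glitch_minutes=10)
print(f"share of user-time affected: {share:.6%}")
print(f"availability seen by an average user: {1 - share:.4%}")
```

Plugging in your own crash rate and recovery time is a quick way to decide whether a DNS-only setup is good enough or whether the site needs something stricter.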