From owner-freebsd-questions  Mon May 20 18:48:49 1996
Return-Path: owner-questions
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.3/8.7.3) id SAA10623
          for questions-outgoing; Mon, 20 May 1996 18:48:49 -0700 (PDT)
Received: from mistery.mcafee.com (jimd@mistery.mcafee.com [192.187.128.69])
          by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id SAA10614;
          Mon, 20 May 1996 18:48:46 -0700 (PDT)
Received: (from jimd@localhost) by mistery.mcafee.com (8.6.11/8.6.9) id TAA06598; Mon, 20 May 1996 19:01:21 -0700
From: Jim Dennis <jimd@mistery.mcafee.com>
Message-Id: <199605210201.TAA06598@mistery.mcafee.com>
Subject: Re: <no subject>
To: gpalmer@freebsd.org (Gary Palmer)
Date: Mon, 20 May 1996 19:01:21 -0700 (PDT)
Cc: msmith@atrad.adelaide.edu.au, slagos@net1plus.com, questions@freebsd.org
In-Reply-To: <20527.832557068@palmer.demon.co.uk> from "Gary Palmer" at May 20, 96 02:51:08 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-questions@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk

> 
> Michael Smith wrote in message ID
> <199605200139.LAA20337@genesis.atrad.adelaide.edu.au>:
> > Scott A. Lagos stands accused of saying:
> > > How many IP addresses can be supported on a single Ethernet card when 
> > > running under the latest version of FreeBSD.
> > Thousands.  I got bored at about the 5,000 mark, but nothing popped up to 
> > stop me. (20 class C aliases).
> *ONLY* 5000? Anyone got a registered class B or A address space they
> want to volunteer for testing? :-) (just kidding)
> 
> Gary

	Speaking of IP aliasing (again) I was thinking....

	I currently have two FreeBSD boxes mirrored as ftp sites.
	I'm using DNS round-robin to balance the load.  The question 
	has come up:  "What happens if one of them goes down?"

	I was thinking I could use IP aliases to add some fault tolerance
	to the mix.  As things stand, if one machine fails, every other
	request to start an ftp session fails too, because the round robin
	keeps handing out the dead machine's address.  That's better than
	having them *all* fail -- but not much better from the tech support
	rep's point of view.

	What I was thinking would work something like this:

		Bind the first IP address (as I'm doing already) to
		the primary interface on each machine. (Call these the
		ftp/rr addresses.)

		Bind a second IP address (alias) to the Ethernet interface on
		each machine. (Call these the "control" addresses.)

		Write a script that periodically polls the ftp/rr address of
		each of the other machines in the ring.  If the poll fails
		(past a set threshold), alias the failed machine's ftp/rr
		address onto the local Ethernet interface.

		Write another script and add it to the rc.local of each
		of these machines.  This script does an rsh (or a custom
		finger or whatever) to the "control" address on the
		machine that has aliased "my" primary address.  (I have to
		bring up my "control" IP to talk at all, but I can also send
		my "I'm back up" message to "my" ftp/rr address, since the
		covering machine is answering on it.)

		When the machine that took over for a failed member of the
		round robin team gets this remote message (rsh or custom
		finger), it drops the alias for the failed system.

		Meanwhile the system that had failed, and is now back on
		line, polls "its" ftp/rr address until the poll fails (meaning
		the alias has been dropped) -- then it can finally bind the
		address back.
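
		The polling and takeover steps above could be sketched
		roughly as follows.  This is only an illustration in Python,
		not a tested failover tool: the names PeerWatcher, ftp_alive
		and alias_command, the interface name "ed0", the threshold
		of 3 and the 192.0.2.x address are all assumptions.  The
		sketch only builds the ifconfig command; it does not run it.

```python
import socket

FTP_PORT = 21  # the ftp control port probed on each peer


def ftp_alive(address, port=FTP_PORT, timeout=5.0):
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def alias_command(interface, address, add=True):
    """Build (but do not run) the ifconfig command for an alias."""
    if add:
        # An alias on the same subnet as the primary address takes an
        # all-ones netmask on FreeBSD.
        return ["ifconfig", interface, "inet", address,
                "netmask", "255.255.255.255", "alias"]
    return ["ifconfig", interface, "inet", address, "-alias"]


class PeerWatcher:
    """Counts consecutive failed polls of one peer's ftp/rr address."""

    def __init__(self, peer_address, interface, threshold=3, probe=ftp_alive):
        self.peer_address = peer_address  # hypothetical, e.g. "192.0.2.10"
        self.interface = interface        # hypothetical, e.g. "ed0"
        self.threshold = threshold
        self.probe = probe
        self.failures = 0
        self.covering = False

    def poll_once(self):
        """Poll the peer; return the takeover command once the threshold is hit."""
        if self.covering:
            # We hold the alias now, so a probe would only reach ourselves.
            return None
        if self.probe(self.peer_address):
            self.failures = 0
            return None
        self.failures += 1
        if self.failures >= self.threshold:
            self.covering = True
            return alias_command(self.interface, self.peer_address, add=True)
        return None

    def release(self):
        """Called when the peer reports it is back; return the drop command."""
        self.covering = False
        self.failures = 0
        return alias_command(self.interface, self.peer_address, add=False)
```

		A driver loop would call poll_once() every so often, sleep
		in between, and hand any returned command to something like
		subprocess.run() as root.  release() is what the "I'm back
		up" message would trigger.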

		There might be some race conditions if I set this up for 
		more than two machines.  I could add some additional
		checks for that or I could just limit myself to using 
		machine pairs.  I don't expect to have to scale past 
		two ftp hosts (currently 400 ftp connections each and 
		probably 800 - 1000 with the next RAM upgrade in each)
		and two www hosts (currently only one -- a Sparc 20 that's
		massively *underloaded*) for at least six to nine months.
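
		One possible "additional check" for rings of more than two
		machines is to let only a single, predictable peer take over
		a failed address, for instance the next live machine in a
		fixed ring order.  The sketch below is an assumption, not
		something that has been tried: designated_cover and the host
		names are made up, and it only helps if every machine has
		the same idea of who is alive.

```python
def designated_cover(ring, failed, alive):
    """Pick the one machine that should alias the failed machine's address.

    ring   -- list of host names in a fixed order every machine agrees on
    failed -- the host whose ftp/rr address needs covering
    alive  -- set of hosts currently believed to be up
    Returns the next live host after `failed` in ring order, or None.
    """
    start = ring.index(failed)
    for step in range(1, len(ring)):
        candidate = ring[(start + step) % len(ring)]
        if candidate in alive:
            return candidate
    return None


def should_take_over(me, ring, failed, alive):
    """True only on the machine that is designated to cover `failed`."""
    return designated_cover(ring, failed, alive) == me
```

		Each machine would call should_take_over() with its own name
		before adding the alias, so two survivors don't both grab
		the same address.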

	The idea is that machines in a round robin ring could 
	"cover" for one another using simple scripts and IP aliasing.
	There are no kernel hacks, no dynamic DNS hacks, and no
	application-layer hacks necessary.
 
	I would naturally lose any connections during the initial
	failure -- and I'd probably kill any connections that had been
	established before the "dead" system got resurrected.  I might
	have to add logic to have the "dead" system check its own
	messages log and leave itself out of the ring if it rebooted
	more than three times in an hour or something (I could also use
	some sort of control file to keep a count).
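
	The control-file variant of that reboot check might look
	something like the sketch below.  The function name, the file
	format (one boot timestamp per line) and the defaults of one
	hour and three reboots are assumptions taken from the rough
	numbers above, not a worked-out policy.

```python
import time


def record_boot_and_check(path, now=None, window=3600, max_reboots=3):
    """Log this boot in the control file; return True if it is OK to rejoin.

    The file holds one boot timestamp (seconds) per line.  Entries older
    than `window` seconds are dropped.  If this boot makes more than
    `max_reboots` boots inside the window, the machine should stay out
    of the ring.
    """
    now = time.time() if now is None else now
    try:
        with open(path) as f:
            stamps = [float(line) for line in f if line.strip()]
    except FileNotFoundError:
        stamps = []  # first boot: no control file yet
    recent = [t for t in stamps if now - t < window]
    recent.append(now)
    with open(path, "w") as f:
        for t in recent:
            f.write(f"{t}\n")
    return len(recent) <= max_reboots
```

	The rc.local script would call this once at boot and skip
	the "I'm back up" message whenever it returns False.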

	This all seems too easy -- yet I haven't heard of anyone using
	it.  Of course I also haven't heard of any sites using 
	round robin DNS either -- and that was easy, too.

	So... what am I missing?

Jim Dennis,
System Administrator,
McAfee Associates