FreeBSD Mail Archives

Date:      Sat, 6 Mar 1999 07:58:52 -0500 (EST)
From:      Christopher M Sedore <cmsedore@maxwell.syr.edu>
To:        Luigi Rizzo <luigi@labinfo.iet.unipi.it>
Cc:        freebsd-net@FreeBSD.ORG
Subject:   Re: IP source address based load balancing
Message-ID:  <Pine.BSF.4.05.9903060726360.8329-100000@30.maxwell.syr.edu>
In-Reply-To: <199903060502.GAA13150@labinfo.iet.unipi.it>

On Sat, 6 Mar 1999, Luigi Rizzo wrote:

> > I've implemented a basic form of load balancing for IP based services.
> > It works by assigning the same IP address to multiple machines, having
> > all the machines receive the IP packets for that address but only pass
> > the ones from specific sources up to the stack.  The notion is that you
> ...
> 
> nice work, but as you probably note later, a true balancing mechanism
> should be able to handle failures and addition of more servers in
> a more flexible way.

Yes.  I'm hoping that I'll be able to write a "cluster" daemon that will
monitor the machines and reconfigure on failure or addition.  A broadcast
heartbeat would allow failure detection.  This is somewhat more complex
than I had originally hoped, since the operations would ideally not need a
"master" machine, and would be all distributed.  This means that all
machines in a cluster need to somehow negotiate a mutually agreed upon
configuration through broadcasting to each other.  I have some
not-too-developed ideas on how to do this, but there are a number of
failure modes, etc that have to be considered.
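
As a rough illustration of the heartbeat idea only (not of the
negotiation, which is still undecided), here is a minimal sketch of the
bookkeeping each machine might keep.  The names struct peer, peer_heard
and peer_sweep, and the use of a fixed timeout in seconds, are
assumptions made for this example; nothing like this exists in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer record kept by each cluster member. */
struct peer {
    uint32_t addr;       /* peer's real (non-virtual) address */
    uint32_t last_seen;  /* time of the last heartbeat, in seconds */
    int      alive;      /* 1 while the peer is considered up */
};

/* Record a heartbeat received from addr at time now. */
static void
peer_heard(struct peer *peers, size_t n, uint32_t addr, uint32_t now)
{
    for (size_t i = 0; i < n; i++) {
        if (peers[i].addr == addr) {
            peers[i].last_seen = now;
            peers[i].alive = 1;
        }
    }
}

/*
 * Mark peers that have been silent for more than timeout seconds as
 * failed.  Returns the number of peers still considered alive.
 */
static size_t
peer_sweep(struct peer *peers, size_t n, uint32_t now, uint32_t timeout)
{
    size_t live = 0;

    for (size_t i = 0; i < n; i++) {
        if (peers[i].alive && now - peers[i].last_seen > timeout)
            peers[i].alive = 0;
        if (peers[i].alive)
            live++;
    }
    return live;
}
```

A change in the value returned by peer_sweep would be the trigger for
renegotiating the configuration; how the machines then agree on a new
one is the hard part and is not shown.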

> also with your setting i see a difficulty in accessing the various
> machines for management purposes (unless you set up an additional,
> different IP address for each one).

At present, I use an alias address on the interface for the "virtual"
address.  You can use the normal address for maint, etc and services would
be accessed through the virtual.  This is handy because a machine could be
a member of multiple clusters (for apache virtual hosts, or smtp or
whatever). 

> a suggestion: apart from ARP handling (which is somewhat complex) why
> don't you use the ipfw rules to decide what addresses each machine
> should respond to ? This way you could use more sophisticated
> allocations, etc. and especially have most things outside the kernel.

I thought about doing it this way and didn't, mostly because it would
have required modifications in all the same places.
ipfw may be the way to do it, I just haven't looked that far in.  One
concern would be performance since a direct modification of ip_input only
adds 2 comparisons and an AND per packet destined for the virtual
interface, where ipfw needs more.  That said, I use ipfw on my router
which pushes 2000 pps on 2 interfaces and has quite a number of ipfw
rules, so it may not matter too much.  The allocation piece would be nice,
because that way you could have certain ports distributed and certain ones
not.
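
To make the "2 comparisons and an AND" concrete, here is a minimal
sketch of that kind of per-packet test.  It is not the actual ip_input
change; struct src_filter, its fields and filter_accept are hypothetical
names, and addresses are assumed to be in host byte order.

```c
#include <stdint.h>

/* Hypothetical filter attached to the alias ("virtual") address. */
struct src_filter {
    uint32_t vaddr;  /* the shared virtual IP address */
    uint32_t mask;   /* which bits of the source address to examine */
    uint32_t match;  /* masked value this machine is responsible for */
};

/* Returns 1 if the packet should go up the stack, 0 if it is dropped. */
static int
filter_accept(const struct src_filter *f, uint32_t dst, uint32_t src)
{
    if (dst != f->vaddr)                 /* comparison 1 */
        return 1;                        /* not for the virtual address */
    return (src & f->mask) == f->match;  /* the AND, then comparison 2 */
}
```

With two machines, for example, both could use a mask of 1, one with a
match of 0 and the other with a match of 1, so that each accepts the
sources whose lowest address bit is 0 or 1 respectively.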

> As for ARP handling, could you tell a bit more on how you solved the
> problem ? e.g. in the case of connections coming from the outside, your
> router will only have one entry for the IP address you use for your
> cluster, so it will not be able to talk to the individual machines
> unless you set all of them to use the same ethernet address (which is
> also a reasonable thing to do, probably).

Well, I basically fixed the ARP code in two ways.  First, I changed the
check that looks for others using our IP: if the IP being ARPed for is
the "virtual" IP, it bypasses that check.  Second, when responding, I had
it look up the address in the arp table instead of using the local
interface it found.  So you just add an arp entry to your table (with the
pub directive) and it uses that to answer queries:

arp fxp0 128.230.143.88 01:00:5e:10:00:00 pub

for example.  Note that this is a multicast address, which leads me to how I
solved the MAC problem.  When you set a filter for an alias address it
calls if_allmulti for the associated interface so it hears all multicasts.
This seems to work fine with everything I tried, except NT5 beta 2, which
won't accept a multicast address in an ARP response.  ip_input had to be
modified to drop the M_MCAST flag of the packets so that upper layers
would accept them.  One could use promiscuous mode instead of multicast,
which would solve some of the problems, and not have too much performance
impact if the net was dedicated to the cluster and routed.
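
For reference, what makes 01:00:5e:10:00:00 a multicast address at the
Ethernet level is the low-order bit of its first octet.  A minimal
sketch of that test follows; ether_is_multicast is a name chosen for
this example, not a function from the patch.

```c
#include <stdint.h>

/*
 * Returns 1 if the 6-byte Ethernet address is a group (multicast or
 * broadcast) address, i.e. the low-order bit of the first octet is set.
 */
static int
ether_is_multicast(const uint8_t addr[6])
{
    return addr[0] & 0x01;
}
```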

When all is said and done, this is probably less than 20 lines of code in
kernel space, across 5 or 6 files.  I am thinking that ipfw is the way to
do it, but it's not going to eliminate much of the existing code, only add
flexibility.  It'd be cool to distribute some services but not others, or
distribute some services across 2 machines, some across those 2 plus one,
etc etc.
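
A minimal sketch of what such per-service distribution could look like,
under loud assumptions: struct svc_rule and svc_accept are hypothetical,
and the source address is split with a modulo rather than the mask
comparison described earlier, so that a group of three machines can be
expressed.  It is neither ipfw syntax nor part of the current code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule: one distributed service on this machine. */
struct svc_rule {
    uint16_t port;     /* destination port the rule covers */
    uint32_t members;  /* number of machines sharing this service */
    uint32_t index;    /* this machine's slot, 0 .. members-1 */
};

/*
 * Returns 1 to accept the packet, 0 to drop it, or -1 when no rule
 * covers the port (the service is not distributed and the caller
 * decides what to do).
 */
static int
svc_accept(const struct svc_rule *rules, size_t n, uint16_t dport,
    uint32_t src)
{
    for (size_t i = 0; i < n; i++) {
        if (rules[i].port != dport)
            continue;
        if (rules[i].members == 0)  /* misconfigured rule: drop */
            return 0;
        return (src % rules[i].members) == rules[i].index;
    }
    return -1;
}
```

A machine serving port 80 as slot 1 of 3 and port 25 as slot 0 of 2
would carry two rules, one per service.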

-Chris

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-net" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?Pine.BSF.4.05.9903060726360.8329-100000>
