From owner-freebsd-hackers  Tue Nov  7  6:51: 5 2000
Delivered-To: freebsd-hackers@freebsd.org
Received: from neimail.networkengines.com (unknown [64.55.6.7])
	by hub.freebsd.org (Postfix) with ESMTP
	id E238537B4C5; Tue,  7 Nov 2000 06:50:55 -0800 (PST)
Received: by neimail.networkengines.com with Internet Mail Service (5.5.2650.21)
	id <W13QF2T7>; Tue, 7 Nov 2000 09:48:08 -0500
Message-ID: <8D18C4F9CBA1D311900F00A0C990C97F67CB4A@neimail.networkengines.com>
From: Andrew Sporner <andy.sporner@networkengines.com>
To: "'Michael C . Wu'" <keichii@peorth.iteration.net>,
	Andrew Sporner <andy.sporner@networkengines.com>
Cc: freebsd-net@freebsd.org,
	"'freebsd-hackers@freebsd.org'" <freebsd-hackers@freebsd.org>
Subject: RE: High-availability failover software available.
Date: Tue, 7 Nov 2000 09:48:07 -0500 
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: text/plain;
	charset="iso-8859-1"
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Hi!

> | a H/A Failover system that happens to work with BSD.  I would like to
> | contribute this to the FreeBSD project or at a minimum make it
> | available to those who
> | What it includes is:
> |       -  Multi-path heartbeat based node failure detection.
> 
> How do you determine that a machine/service is "failing"?
> Have you considered that the distributed nodes might be very far
> away from each other, even across the globe?  Lag time can
> lead to daemon falsely thinking that a node is down.

This is for local area clusters.  This is really foundation work
for another project I have in mind, involving process swapping.
I needed a way of finding when a node died to do some garbage
collection.  As it turned out, it didn't take much to make it
work for doing application failover.  So I thought, "Well, since
there doesn't appear to be anything free this way, why not?" and
so here it is.  In short, I am not trying to solve geographical
failover because it isn't really germane to my goal.  But if
the architecture lends itself to it, there is no reason not to
include it.  Any takers?

But to answer your question: a heartbeat message is sent on
the broadcast address of each network that the machine lives on.
Currently it will use all of them.  I think a necessary feature
would be to allow for a subset of interfaces. :-)  But I wanted
to solve the bigger problems first.  When a peer node receives
the packet, it updates a timestamp for its peer interface.
Periodically these are checked, and if one of the links is out
of bounds (its timestamp is too old), it is marked offline.  When
the last live interface is marked offline, the node is marked
offline and then the recovery procedure starts.  There may be a
need to start recovery if only one interface fails, but again this
is outside the mission I have embarked on, mainly because the
architecture I am pursuing requires that the heartbeat LANs (which
will also be used to transfer pages between machines) be private
and not used for user applications.  In this way I get link
recovery through redundancy, and the cluster software knows how to
handle it, where an application might not (actually, probably won't)
know how to handle it.  But to stop a potential argument, let's
leave it strictly the case of keeping these particular LANs private.
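
To make the bookkeeping above concrete, here is a minimal sketch in C
of the timestamp-and-check logic.  It is not taken from cluster.c; the
names (peer_node, hb_received, hb_check), the interface limit and the
3-second timeout are all made up for illustration, and bringing a node
back online on the next heartbeat is an assumption on my part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MAX_IFACES     4   /* hypothetical limit, not from cluster.c */
#define HB_TIMEOUT_SEC 3   /* hypothetical "out of bounds" threshold */

struct peer_iface {
    time_t last_seen;      /* when the last heartbeat arrived on this link */
    bool   online;
};

struct peer_node {
    struct peer_iface ifaces[MAX_IFACES];
    size_t            n_ifaces;
    bool              online;
};

/* Record a heartbeat received from the peer on interface idx. */
void hb_received(struct peer_node *n, size_t idx, time_t now)
{
    assert(n != NULL && idx < n->n_ifaces);
    n->ifaces[idx].last_seen = now;
    n->ifaces[idx].online = true;
    n->online = true;      /* assumption: any heartbeat revives the node */
}

/*
 * Periodic check.  Marks stale links offline; when the last live link
 * goes, marks the node offline.  Returns true only on the transition
 * to offline, i.e. when the recovery procedure should be started.
 */
bool hb_check(struct peer_node *n, time_t now)
{
    size_t live = 0;

    assert(n != NULL);
    for (size_t i = 0; i < n->n_ifaces; i++) {
        struct peer_iface *pi = &n->ifaces[i];

        if (pi->online && now - pi->last_seen > HB_TIMEOUT_SEC)
            pi->online = false;
        if (pi->online)
            live++;
    }
    if (live == 0 && n->online) {
        n->online = false;
        return true;
    }
    return false;
}
```

The receive path would call hb_received() for each heartbeat packet and
a timer would call hb_check() for each peer.  Returning true only on the
transition keeps recovery from being started twice for the same failure.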

> 
> Reading your code, I don't think broadcasting over all 
> interfaces is a good idea.  A safer way would be requiring two 
> physical interfaces on all nodes, building two separate physical 
> networks.  The nodes communicate information on one network, and 
> do the actual "work" on the other, more powerful, network.  The 
> daemons on the nodes should broadcast over these two interfaces.  
> In addition, a comparison between the connectivity of the two physical 
> networks can provide lots of valuable info.

I agree!  I am answering this email serially and am reading as
I reply, and it looks like we are in sync.  In the next few
releases I will provide a delete option in the GUI to take away
an autodiscovered interface.  I think the autodiscovery is
important, but it should be allowed to be updated.  The interface
will remain in the configuration (so it doesn't get autodiscovered
again! :-)) but marked with a special flag so it isn't used.  An
upcoming feature is the ability to drag a LAN interface over to
the right of the GUI to monitor LAN performance.  I also plan to
have several metrics tied to each resource, and by right-clicking
on the gauges that are there you can change them.  Right now
the applications or nodes can be monitored, and it is only CPU,
MEMORY and one other--been too long since I saw it.  At one point
I even thought about putting TOP functionality in the GUI so that
by expanding a node, one can also see the processes and drag a
process over to the right side to monitor it.
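
The "keep it in the configuration but flag it" idea for deleted
interfaces could look roughly like the sketch below.  This is only an
illustration: the structure layout, the IF_DISABLED flag and the
cfg_* function names are hypothetical and not from the current source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define IF_DISABLED    0x01   /* hypothetical "deleted by the admin" flag */
#define MAX_CFG_IFACES 8      /* hypothetical table size */

struct cfg_iface {
    char     name[16];
    unsigned flags;
};

struct cfg {
    struct cfg_iface ifaces[MAX_CFG_IFACES];
    size_t           count;
};

/* Look up an interface by name; NULL if it is not in the configuration. */
static struct cfg_iface *cfg_find(struct cfg *c, const char *name)
{
    for (size_t i = 0; i < c->count; i++)
        if (strncmp(c->ifaces[i].name, name, sizeof c->ifaces[i].name - 1) == 0)
            return &c->ifaces[i];
    return NULL;
}

/*
 * Autodiscovery hook.  Adds the interface only if it is not already
 * known, so a flagged entry is never rediscovered.  Returns true if
 * a new entry was added.
 */
bool cfg_discover(struct cfg *c, const char *name)
{
    struct cfg_iface *e;

    assert(c != NULL && name != NULL);
    if (cfg_find(c, name) != NULL || c->count == MAX_CFG_IFACES)
        return false;
    e = &c->ifaces[c->count++];
    strncpy(e->name, name, sizeof e->name - 1);
    e->name[sizeof e->name - 1] = '\0';
    e->flags = 0;
    return true;
}

/* "Delete" from the GUI: the entry stays, but is flagged as unused. */
bool cfg_delete(struct cfg *c, const char *name)
{
    struct cfg_iface *e = cfg_find(c, name);

    if (e == NULL)
        return false;
    e->flags |= IF_DISABLED;
    return true;
}

/* True if heartbeats should be sent and expected on this interface. */
bool cfg_in_use(struct cfg *c, const char *name)
{
    struct cfg_iface *e = cfg_find(c, name);

    return e != NULL && !(e->flags & IF_DISABLED);
}
```

Because the flagged entry still occupies its slot, a later discovery
pass finds the name already present and leaves it alone.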

> A great thing to do with this code would be using kqueue.

Can you give me more specifics?  or better--would you be willing
to try it and give me the patch?  

> 
> I think there should be a daemon that "routes" service queries, say
> a http request, to different nodes as the requests come in.

Like a load balancer? :-)  I had one once, but unfortunately there
are two problems.  First, it won't scale (reverse proxy), and second,
some people I worked on this for would have a fit.

I went to a great mini-tutorial at BSDcon about IP filter.  Guido
mentioned something about having a kernel filter rule that calls
a kernel address for each packet.  A good way to do this would be
to put a router inside the kernel and then leverage the IP filter.

> 
> |       -  Drag&Drop administration interface (X-11 tcl/tk based
> 
> I think this should also include a console-type controller.
> Real-life work involves administering remotely, and GUI apps are not
> that great remotely.

:-)  OK, next release! :-)  

> 
> | The current state is alpha and is being tested by several
> | people now.  Beta
> | The current source is located at http://www.sporner.com/bsdclusters
> 
> Please document what you have done, so we can learn more about
> the engineering thinking behind your implementation.

In progress, maybe some of my answers here would help.  I also
have heavily commented the cluster.c code.  But I will write a
document detailing how and why in the near future.

> Will this be BSDL or will it be another license? Also, please
> include man pages.

Yes, and yes :-)

> I think we missed a great chance to talk to each other, since my
> research interest is the same field as your project. Or we may have
> met, I'm the Chinese guy that had the shortest hair.
> 
> P.S. Please keep lines shorter than 80 characters. :)

Ahhh!  A terminal guy :-)  I will try to, but since I kind
of whored myself to Microsoft for email, it is hard to think
about this :-)

Thanks for all of your feedback!



Andy

