From owner-freebsd-cluster Thu Dec 12 1:50:24 2002
Delivered-To: freebsd-cluster@freebsd.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id B770037B401 for ; Thu, 12 Dec 2002 01:50:21 -0800 (PST)
Received: from gate.nentec.de (gate2.nentec.de [194.25.215.66]) by mx1.FreeBSD.org (Postfix) with ESMTP id 5BD6D43EC5 for ; Thu, 12 Dec 2002 01:50:20 -0800 (PST) (envelope-from sporner@nentec.de)
Received: from nenny.nentec.de (root@nenny.nentec.de [153.92.64.1]) by gate.nentec.de (8.11.3/8.9.3) with ESMTP id gBC9oIP05092; Thu, 12 Dec 2002 10:50:18 +0100
Received: from nentec.de (andromeda.nentec.de [153.92.64.34]) by nenny.nentec.de (8.11.3/8.11.3) with ESMTP id gBC9oCt05933; Thu, 12 Dec 2002 10:50:13 +0100
Message-ID: <3DF85BD4.1050200@nentec.de>
Date: Thu, 12 Dec 2002 10:50:12 +0100
From: Andy Sporner
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2a) Gecko/20020910
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: Michael Grant
Cc: freebsd-cluster@FreeBSD.ORG
Subject: Re: sharing files within a cluster
References: <200212112225.gBBMPi411534@splat.grant.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Virus-Scanned: by AMaViS-perl11-milter (http://amavis.org/)
Sender: owner-freebsd-cluster@FreeBSD.ORG
Precedence: bulk
List-ID:
List-Archive: (Web Archive)
List-Help: (List Instructions)
List-Subscribe:
List-Unsubscribe:
X-Loop: FreeBSD.ORG

Michael Grant wrote:

> Anyone know about this "single image linux cluster"? How's that done?
> It seems to me that there would need to be something local to each
> box, minimally /etc, no?

I don't know much about the Linux single-image stuff. I see dependencies on RPCs and I get scared... In a nutshell, what I am looking for in a single image is very NUMA-like, with the exception that instead of being a single image (one OS) it is a cooperative image (many kernels, one execution space).
The only thing that is "single image" is the VM that each machine exports to the single image. I think it is architecturally possible to do this while minimizing the amount of work the master node has to do; i.e., when a process starts on a node, the PID is a multiple of the node number, and the same goes for file handles, sockets, and such.

> Andy, I agree with you that having a common process space and the
> ability to migrate processes across machines would be a big win. If
> that could be done on a freebsd cluster along with a single image,
> that would definitely be a reason to use a freebsd cluster over a
> linux cluster.

Thanks! I agree as well.

> Realistically, how close are you to this?

It's really a matter of motivation ;-) If I am having fun, the time is short; if I am not, things tend to take an eternity. I have my main fun at work developing high-speed networking hardware, and I have little time or tolerance for strange people--especially those who don't reply to email when I could normally expect a reply. I don't necessarily mean our own superstars. Lately I have come to the conclusion that organized computing has no future because of the influx of newbies (which is not bad in itself--it's just that if you are new on a list, the veterans regard you as a newbie, and this is a barrier to normal operation and cooperation). I make some offers only once, and if there are no takers I don't regard it as a worthwhile thing to do. I had intended to look into porting BPROC at some point, but it is now off the radar screen (though not far enough off to learn how they swap processes ;-)).

With that little flame out of the way ;-) here is the rough list of things that must be accomplished to realize the goal. Naturally, the more people working on it, the faster it goes, because I only spend what spare time I have on this.

Here is what has been done:

1. Cluster configuration and monitor (build 114) for failover. (Supports distributed process table views;
that is, there is NO master process table.)

Here is what must be done:

1. Front-end load balancing (later this becomes the gateway for network-based processes that have to move around the cluster with their sockets). Current plan: Dec 31 (will initially use IPFILTER).

2. The message infrastructure. I have so far studied BPROC and DRBD. I think it would be a week's worth of work to make a messaging system that takes both sets of requirements into account. I would like to use the embedded TCP stack that I wrote for my current project, but that's intellectual property :-( It's just the kind of thing we need (event driven, with lots of hooks for callouts), and access to the inside of the stack is pretty good. Current plan: Jan 15.

3. Process swapping engine. I would like to extend the page-swapping architecture (thank god FreeBSD has made this very modular!). I regard this as a better mechanism than what I have seen in, for instance, BPROC, and that way shared memory can be supported too. This is the most difficult part and hard to put a time estimate on. It would be very helpful to team up with people who know the VM architecture well. It could be as early as March or April for this one.

4. Much testing! That, I think, is a distributed affair, so it can perhaps be very short.

Assumptions:

1. Some sort of shared filesystem is used by the machines that are part of the domain, since this would be a distributed image.

2. Network-based (socket) traffic must pass through a front-end device to be directed to the node that owns the process. By nature of the Hi-AV failover stuff, this would NOT be a single point of failure.

Executive summary: probably early summer. But I think it would take that long anyway to port the other stuff that is also available.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-cluster" in the body of the message