Date:      Tue, 19 Jul 2005 21:16:18 -0500
From:      Eric Anderson <anderson@centtech.com>
To:        freebsd-fs@freebsd.org
Subject:   Re: Cluster Filesystem for FreeBSD - any interest?
Message-ID:  <42DDB3F2.7020000@centtech.com>
In-Reply-To: <200507020038.j620cO7F071025@gate.bitblocks.com>
References:  <200507020038.j620cO7F071025@gate.bitblocks.com>

Bakul Shah wrote:
[..snip..]
>>:) I understand.  Any nudging in the right direction here would be
>>appreciated.
> 
> 
> I'd probably start with modelling a single filesystem and how
> it maps to a sequence of disk blocks (*without* using any
> code or worrying about details of formats but capturing the
> essential elements).  I'd describe various operations in
> terms of preconditions and postconditions.  Then, I'd extend
> the model to deal with redundancy and so on.  Then I'd model
> various failure modes. etc.  If you are interested _enough_
> we can take this offline and try to work something out.  You
> may even be able to use perl to create an `executable'
> specification:-)
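
To make the suggestion above concrete, here is a minimal sketch of an "executable specification" in Python rather than perl. It assumes a toy model in which a filesystem is only a map from file names to block numbers plus a set of free blocks. The class `ModelFS` and its method names are hypothetical, chosen for illustration, and nothing here reflects the real UFS2 on-disk format.

```python
class ModelFS:
    """Toy model: files map to ordered lists of disk block numbers."""

    def __init__(self, nblocks):
        self.free = set(range(nblocks))   # unallocated disk blocks
        self.files = {}                   # file name -> list of block numbers

    def invariant(self):
        used = [b for blocks in self.files.values() for b in blocks]
        # No block belongs to two files, and no used block is also free.
        return len(used) == len(set(used)) and not (set(used) & self.free)

    def create(self, name):
        assert name not in self.files, "precondition: name must not exist"
        self.files[name] = []
        assert name in self.files and self.invariant()   # postcondition

    def append_block(self, name):
        assert name in self.files, "precondition: file must exist"
        assert self.free, "precondition: a free block must be available"
        before = len(self.files[name])
        block = min(self.free)            # deterministic choice for the model
        self.free.remove(block)
        self.files[name].append(block)
        # postcondition: file grew by exactly one block, invariant still holds
        assert len(self.files[name]) == before + 1 and self.invariant()
        return block

    def remove(self, name):
        assert name in self.files, "precondition: file must exist"
        self.free.update(self.files.pop(name))
        # postcondition: name is gone and its blocks are free again
        assert name not in self.files and self.invariant()
```

Redundancy and failure modes could then be added as further operations on the same model (for example, an operation that drops a block from `free` and `files` to represent a lost sector), with the invariant tightened or relaxed to match.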

I've done some research, and read some books/articles/white papers since 
I started this thread.

First, porting GFS might be a more universal effort, and might be 
'easier'.  However, that doesn't get us a clustered filesystem under a 
BSD license (something that sounds good to me).

Clustering UFS2 would be cool.  Here's what I'm looking for:

A clustered filesystem (or layer?) that allows all machines in the 
cluster to see the same filesystem as if it were local, with read/write 
access.  The cluster will need cache coherency across all nodes, and 
there will need to be some sort of lock manager on each node to 
communicate with all the other nodes to coordinate file locking.  The 
filesystem will have to support journaling.
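
As a rough illustration of the lock manager idea, the sketch below keeps a per-inode lock table with shared and exclusive modes. It is only an in-memory model under stated assumptions: the `LockTable` class, the mode names, and the node identifiers are all hypothetical, and the inter-node messaging a real lock manager would need is left out entirely.

```python
class LockTable:
    """In-memory lock table a node's lock manager might keep (hypothetical)."""

    def __init__(self):
        self.locks = {}   # inode number -> (mode, set of holder node ids)

    def acquire(self, inode, node, mode):
        """Try to grant 'shared' or 'exclusive'; return True if granted."""
        if mode not in ("shared", "exclusive"):
            raise ValueError(mode)
        held = self.locks.get(inode)
        if held is None:
            self.locks[inode] = (mode, {node})
            return True
        held_mode, holders = held
        if mode == "shared" and held_mode == "shared":
            holders.add(node)             # any number of readers may share
            return True
        if holders == {node}:
            # The sole holder may upgrade to exclusive, or re-request
            # a lock it already holds.
            if mode == "exclusive":
                self.locks[inode] = ("exclusive", holders)
            return True
        return False                      # conflicting lock held elsewhere

    def release(self, inode, node):
        held = self.locks.get(inode)
        if held is None:
            return
        _, holders = held
        holders.discard(node)
        if not holders:
            del self.locks[inode]         # last holder gone, entry removed
```

A refused request simply returns False here; a real manager would presumably queue the request and notify the waiter when the conflicting lock is released.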

I'm wondering if one could make a pseudo filesystem, something like 
nullfs, that sits on top of a UFS2 partition, monitors all VFS 
operations on the filesystem, and communicates them over TCP/IP to the 
other nodes in the cluster.  That way, each node would know which inodes 
and blocks are changing, so it can flush those buffers, and it would 
know which blocks (or partial blocks) to treat as locked when another 
node locks them.  This could be done via multicast, with every node in 
the cluster running a distributed lock manager daemon (dlmd) to 
coordinate it.  I think the UFS2 filesystem would also need a bit, set 
at mount time, that marks it as mounted 'clustered'.  That way we could 
modify mount to mount 'clustered' filesystems (mount -o clustered) only 
if dlmd is running, since dlmd would be a dependency for stable, 
coherent file control on a mount point.
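
The layering idea can be simulated in a few lines to check that the reasoning holds together. This is a single-process sketch with no real networking: a shared dict stands in for the UFS2 volume, a loop over peers stands in for the multicast notification, and a boolean stands in for "dlmd is running". The `Node` class, `join` helper, and all method names are assumptions made for illustration.

```python
class Node:
    """One cluster member with a private block cache (hypothetical model)."""

    def __init__(self, name, disk, dlmd_running=True):
        self.name = name
        self.disk = disk              # shared dict standing in for the volume
        self.cache = {}               # block number -> cached contents
        self.peers = []
        self.dlmd_running = dlmd_running
        self.mounted = False

    def mount_clustered(self):
        # Mirrors the proposal: refuse a 'clustered' mount without dlmd.
        if not self.dlmd_running:
            raise RuntimeError("dlmd not running; refusing clustered mount")
        self.mounted = True

    def _require_mounted(self):
        if not self.mounted:
            raise RuntimeError("filesystem is not mounted on " + self.name)

    def read(self, block):
        self._require_mounted()
        if block not in self.cache:   # cache miss: fetch from shared storage
            self.cache[block] = self.disk.get(block, b"")
        return self.cache[block]

    def write(self, block, data):
        self._require_mounted()
        self.disk[block] = data       # write-through to shared storage
        self.cache[block] = data
        for peer in self.peers:       # stands in for a multicast message
            peer.invalidate(block)

    def invalidate(self, block):
        self.cache.pop(block, None)   # drop the stale buffer, if cached


def join(nodes):
    """Make every node aware of every other node."""
    for n in nodes:
        n.peers = [p for p in nodes if p is not n]
```

With two nodes joined and mounted, a write on one node removes the block from the other node's cache, so the next read there goes back to the shared storage. The sketch ignores ordering, lost messages, and node failure, which are the hard parts of any real design.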

Does anyone have any insight as to whether a layer would work?  Or maybe 
I'm way off here and I need to do more reading :)

Eric



-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
A lost ounce of gold may be found, a lost moment of time never.
------------------------------------------------------------------------


