From owner-freebsd-fs@FreeBSD.ORG Wed Jul 20 02:37:26 2005
From: yf-263 <yfyoufeng@263.net>
To: Eric Anderson
Cc: freebsd-fs@freebsd.org
Subject: Re: Cluster Filesystem for FreeBSD - any interest?
Date: Wed, 20 Jul 2005 10:35:46 +0800
Organization: Unix-driver.org
In-Reply-To: <42DDB3F2.7020000@centtech.com>
References: <200507020038.j620cO7F071025@gate.bitblocks.com>
	 <42DDB3F2.7020000@centtech.com>
Message-Id: <1121826946.2235.6.camel@localhost.localdomain>
Reply-To: yfyoufeng@263.net
List-Id: Filesystems

On Tue, 2005-07-19 at 21:16 -0500, Eric Anderson wrote:
> Bakul Shah wrote:
> [..snip..]
> >> :) I understand.  Any nudging in the right direction here would be
> >> appreciated.
> >
> > I'd probably start with modelling a single filesystem and how
> > it maps to a sequence of disk blocks (*without* using any
> > code or worrying about details of formats but capturing the
> > essential elements).  I'd describe various operations in
> > terms of preconditions and postconditions.  Then, I'd extend
> > the model to deal with redundancy and so on.  Then I'd model
> > various failure modes, etc.  If you are interested _enough_
> > we can take this offline and try to work something out.  You
> > may even be able to use perl to create an `executable'
> > specification :-)
>
> I've done some research, and read some books/articles/white papers
> since I started this thread.
>
> First, porting GFS might be a more universal effort, and might be
> 'easier'.  However, that doesn't get us a clustered filesystem with a
> BSD license (something that sounds good to me).

It has been said that this would be about seven man-months of effort
for a filesystem expert.

> Clustering UFS2 would be cool.  Here's what I'm looking for:

That is exactly how Lustre does its work, though it builds itself on
Ext3; Lustre's targets are described at
http://www.lustre.org/docs/SGSRFP.pdf .

> A clustered filesystem (or layer?) that allows all machines in the
> cluster to see the same filesystem as if it were local, with
> read/write access.  The cluster will need cache coherency across all
> nodes, and there will need to be some sort of lock manager on each
> node to communicate with all the other nodes to coordinate file
> locking.  The filesystem will have to support journaling.
>
> I'm wondering if one could make a pseudo filesystem something like
> nullfs that sits on top of a UFS2 partition, and essentially monitors
> all VFS operations to the filesystem, and communicates them over
> TCP/IP to the other nodes in the cluster.
> That way, each node would know which inodes and blocks are changing,
> so they can flush those buffers, and they would know which blocks (or
> partial blocks) to view as locked when another node locks them.  This
> could be done via multicast, so all nodes in the cluster would have to
> be running a distributed lock manager daemon (dlmd) that would
> coordinate this.  I think also that the UFS2 filesystem would have to
> have a bit set upon mount that tracked its mount as a 'clustered'
> filesystem mount.  The reason for that is so that we could modify
> mount to only mount 'clustered' filesystems (mount -o clustered) if
> the dlmd was running, since that would be a dependency for stable
> coherent file control on a mount point.
>
> Does anyone have any insight as to whether a layer would work?  Or
> maybe I'm way off here and I need to do more reading :)
>
> Eric

-- 
yf-263
Unix-driver.org