From owner-freebsd-fs@FreeBSD.ORG Wed Jul 20 02:37:26 2005
From: yf-263 <yfyoufeng@263.net>
To: Eric Anderson
Cc: freebsd-fs@freebsd.org
Subject: Re: Cluster Filesystem for FreeBSD - any interest?
Date: Wed, 20 Jul 2005 10:35:46 +0800
Organization: Unix-driver.org
In-Reply-To: <42DDB3F2.7020000@centtech.com>
References: <200507020038.j620cO7F071025@gate.bitblocks.com>
	 <42DDB3F2.7020000@centtech.com>
Message-Id: <1121826946.2235.6.camel@localhost.localdomain>
Reply-To: yfyoufeng@263.net
List-Id: Filesystems

On Tue, 2005-07-19 at 21:16 -0500, Eric Anderson wrote:
> Bakul Shah wrote:
> [..snip..]
> >> :) I understand.  Any nudging in the right direction here would be
> >> appreciated.
> >
> > I'd probably start with modelling a single filesystem and how
> > it maps to a sequence of disk blocks (*without* using any
> > code or worrying about details of formats but capturing the
> > essential elements).  I'd describe various operations in
> > terms of preconditions and postconditions.  Then, I'd extend
> > the model to deal with redundancy and so on.  Then I'd model
> > various failure modes, etc.  If you are interested _enough_
> > we can take this offline and try to work something out.  You
> > may even be able to use perl to create an `executable'
> > specification :-)
>
> I've done some research, and read some books/articles/white papers
> since I started this thread.
>
> First, porting GFS might be a more universal effort, and might be
> 'easier'.  However, that doesn't get us a clustered filesystem with a
> BSD license (something that sounds good to me).

It has been said that this would be about seven man-months of effort
for a filesystem expert.

> Clustering UFS2 would be cool.  Here's what I'm looking for:

That is exactly how Lustre does its work, though it builds itself on
Ext3; Lustre's targets are described at
http://www.lustre.org/docs/SGSRFP.pdf .

> A clustered filesystem (or layer?) that allows all machines in the
> cluster to see the same filesystem as if it were local, with
> read/write access.  The cluster will need cache coherency across all
> nodes, and there will need to be some sort of lock manager on each
> node to communicate with all the other nodes to coordinate file
> locking.  The filesystem will have to support journaling.
>
> I'm wondering if one could make a pseudo filesystem something like
> nullfs that sits on top of a UFS2 partition, and essentially monitors
> all VFS operations to the filesystem, and communicates them over
> TCP/IP to the other nodes in the cluster.
> That way, each node would know which inodes and blocks are changing,
> so they can flush those buffers, and they would know which blocks (or
> partial blocks) to view as locked when another node locks them.  This
> could be done via multicast, so all nodes in the cluster would have to
> be running a distributed lock manager daemon (dlmd) that would
> coordinate this.  I think also that the UFS2 filesystem would have to
> have a bit set upon mount that tracked its mount as a 'clustered'
> filesystem mount.  The reason for that is so that we could modify
> mount to only mount 'clustered' filesystems (mount -o clustered) if
> the dlmd was running, since that would be a dependency for stable
> coherent file control on a mount point.
>
> Does anyone have any insight as to whether a layer would work?  Or
> maybe I'm way off here and I need to do more reading :)
>
> Eric

-- 
yf-263
Unix-driver.org