From: Bakul Shah
To: Eric Anderson
Cc: freebsd-fs@freebsd.org
Subject: Re: Cluster Filesystem for FreeBSD - any interest?
Date: Wed, 22 Jun 2005 07:45:40 -0700
In-reply-to: Your message of "Wed, 22 Jun 2005 07:09:34 CDT." <42B954FE.2070406@centtech.com>
Message-Id: <200506221445.j5MEje5P097719@gate.bitblocks.com>

> Hmm. I'm not sure if it can or not. I'll try to explain what I'm
> dreaming of. I currently have about 1000 clients needing access to the
> same pools of data (read/write) all the time. The data changes
> constantly. There is a lot of this data. We use NFS currently.

Sounds like you want SGI's clustered xfs....

> I'll be honest here - I'm not a code developer. I would love to learn
> some C here, and 'just do it', but filesystems aren't exactly simple, so
> I'm looking for a group of people that would love to code up something
> amazing like this - I'll support the developers and hopefully learn
> something in the process.
> My goal personally would be to do anything I
> could to make the developers work most productively, and do testing. I
> can probably provide equipment, and a good testbed for it.

If you are not a seasoned programmer in _some_ language, this
will not be easy at all.

One suggestion is to develop an abstract model of what a CFS
is.  Coming up with a clear, detailed, precise specification is
not an easy task either, but it has to be done, and if you can
do it, it will be immensely helpful all around.  You will truly
understand what you are doing, you will have a basis for
evaluating design choices, you will have made choices before
writing any code, you can write test cases, writing code is far
easier etc. etc.

Google for clustered filesystems.  The citeseer site has some
papers as well.

A couple of FS-specific suggestions:
- Perhaps clustering can be built on top of existing
  filesystems.  Each machine's local filesystem is considered a
  cache and you use some sort of cache coherency protocol.
  That way you don't have to deal with filesystem allocation
  and layout issues.
- A network-wide stable storage `disk' may be easier to do
  given GEOM.  There are at least N copies of each data block.
  Data may be cached locally at any site but writing data is
  done as a distributed transaction.  So again cache coherency
  is needed.  A network RAID if you will!

But again, let me stress that one must have a clear *model* of
the problem being solved.  Getting distributed programs right
is very hard even at an abstract model level.  Debugging a
distributed program that doesn't have a clear model is, well,
for masochists (nothing against them -- I bet even they'd
rather get their pain some other way:-)
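
To make the "abstract model" point concrete, here is a minimal
sketch of the first suggestion (local filesystems as caches kept
consistent by a coherency protocol).  It assumes a simple
write-invalidate scheme with a single coordinator, which is only
one of many possible protocols.  The names Coordinator, Node and
coherent are hypothetical, and the model runs in one process, so
it ignores failures, concurrency and the network entirely.

```python
class Coordinator:
    """Hypothetical directory: authoritative data plus who caches what."""

    def __init__(self):
        self.store = {}    # block id -> authoritative data
        self.holders = {}  # block id -> set of nodes caching that block

    def read(self, node, block):
        # Remember that this node now holds a cached copy.
        self.holders.setdefault(block, set()).add(node)
        return self.store.get(block)

    def write(self, node, block, data):
        # Write-invalidate: drop every other cached copy first,
        # then make the new data authoritative.
        for other in self.holders.get(block, set()) - {node}:
            other.invalidate(block)
        self.store[block] = data
        self.holders[block] = {node}


class Node:
    """One machine; its local filesystem is modelled as a dict cache."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord
        self.cache = {}

    def read(self, block):
        # Serve from the local cache, fetching on a miss.
        if block not in self.cache:
            self.cache[block] = self.coord.read(self, block)
        return self.cache[block]

    def write(self, block, data):
        self.coord.write(self, block, data)
        self.cache[block] = data

    def invalidate(self, block):
        self.cache.pop(block, None)


def coherent(coord, nodes):
    """Model invariant: every cached copy equals the authoritative copy."""
    return all(data == coord.store.get(block)
               for n in nodes
               for block, data in n.cache.items())
```

An invariant like coherent() is the kind of thing a written model
gives you for free: it can be checked after every step of a test
case, long before any real filesystem code exists.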