From owner-freebsd-fs@FreeBSD.ORG Fri Jul 1 12:32:39 2005
Date: Fri, 01 Jul 2005 07:32:24 -0500
From: Eric Anderson <anderson@centtech.com>
To: Bakul Shah
Cc: freebsd-fs@freebsd.org
Message-ID: <42C537D8.2000403@centtech.com>
In-Reply-To: <200506221445.j5MEje5P097719@gate.bitblocks.com>
Subject: Re: Cluster Filesystem for FreeBSD - any interest?

Bakul Shah wrote:
>> Hmm. I'm not sure if it can or not. I'll try to explain what I'm
>> dreaming of. I currently have about 1000 clients needing access to
>> the same pools of data (read/write) all the time. The data changes
>> constantly. There is a lot of this data. We use NFS currently.
>
> Sounds like you want SGI's clustered xfs....

Maybe so - GFS looks perfect as well; the only drawback is the
license, but that isn't *that* big of an issue. Porting it may be
easier than starting from scratch?

>> I'll be honest here - I'm not a code developer. I would love to
>> learn some C here, and 'just do it', but filesystems aren't exactly
>> simple, so I'm looking for a group of people who would love to code
>> up something amazing like this - I'll support the developers and
>> hopefully learn something in the process. My goal personally would
>> be to do anything I could to make the developers work most
>> productively, and to do testing. I can probably provide equipment,
>> and a good testbed for it.
>
> If you are not a seasoned programmer in _some_ language, this
> will not be easy at all.

Well, that depends on whether you call Perl programming 'real'
programming. :) Perl just isn't quite the tool for this job (although
I'm sure there are some out there who would argue otherwise).

> One suggestion is to develop an abstract model of what a CFS
> is. Coming up with a clear, detailed, precise specification is
> not an easy task either, but it has to be done, and if you can
> do it, it will be immensely helpful all around. You will
> truly understand what you are doing, you will have a basis for
> evaluating design choices, you will have made choices before
> writing any code, you can write test cases, writing code is
> far easier, etc. Google for clustered filesystems.
> The citeseer site has some papers as well.

Thanks - this is a great suggestion. I'll try to come up with
something. Really, the truth is (now that I have read even more
docs), Red Hat's GFS is exactly what I would like for FreeBSD. They
already have all the components, etc. I would prefer a BSD-licensed
piece of software, but mostly I just want something that works on
FreeBSD.

> A couple of FS-specific suggestions:
>
> - Perhaps clustering can be built on top of existing
>   filesystems. Each machine's local filesystem is considered
>   a cache and you use some sort of cache coherency protocol.
>   That way you don't have to deal with filesystem allocation
>   and layout issues.

I see - that's an interesting idea. Almost like each machine could
mount the shared version read-only, then slap a layer on top that is
connected to a cache coherency manager (maybe there is a daemon on
each node, and the nodes sync their caches via the network) to keep
the filesystems 'in sync'. Then maybe only one elected node actually
writes the data to the disk; if that node dies, another node is
elected. (I've sketched below what such a daemon might pass around.)

> - A network-wide stable storage `disk' may be easier to do
>   given GEOM. There are at least N copies of each data block.
>   Data may be cached locally at any site, but writing data is
>   done as a distributed transaction. So again cache
>   coherency is needed.

A network RAID, if you will! I'm not sure how this would work for my
case. A network RAID with geom+ggate is simple (I've done this a
couple of times - cool! - the recipe is below), but how does that get
me shared read-write access to the same data?
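To make the coherency-daemon idea a little more concrete, here is a
toy sketch of what the per-node daemons might pass around. Everything
in it - the names, the message layout, the election rule - is invented
on the spot just to show the moving parts, not taken from any real
clustered filesystem:

/*
 * coherencyd sketch - illustrative only.  All names, the message
 * format, and the election scheme are hypothetical.
 */
#include <stdint.h>
#include <stdio.h>

enum coh_type {
    COH_INVALIDATE,     /* a node dirtied a block; peers drop
                           their cached copy of it */
    COH_WRITE,          /* ship a dirty block to the elected
                           writer node, which commits it to disk */
    COH_HEARTBEAT,      /* writer liveness; silence would kick
                           off a new election */
    COH_ELECT           /* election message */
};

struct coh_msg {
    uint32_t type;      /* one of coh_type */
    uint32_t node;      /* sender's node id */
    uint64_t fileid;    /* file on the shared volume */
    uint64_t blkno;     /* block within that file */
};

/*
 * What each daemon might do per message received from the network
 * (the transport - multicast, a TCP mesh, whatever - is left out).
 */
static void
coh_handle(const struct coh_msg *m, uint32_t self, uint32_t *writer)
{
    switch (m->type) {
    case COH_INVALIDATE:
        if (m->node != self)
            printf("node %u: drop cached block %ju of file %ju\n",
                (unsigned)self, (uintmax_t)m->blkno,
                (uintmax_t)m->fileid);
        break;
    case COH_ELECT:
        /* naive rule: lowest live node id becomes the writer */
        if (m->node < *writer)
            *writer = m->node;
        break;
    default:            /* COH_WRITE, COH_HEARTBEAT, ... */
        break;
    }
}

int
main(void)
{
    uint32_t writer = 3;    /* currently elected writer node */
    struct coh_msg m = { COH_INVALIDATE, 1, 42, 7 };

    coh_handle(&m, 2 /* our node id */, &writer);
    return (0);
}

The hard parts (message ordering, network partitions, a real
election) are exactly what the abstract model Bakul suggests would
have to nail down before any of this is worth writing for real.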
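And for the archives, the geom+ggate network RAID I mentioned goes
roughly like this from memory (hostnames and device names are
placeholders). Note that each client builds its *own* mirror, which
is exactly why this alone doesn't give shared read-write access.

On the machine exporting its disk:

  # echo "client.example.com RW /dev/da1" > /etc/gg.exports
  # ggated

On the machine using it, attach the remote disk and mirror it with a
local one:

  # gmirror load
  # ggatec create -o rw server.example.com /dev/da1
  ggate0
  # gmirror label -v gm0 /dev/da0 /dev/ggate0
  # newfs /dev/mirror/gm0
  # mount /dev/mirror/gm0 /mnt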
> But again, let me stress that one must have a clear *model*
> of the problem being solved. Getting distributed programs
> right is very hard, even at an abstract model level.
> Debugging a distributed program that doesn't have a clear
> model is, well, for masochists (nothing against them -- I
> bet even they'd rather get their pain some other way :-)

:) I understand. Any nudging in the right direction here would be
appreciated.

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
A lost ounce of gold may be found, a lost moment of time never.
------------------------------------------------------------------------