Date: Wed, 3 Nov 1999 10:18:58 +0100
From: Eivind Eklund <eivind@FreeBSD.org>
To: Greg Lehey <grog@lemis.com>
Cc: Don <don@calis.blacksun.org>, Jacques Vidrine <n@nectar.com>, freebsd-fs@FreeBSD.org
Subject: Re: journaling UFS and LFS
Message-ID: <19991103101858.E72085@bitbox.follo.net>
In-Reply-To: <19991102154614.55760@mojave.sitaranetworks.com>; from grog@lemis.com on Tue, Nov 02, 1999 at 03:46:14PM -0500
References: <19991030233304.03DB31DA4@bone.nectar.com> <Pine.BSF.4.05.9910301936530.44134-100000@calis.blacksun.org> <19991101171936.J72085@bitbox.follo.net> <19991102154614.55760@mojave.sitaranetworks.com>
On Tue, Nov 02, 1999 at 03:46:14PM -0500, Greg Lehey wrote:
> On Monday, 1 November 1999 at 17:19:36 +0100, Eivind Eklund wrote:
> > On Sat, Oct 30, 1999 at 07:40:35PM -0400, Don wrote:
> >> This is getting off topic. What features would you like to see in a
> >> new file system. Some suggestions were made. Would you like to add
> >> anything to this list?
> >
> > Yes.
> > * Easy to do concurrent access from multiple hosts to the same
> >   physical media
>
> You can never do this in the general case (where any host may request
> access to any part of the disk).  The best you could do there is a
> file server, but they're not quite our terms of reference.

I don't get this.  To give a little more detail on what I mean: you
have the FS export a bunch of locks into the DLM (Distributed Lock
Manager) you are running (probably over the bus you use to share
access to the disks, but you can use another connection medium as
long as it is there), and the host that wants to do something to some
part of the FS grabs the relevant lock.  You also design the disk
layout to allow writing in a transactional way, so a host failure
while that host holds a lock doesn't hurt the other hosts accessing
the same physical media.  (A rough sketch of this locking dance is
appended at the end of this mail.)

I don't get what "general case" there is, as you're designing the
system - could you please explain?

> > * Ability to span more than one disk
>
> That's not necessarily a file system feature.  Vinum does that now.

Sure.  The reason for having it in the FS is that you can optimize
for the independence of your spindles.  This lets you:

* Write logs and data to separate spindles (increasing performance)
* Give performance guarantees proportional to the number and features
  of your spindles, instead of being limited by what your weakest
  link can do (times one)
* Optimize data layout to be able to do a semi-recovery after losing
  one of your spindles
* (irrelevant unless we extend the userland interface, which was
  planned for G2) Give different guarantees for different files in
  the same namespace.  You may need RAID-0 to get the speed wanted
  for one non-critical file, while wanting RAID-5 to store a file
  that needs safe storage but doesn't need fast streaming.  (A
  sketch of such a per-file interface is also appended below.)

> > I have design papers on the FS designed for G2, which was intended
> > to support all of the features I've seen listed so far.  It has a
> > couple of drawbacks:
> > (1) It is not designed to have the semantics of a standard Unix
> >     filesystem.
>
> That doesn't surprise me, if you want to implement the first of your
> suggestions.

Actually, that's not a problem - but we decided against pushing any
complexity into the bottom-end filesystem if we could do it well in a
stacking layer.

> Is there anything in there which would be of interest in our
> environment?

As I said, it supports all the features I've seen mentioned (by
anybody) so far in the discussion.  Its most significant design goal
was to support Highly Available Systems; that is, clusters.  The
design allows more than one machine in a cluster to access a shared
disk with a HAS-FS on it, with the system as a whole surviving the
(unplanned) loss of any individual member.  I think we ended up
supporting transactions built from several file operations in a
multi-machine context, too, but I'm not 100% sure (it is almost 1 1/2
years since Simon and I did the design, which was done during a
single three-week session in the same physical location, and I've not
worked with the spec since).

Eivind.
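
The promised sketch of the locking dance.  This is a minimal
illustration only; all of the dlm_* and fs_* names are made up (no
such API exists), and the point is purely the ordering: take the
lock, commit atomically, drop the lock.

    /*
     * Sketch: update a shared on-disk region under a DLM lock.
     * Hypothetical stubs stand in for the real DLM and FS code.
     */
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical lock handle handed out by the DLM. */
    struct dlm_lock { unsigned resource_id; };

    /* Stub: ask the DLM (over the shared bus, or whatever side
     * channel is available) for an exclusive lock on one region of
     * the on-disk metadata. */
    static struct dlm_lock *dlm_acquire(unsigned resource_id)
    {
        struct dlm_lock *lk = malloc(sizeof *lk);
        if (lk != NULL)
            lk->resource_id = resource_id;
        return lk;
    }

    static void dlm_release(struct dlm_lock *lk) { free(lk); }

    /* Stub: write the new data to a fresh location, then flip a
     * single commit record.  Until the commit record hits the disk,
     * other hosts still see the old, consistent state - so a crash
     * while we hold the lock cannot hurt them. */
    static int fs_transactional_write(unsigned resource_id,
                                      const void *buf, size_t len)
    {
        (void)buf;
        printf("commit %zu bytes to resource %u atomically\n",
               len, resource_id);
        return 0;
    }

    int update_shared_region(unsigned resource_id,
                             const void *buf, size_t len)
    {
        struct dlm_lock *lk = dlm_acquire(resource_id);
        if (lk == NULL)
            return -1;
        int error = fs_transactional_write(resource_id, buf, len);
        dlm_release(lk);
        return error;
    }

    int main(void)
    {
        const char data[] = "new superblock contents";
        return update_shared_region(42, data, sizeof data) ? 1 : 0;
    }

If a host dies between dlm_acquire() and dlm_release(), the DLM times
the lock out and the next host sees either the old state or the fully
committed new one - never a half-written mix.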
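
And the per-file guarantees sketch.  Again, every name here is
hypothetical - the userland extension was only planned for G2, never
specified in code - but it shows the idea of asking for different
layouts for different files in one namespace:

    /*
     * Sketch: hypothetical per-file layout policy interface.
     */
    #include <stdio.h>

    /* Layout policies the FS could map onto its spindles. */
    enum fs_layout {
        FS_LAYOUT_STRIPE,  /* RAID-0: fast streaming, no redundancy */
        FS_LAYOUT_PARITY   /* RAID-5: safe storage, slower streaming */
    };

    struct fs_file_policy {
        enum fs_layout layout;
        int            min_spindles; /* spread over at least this many disks */
    };

    /* Stub standing in for a hypothetical per-file policy call. */
    static int fs_set_policy(const char *path,
                             const struct fs_file_policy *p)
    {
        printf("%s: layout=%d, min_spindles=%d\n",
               path, p->layout, p->min_spindles);
        return 0;
    }

    int main(void)
    {
        /* Non-critical file: trade safety for streaming speed. */
        struct fs_file_policy fast = { FS_LAYOUT_STRIPE, 4 };
        fs_set_policy("/scratch/render.tmp", &fast);

        /* Critical file: trade streaming speed for redundancy. */
        struct fs_file_policy safe = { FS_LAYOUT_PARITY, 3 };
        fs_set_policy("/data/ledger.db", &safe);
        return 0;
    }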