From owner-freebsd-hackers@FreeBSD.ORG Wed Jun 8 14:16:50 2005
Message-ID: <42A6FD59.7060501@samsco.org>
Date: Wed, 08 Jun 2005 08:14:49 -0600
From: Scott Long
To: Eric Anderson
Cc: Pawel Jakub Dawidek, scottl@freebsd.org, Ivan Voras, David Malone,
    hackers@freebsd.org, phk@freebsd.org, Richard Coleman
Subject: Re: Google SoC idea
References: <42A475AB.6020808@fer.hr> <20050607194005.GG837@darkness.comp.waw.pl>
    <20050607201642.GA58346@walton.maths.tcd.ie> <42A6091C.40409@samsco.org>
    <42A647B8.30709@criticalmagic.com> <42A69A69.2040005@samsco.org>
    <42A6DAB3.4080105@centtech.com>
In-Reply-To: <42A6DAB3.4080105@centtech.com>
List-Id: Technical Discussions relating to FreeBSD

Eric Anderson wrote:
> Scott Long wrote:
>
>> Richard Coleman wrote:
>>
>>> Scott Long wrote:
>>>
>>>> /me jumps up and down and waves his hands
>>>>
>>>> The problem with journalling at the block layer is that you pretty
>>>> much become forced to journal metadata and data, since the block
>>>> layer really doesn't know the distinction, and definitely not in a
>>>> filesystem-independent way (yes, UFS does evil things to the buffer
>>>> cache by representing metadata with negative block numbers, but that
>>>> is just UFS). Full journalling has many drawbacks from the
>>>> viewpoint of speed and complexity, of course. So you really want to
>>>> be able to do just metadata journalling.
>>>>
>>>> Another hard part of distinguishing between metadata and data is
>>>> that filesystems have a habit of migrating disk blocks from holding
>>>> metadata to holding data, and vice versa (think indirect pointer
>>>> blocks, not inode blocks). If you are only replaying metadata, you
>>>> want to make sure that you don't smash data blocks with old metadata.
>>>>
>>>> Coming up with a filesystem-independent way to represent all of this
>>>> for the block layer is not easy. Filesystems would have to be able
>>>> to be modified to provide proper metadata vs. data hints to the
>>>> block layer. And if you're going to do that, then why not just make
>>>> it a library in VFS, like what Darwin does?
>>>>
>>>> The UFS Journalling work is already well underway, and I expect it
>>>> to follow the path of being a VFS library. Note that I'm saying
>>>> 'library' here, not 'layer'. There really is no way to make
>>>> journalling work with an arbitrary filesystem 'for free', whether as
>>>> a VFS layer or a GEOM transform, since journalling is 100% dependent
>>>> on the filesystem working with the buffer-cache to do sane
>>>> operations in a defined order.
>>>>
>>>> An alternate SoC project that would be very useful is block-level
>>>> snapshots. I'm not sure if I'll be able to retain the filesystem
>>>> snapshot functionality in UFS with journalling enabled, so moving to
>>>> doing the snapshots in the block layer would be a good way to make
>>>> up for this. Beware that while the GEOM transform would be pretty
>>>> straight-forward to write, the real trick comes from being able to
>>>> make the consumer of a block device (a filesystem, maybe) flush
>>>> itself to a consistent state while the snapshot is being taken. The
>>>> infrastructure for this is the part that is very interesting, but
>>>> also the most work.
>>>>
>>>> Scott
>>>
>>> Scott,
>>>
>>> Have you looked at the journaling layer that Matt has been adding to
>>> DragonflyBSD? What you are talking about appears very similar. Or
>>> am I misunderstanding something?
>>>
>>> Richard Coleman
>>> rcoleman@criticalmagic.com
>>
>> Ah, you might have misunderstood my use of the term 'VFS library'. This
>> is distinctly different from a 'VFS layer', which is what Matt did.
>> I've looked extensively at his work, but unfortunately it doesn't solve
>> the kinds of problems that I'm looking to solve. After discussing
>> journalling this evening with the author of BeFS and HFS+J, I'm pretty
>> happy that I'm taking the approach that I am.
>
> Maybe a good SoC project (but maybe too much work) would be getting the
> clustering UFS stuff going.. :)
>
> Eric

That is more along the lines of a good master's or PhD topic.

Scott
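
The replay hazard mentioned in the quoted text (old journalled metadata being written over a block that has since been reallocated to hold file data) can be illustrated with a small sketch. This is not the UFS journalling design discussed in this thread. It assumes a hypothetical metadata-only journal that also logs "revoke" records when a block stops holding metadata, similar in spirit to the revoke records in Linux's JBD. The names Write, Revoke and replay, and the dict standing in for the disk, are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    """Journalled copy of a metadata block (hypothetical record type)."""
    seq: int
    blkno: int
    payload: bytes


@dataclass(frozen=True)
class Revoke:
    """Notes that blkno stopped holding metadata at this point (hypothetical)."""
    seq: int
    blkno: int


def replay(journal, disk):
    """Replay metadata writes onto disk, skipping stale ones.

    A write is stale if a later Revoke exists for the same block: the block
    was freed and may now hold file data that the journal never recorded.
    """
    # Pass 1: find the newest revoke for each block.
    last_revoke = {}
    for rec in journal:
        if isinstance(rec, Revoke):
            last_revoke[rec.blkno] = max(rec.seq, last_revoke.get(rec.blkno, -1))
    # Pass 2: apply only writes that are newer than any revoke of their block.
    for rec in journal:
        if isinstance(rec, Write) and rec.seq > last_revoke.get(rec.blkno, -1):
            disk[rec.blkno] = rec.payload
    return disk


# Block 100 was an indirect pointer block, was freed, and now holds file data.
journal = [
    Write(seq=1, blkno=100, payload=b"indirect-pointers"),
    Revoke(seq=2, blkno=100),
    Write(seq=3, blkno=7, payload=b"inode-update"),
]
disk = {100: b"user-file-data", 7: b"old-inode"}
replay(journal, disk)
```

Without the revoke check, the first record would overwrite the file data in block 100 with stale pointer contents. Note that this only works because the filesystem tells the journal when a block changes role, which is the kind of metadata vs. data hint the quoted text says a block-layer journal would need from every filesystem.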