Message-ID: <42A647B8.30709@criticalmagic.com>
Date: Tue, 07 Jun 2005 21:19:52 -0400
From: Richard Coleman
To: Scott Long
Cc: Pawel Jakub Dawidek, scottl@FreeBSD.org, Ivan Voras, David Malone, hackers@FreeBSD.org, phk@FreeBSD.org
Subject: Re: Google SoC idea
References: <42A475AB.6020808@fer.hr> <20050607194005.GG837@darkness.comp.waw.pl> <20050607201642.GA58346@walton.maths.tcd.ie> <42A6091C.40409@samsco.org>
In-Reply-To: <42A6091C.40409@samsco.org>
List-Id: Technical Discussions relating to FreeBSD

Scott Long wrote:
> /me jumps up and down and waves his hands
>
> The problem with journalling at the block layer is that you pretty much
> become forced to journal metadata and data, since the block layer really
> doesn't know the distinction, and definitely not in a
> filesystem-independent way (yes, UFS does evil things to
> the buffer cache by representing metadata with negative block numbers,
> but that is just UFS). Full journalling has many drawbacks from the
> viewpoint of speed and complexity, of course. So you really want to be
> able to do just metadata journalling.
>
> Another hard part of distinguishing between metadata and data is that
> filesystems have a habit of migrating disk blocks from holding metadata
> to holding data, and vice versa (think indirect pointer blocks, not
> inode blocks). If you are only replaying metadata, you want to make
> sure that you don't smash data blocks with old metadata.
>
> Coming up with a filesystem-independent way to represent all of this for
> the block layer is not easy. Filesystems would have to be able to be
> modified to provide proper metadata vs. data hints to the block layer.
> And if you're going to do that, then why not just make it a library in
> VFS, like what Darwin does?
>
> The UFS Journalling work is already well underway, and I expect it to
> follow the path of being a VFS library. Note that I'm saying 'library'
> here, not 'layer'. There really is no way to make journalling work with
> an arbitrary filesystem 'for free', whether as a VFS layer or a GEOM
> transform, since journalling is 100% dependent on the filesystem working
> with the buffer-cache to do sane operations in a defined order.
>
> An alternate SoC project that would be very useful is block-level
> snapshots. I'm not sure if I'll be able to retain the filesystem
> snapshot functionality in UFS with journalling enabled, so moving to
> doing the snapshots in the block layer would be a good way to make up
> for this. Beware that while the GEOM transform would be pretty
> straight-forward to write, the real trick comes from being able to make
> the consumer of a block device (a filesystem, maybe) flush itself to a
> consistent state while the snapshot is being taken.
> The infrastructure for this is the part that is very interesting, but
> also the most work.
>
> Scott

Scott,

Have you looked at the journaling layer that Matt has been adding to
DragonflyBSD? What you are talking about appears very similar. Or am I
misunderstanding something?

Richard Coleman
rcoleman@criticalmagic.com
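
For readers following the thread, the hazard Scott describes (a block
that migrates from holding metadata to holding data, then gets smashed
by old metadata on replay) can be sketched with a toy model. This is
only an illustration under stated assumptions, not how the UFS
journalling work or any GEOM class does it: the "disk" is a Python
dict, the journal is a list of tuples, and the names replay, scenario,
and the "revoke" record kind are all hypothetical. The revoke record is
one possible way for a filesystem to tell the journal that earlier
entries for a block are stale.

```python
# Toy model: a "disk" maps block number -> bytes, and the journal is an
# ordered list of (kind, block_number, payload) records.
# All names here are hypothetical and exist only for illustration.

def replay(disk, journal, honor_revokes):
    """Replay a metadata-only journal onto the disk after a crash."""
    # For each block, remember the position of its last revoke record.
    last_revoke = {}
    if honor_revokes:
        for seq, (kind, blkno, _payload) in enumerate(journal):
            if kind == "revoke":
                last_revoke[blkno] = seq
    for seq, (kind, blkno, payload) in enumerate(journal):
        if kind != "write":
            continue
        # Skip metadata writes logged before the block was revoked.
        if seq < last_revoke.get(blkno, -1):
            continue
        disk[blkno] = payload
    return disk


def scenario():
    """Build the disk and journal state as they stand at crash time."""
    journal = []
    disk = {}
    # 1. Block 7 is an indirect pointer block; its update is journalled.
    journal.append(("write", 7, b"INDIRECT"))
    # 2. The file shrinks, block 7 is freed, and a revoke is logged.
    journal.append(("revoke", 7, None))
    # 3. Block 7 is reallocated as file data and written straight to
    #    disk, since data is not journalled in metadata-only mode.
    disk[7] = b"USERDATA"
    return disk, journal


# Naive replay ignores revokes and overwrites the data block.
naive_disk, naive_journal = scenario()
replay(naive_disk, naive_journal, honor_revokes=False)

# Revoke-aware replay leaves the reallocated block alone.
safe_disk, safe_journal = scenario()
replay(safe_disk, safe_journal, honor_revokes=True)

print(naive_disk[7])  # b'INDIRECT'
print(safe_disk[7])   # b'USERDATA'
```

The point of the sketch is that the revoke record has to come from the
filesystem, since only the filesystem knows that block 7 changed roles.
That is the kind of metadata vs. data hint the quoted message says a
block-layer journal would need.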