From owner-freebsd-geom@FreeBSD.ORG  Tue Jun 20 20:29:49 2006
Return-Path: <owner-freebsd-geom@FreeBSD.ORG>
X-Original-To: freebsd-geom@FreeBSD.org
Delivered-To: freebsd-geom@FreeBSD.org
Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125])
	by hub.freebsd.org (Postfix) with ESMTP id 781AC16A474;
	Tue, 20 Jun 2006 20:29:49 +0000 (UTC)
	(envelope-from bakul@bitblocks.com)
Received: from mail.bitblocks.com (bitblocks.com [209.204.185.216])
	by mx1.FreeBSD.org (Postfix) with ESMTP id 33FC543D46;
	Tue, 20 Jun 2006 20:29:49 +0000 (GMT)
	(envelope-from bakul@bitblocks.com)
Received: from bitblocks.com (localhost [127.0.0.1])
	by mail.bitblocks.com (Postfix) with ESMTP id 933F2294C1;
	Tue, 20 Jun 2006 13:29:48 -0700 (PDT)
To: Pawel Jakub Dawidek <pjd@FreeBSD.org>
In-reply-to: Your message of "Mon, 19 Jun 2006 15:11:01 +0200."
	<20060619131101.GD1130@garage.freebsd.pl> 
Date: Tue, 20 Jun 2006 13:29:48 -0700
From: Bakul Shah <bakul@bitblocks.com>
Message-Id: <20060620202948.933F2294C1@mail.bitblocks.com>
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org,
	freebsd-geom@FreeBSD.org
Subject: Re: Journaling UFS with gjournal. 
X-BeenThere: freebsd-geom@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: GEOM-specific discussions and implementations
	<freebsd-geom.freebsd.org>
X-List-Received-Date: Tue, 20 Jun 2006 20:29:49 -0000

This is great! We have sorely needed this for quite a while,
what with terabyte-size filesystems coming into common use.

> How it works (in short). You may define one or two providers which
> gjournal will use. If one provider is given, it will be used for both
> data and journal. If two providers are given, one will be used for data
> and one for journal.
> Every few seconds (you may define how many) the journal is terminated and
> marked as consistent, and gjournal starts to copy data from it to the
> data provider. At the same time, new data are stored in a new journal.

Some random comments:

Would it make sense to treat the journal as a circular
buffer? Then committing to the underlying provider would
start when the buffer holds $hiwater blocks or when the upper
layer wants to sync, and would stop when the buffer is down
to $lowater blocks (or, in the case of a sync, when the
buffer is empty). This would allow parallel writes to the
provider and the journal, thereby reducing latency.
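A minimal sketch of the circular-buffer idea, in C. Every name here
(struct journal, journal_append, journal_commit, provider_write,
JOURNAL_SLOTS, HIWATER, LOWATER) is hypothetical and not taken from
gjournal. It is single-threaded for clarity; a real implementation
would run the committer concurrently with the writer under a lock.

```c
#include <assert.h>
#include <stdbool.h>

#define JOURNAL_SLOTS 16 /* hypothetical ring size, in blocks */
#define HIWATER 12       /* start committing at this many blocks */
#define LOWATER 4        /* stop committing at this many blocks */

struct journal {
    long blocks[JOURNAL_SLOTS]; /* stand-in for journaled block contents */
    unsigned head;              /* next free slot */
    unsigned tail;              /* oldest block not yet committed */
    unsigned count;             /* blocks currently in the ring */
    bool committing;            /* set at HIWATER, cleared when draining stops */
    unsigned long committed;    /* total blocks copied to the provider */
};

/* Stand-in for writing one block to the data provider. */
static void provider_write(struct journal *j, long blk)
{
    (void)blk;
    j->committed++;
}

/* Append one block.  Returns false only when the ring is completely
 * full, in which case the writer must wait for the committer. */
static bool journal_append(struct journal *j, long blk)
{
    assert(j->count <= JOURNAL_SLOTS);
    if (j->count == JOURNAL_SLOTS)
        return false;
    j->blocks[j->head] = blk;
    j->head = (j->head + 1) % JOURNAL_SLOTS;
    j->count++;
    if (j->count >= HIWATER)
        j->committing = true;
    return true;
}

/* Copy blocks from the ring to the provider.  Without a sync this does
 * nothing until HIWATER has been reached, then drains down to LOWATER;
 * a sync drains the ring completely.  Returns blocks committed. */
static unsigned journal_commit(struct journal *j, bool sync)
{
    unsigned stop = sync ? 0 : LOWATER;
    unsigned n = 0;

    if (!j->committing && !sync)
        return 0;
    while (j->count > stop) {
        provider_write(j, j->blocks[j->tail]);
        j->tail = (j->tail + 1) % JOURNAL_SLOTS;
        j->count--;
        n++;
    }
    j->committing = false;
    return n;
}
```

Because the writer blocks only when all JOURNAL_SLOTS are in use,
appends between LOWATER and JOURNAL_SLOTS can proceed while the
committer is still draining, which is where the overlap of journal
and provider writes comes from.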

I don't understand why you need FS synchronization. Once the
journal is written, the data is safe. A "redo" may be needed
after a crash to sync the filesystem, but that is about it.
Redo should be idempotent. Each journal write block may need
some flags. For instance, mark a block as a "sync point":
when this block is on the disk, the FS will be in a
consistent state. In case of redo after a crash, you have to
throw away all the journal blocks after the last sync point.
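A rough C sketch of redo with sync points, as described above. The
record layout (struct jrec), the JF_SYNC flag, journal_redo, and the
tiny BLKSIZE/NBLOCKS values are all hypothetical illustrations, not
gjournal's on-disk format. Each record carries a full block image,
which is what makes replaying it twice harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLKSIZE 8   /* hypothetical tiny block size */
#define NBLOCKS 4   /* hypothetical tiny data provider */
#define JF_SYNC 0x1 /* hypothetical flag: FS is consistent once this is on disk */

struct jrec {
    unsigned blkno;     /* destination block on the data provider */
    unsigned flags;     /* JF_SYNC marks a sync point */
    char data[BLKSIZE]; /* full block image */
};

/* Replay journal records up to and including the last sync point.
 * Records after it belong to an incomplete update and are discarded.
 * Since each record overwrites a whole block with a fixed image,
 * running redo again yields the same disk contents (idempotent).
 * Returns the number of records applied. */
static size_t journal_redo(const struct jrec *j, size_t n,
                           char disk[][BLKSIZE])
{
    size_t i, last_sync = 0; /* one past the last sync point; 0 = none */

    for (i = 0; i < n; i++)
        if (j[i].flags & JF_SYNC)
            last_sync = i + 1;
    for (i = 0; i < last_sync; i++) {
        assert(j[i].blkno < NBLOCKS);
        memcpy(disk[j[i].blkno], j[i].data, BLKSIZE);
    }
    return last_sync;
}
```

Crashing in the middle of redo is then no worse than crashing before
it: the next boot simply runs journal_redo() again from the start.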

It seems to me that if you write a serial number with each
data block, then in the worst case redo has to do a binary
search to find the first block to write, but normal writes to
the journal and reads from it (for committing to the provider)
can be completely sequential. Since redo will be much, much
faster than fsck, you can afford to slow it down a bit if the
normal case can be sped up.
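The binary search could look roughly like this C sketch. It assumes,
hypothetically, that every journal slot has been written at least
once and that serial numbers strictly increase and never wrap. Read
in slot order, the serials then form an increasing sequence rotated
at the current write position, and the oldest slot (where redo
starts) is the minimum, found in O(log n) header reads. The function
names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the oldest slot in a circular journal of n slots, given the
 * serial number stored in each slot (a rotated increasing sequence). */
static size_t journal_oldest_slot(const unsigned long *serial, size_t n)
{
    size_t lo = 0, hi;

    assert(n > 0);
    hi = n - 1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (serial[mid] > serial[hi])
            lo = mid + 1; /* the drop in serials lies right of mid */
        else
            hi = mid;     /* serial[mid..hi] increases; oldest is at or left of mid */
    }
    return lo;
}

/* The newest slot is the one just before the oldest, cyclically. */
static size_t journal_newest_slot(const unsigned long *serial, size_t n)
{
    return (journal_oldest_slot(serial, n) + n - 1) % n;
}
```

For example, with serials {8, 9, 10, 3, 4, 5, 6, 7} the oldest slot is
index 3 and the newest is index 2; from there redo reads forward
sequentially, just like the normal commit path.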

Presumably you disallow opening any file in /.deleted.

Can you gjournal the journal disk?  Recursion is good:-)

-- bakul