From: Bakul Shah
To: Pawel Jakub Dawidek
Cc: freebsd-fs@FreeBSD.org, freebsd-current@FreeBSD.org, freebsd-geom@FreeBSD.org
Subject: Re: Journaling UFS with gjournal.
Date: Tue, 20 Jun 2006 13:29:48 -0700
In-reply-to: Your message of "Mon, 19 Jun 2006 15:11:01 +0200." <20060619131101.GD1130@garage.freebsd.pl>
Message-Id: <20060620202948.933F2294C1@mail.bitblocks.com>

This is great! We have sorely needed this for quite a while, what with
terabyte-size filesystems getting into common use.

> How it works (in short). You may define one or two providers which
> gjournal will use. If one provider is given, it will be used for both -
> data and journal. If two providers are given, one will be used for data
> and one for journal.
> Every few seconds (you may define how many) journal is terminated and
> marked as consistent and gjournal starts to copy data from it to the
> data provider. In the same time new data are stored in new journal.

Some random comments:

Would it make sense to treat the journal as a circular buffer?
Then the commit to the underlying provider starts when the buffer has
$hiwater blocks or the upper layer wants to sync. The commit stops when
the buffer is down to $lowater blocks or, in the case of a sync, when
the buffer is empty. This will allow parallel writes to the provider
and the journal, thereby reducing latency.

I don't understand why you need FS synchronization. Once the journal
is written, the data is safe. A "redo" may be needed after a crash to
sync the filesystem, but that is about it. Redo should be idempotent.

Each journal write block may need some flags. For instance, mark a
block as a "sync point" -- when this block is on the disk, the FS will
be in a consistent state. In case of a redo after a crash you have to
throw away all the journal blocks after the last sync point.

It seems to me that if you write a serial number with each data block,
in the worst case redo has to do a binary search to find the first
block to write, but normal writes to the journal and reads from the
journal (for committing to the provider) can be completely sequential.
Since redo will be much, much faster than fsck, you can afford to slow
it down a bit if the normal case can be sped up.

Presumably you disallow opening any file in /.deleted.

Can you gjournal the journal disk? Recursion is good :-)

-- bakul
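
To make the suggestions above concrete, here is a rough sketch of the two ideas: a journal that commits between high and low water marks, and a redo pass that uses serial numbers and sync-point flags. It is not gjournal code. The names Record, Journal, find_oldest and redo are made up for illustration, the data provider is modelled as a plain dict, and the binary search assumes a completely full ring with distinct serial numbers. Reading "find the first block to write" as "locate the oldest record in the ring" is one interpretation of the idea, not something the thread confirms.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Record:
    serial: int         # increases by one with every journal write
    block: int          # destination block number on the data provider
    data: bytes
    sync: bool = False  # True: the FS is consistent once this record is on disk


class Journal:
    """Hypothetical journal that commits between two water marks."""

    def __init__(self, hiwater: int, lowater: int) -> None:
        self.hiwater = hiwater
        self.lowater = lowater
        self.pending: Deque[Record] = deque()
        self.next_serial = 0

    def write(self, block: int, data: bytes, provider: Dict[int, bytes],
              sync: bool = False) -> None:
        self.pending.append(Record(self.next_serial, block, data, sync))
        self.next_serial += 1
        if len(self.pending) >= self.hiwater:
            self._commit(provider, self.lowater)   # drain down to lowater

    def sync(self, provider: Dict[int, bytes]) -> None:
        self._commit(provider, 0)                  # a sync drains everything

    def _commit(self, provider: Dict[int, bytes], stop_at: int) -> None:
        while len(self.pending) > stop_at:
            rec = self.pending.popleft()           # oldest first, sequential
            provider[rec.block] = rec.data


def find_oldest(slots: List[Record]) -> int:
    """Binary-search a full ring for the slot holding the lowest serial."""
    lo, hi = 0, len(slots) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if slots[mid].serial > slots[hi].serial:
            lo = mid + 1       # the wrap point lies to the right of mid
        else:
            hi = mid           # the wrap point is at mid or to its left
    return lo


def redo(slots: List[Record], provider: Dict[int, bytes]) -> int:
    """Replay records up to the last sync point; return how many were applied."""
    start = find_oldest(slots)
    ordered = slots[start:] + slots[:start]        # oldest to newest
    last_sync = max((i for i, r in enumerate(ordered) if r.sync), default=-1)
    for rec in ordered[:last_sync + 1]:            # later records are discarded
        provider[rec.block] = rec.data             # plain overwrite: idempotent
    return last_sync + 1
```

Because redo only overwrites blocks with the journalled contents, running it twice leaves the provider in the same state as running it once, which is the idempotence asked for above. A real implementation would also have to deal with a partially filled ring and with records that were torn by the crash; the sketch ignores both.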