From owner-freebsd-fs  Mon Feb 12 12:35: 5 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from bingnet2.cc.binghamton.edu (bingnet2.cc.binghamton.edu [128.226.1.18])
	by hub.freebsd.org (Postfix) with ESMTP id EA57D37B491
	for <freebsd-fs@FreeBSD.ORG>; Mon, 12 Feb 2001 12:35:01 -0800 (PST)
Received: from opal (cs.binghamton.edu [128.226.123.101])
	by bingnet2.cc.binghamton.edu (8.11.2/8.11.2) with ESMTP id f1CKYsG11922;
	Mon, 12 Feb 2001 15:34:54 -0500 (EST)
Date: Mon, 12 Feb 2001 15:34:54 -0500 (EST)
From: Zhiui Zhang <zzhang@cs.binghamton.edu>
X-Sender: zzhang@opal
To: Russell Cattelan <cattelan@thebarn.com>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
In-Reply-To: <3A883B74.F1CAFAFE@thebarn.com>
Message-ID: <Pine.SOL.4.21.0102121516200.13995-100000@opal>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org


On Mon, 12 Feb 2001, Russell Cattelan wrote:

> > Another difficulty is that if several transactions are in progress at the
> > same time, we must remember which metadata buffers are modified by which
> > transactions. When we copy/rename the buffer, we must inform those
> > transactions of the fact that we did the copy/rename. The buffers modified
> > by one transaction must be flushed at the same time.

Thanks for your reply. What I mean is this: if a transaction locks down all
the metadata it modified (e.g., bitmap blocks) until it commits, there is no
problem, but concurrency suffers. Otherwise, the same metadata block can
contain modifications made by more than one transaction. I do not know how
XFS solves this problem. Since XFS uses B+ trees, I guess that locking can
easily be done hierarchically to avoid deadlock. In FFS, however, the bitmap
blocks have no relationship with each other, so I guess that locking them in
arbitrary order can cause deadlock.
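
One common way around the arbitrary-order deadlock is to impose a fixed
global order on the locks, for example ascending cylinder-group number.
The following is only a user-space sketch of that idea using pthread
mutexes; NCG, MAXLOCKS, cg_lock, txn_lock_cgs() and txn_unlock_cgs() are
hypothetical names invented for illustration, not FFS or FreeBSD kernel
interfaces.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NCG      8   /* hypothetical number of cylinder groups */
#define MAXLOCKS 4   /* hypothetical per-transaction lock limit */

/* One lock per cylinder-group bitmap block (illustrative only). */
static pthread_mutex_t cg_lock[NCG];

static void cg_locks_init(void)
{
    for (int i = 0; i < NCG; i++)
        pthread_mutex_init(&cg_lock[i], NULL);
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Lock every cylinder group listed in cgs[0..n-1], always in ascending
 * order and skipping duplicates. Because all transactions use the same
 * order, no cycle of waiters can form. The order actually taken is
 * written to taken[]; the return value is the number of locks held.
 */
static int txn_lock_cgs(const int *cgs, int n, int *taken)
{
    int tmp[MAXLOCKS];
    int ntaken = 0;

    assert(n > 0 && n <= MAXLOCKS);
    for (int i = 0; i < n; i++) {
        assert(cgs[i] >= 0 && cgs[i] < NCG);
        tmp[i] = cgs[i];
    }
    qsort(tmp, (size_t)n, sizeof(tmp[0]), cmp_int);

    for (int i = 0; i < n; i++) {
        if (i > 0 && tmp[i] == tmp[i - 1])
            continue;               /* already locked by this transaction */
        pthread_mutex_lock(&cg_lock[tmp[i]]);
        taken[ntaken++] = tmp[i];
    }
    return ntaken;
}

/* Release the locks in reverse order of acquisition. */
static void txn_unlock_cgs(const int *taken, int ntaken)
{
    for (int i = ntaken - 1; i >= 0; i--)
        pthread_mutex_unlock(&cg_lock[taken[i]]);
}
```

The cost is that a transaction must know which bitmap blocks it needs
before it takes the first lock, or else release and re-acquire when it
discovers it needs a lower-numbered one.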

IBM JFS seems to use an in-core log implemented as a page cache. XFS has
pagebuf, which I expect is something similar to IBM's page cache.


> Hmm I'm not sure what the problem is here.
> A transaction log entry will log all changes necessary to complete
> that transaction, even if it involves multiple meta data objects, which it
> almost always does.
> In the event of a crash and subsequent replay of the log: the recovery code
> will make sure all the meta data on the disk is consistent with the log.
> If one meta data write happened but another one didn't, the recovery
> code only updates the one that didn't complete.
> 
> What is the size of the disk block container on bsd buf_t's ?
> if they are 64bit we shouldn't have a problem... simply use absolute disk
> addressing for meta data items.
> Why would we need to copy a meta data buf_t?
> 

sys/buf.h in FreeBSD has:

   daddr_t b_lblkno;               /* Logical block number. */
   daddr_t b_blkno;                /* Underlying physical block number. */

Both are 32-bit integers. I am not sure why they are not 64-bit. Maybe it has
something to do with the merged buffer cache.
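
To see why the width matters for absolute disk addressing, here is a rough
calculation of how far a 32-bit block number can reach. It is a sketch
under two assumptions that the header excerpt above does not state: that
daddr_t is a signed 32-bit type, and that b_blkno counts 512-byte sectors.
SECTOR_SIZE, max_bytes_daddr32() and fits_in_daddr32() are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed sector size in bytes */

/*
 * Number of bytes addressable when sector numbers must fit in a signed
 * 32-bit integer: sectors 0 .. INT32_MAX, i.e. 2^31 sectors in total.
 */
static uint64_t max_bytes_daddr32(void)
{
    return ((uint64_t)INT32_MAX + 1) * SECTOR_SIZE;
}

/* Return 1 if the sector holding byte_offset has a 32-bit number. */
static int fits_in_daddr32(uint64_t byte_offset)
{
    return byte_offset / SECTOR_SIZE <= (uint64_t)INT32_MAX;
}
```

Under those assumptions the limit is 2^31 * 512 = 2^40 bytes, or 1 TiB, so
absolute addressing of metadata beyond that point would need a wider type.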

-Zhihui

