From owner-freebsd-fs  Mon Feb 12 11:39:15 2001
Delivered-To: freebsd-fs@freebsd.org
Received: from sgi.com (sgi.SGI.COM [192.48.153.1])
	by hub.freebsd.org (Postfix) with ESMTP id 0FDF337B491
	for <freebsd-fs@FreeBSD.ORG>; Mon, 12 Feb 2001 11:39:08 -0800 (PST)
Received: from ledzep.americas.sgi.com (relay.cray.com [137.38.226.97]) 
	by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam:
       SGI does not authorize the use of its proprietary
       systems or networks for unsolicited or bulk email
       from the Internet.) 
	via ESMTP id LAA08233; Mon, 12 Feb 2001 11:38:29 -0800 (PST)
	mail_from (cattelan@thebarn.com)
Received: from gibble.americas.sgi.com (gibble.americas.sgi.com [128.162.195.80]) by ledzep.americas.sgi.com (SGI-SGI-8.9.3/americas-smart-nospam1.1) with ESMTP id NAA70341; Mon, 12 Feb 2001 13:38:29 -0600 (CST)
Received: from thebarn.com (localhost [127.0.0.1])
	by gibble.americas.sgi.com (8.11.0/8.11.0) with ESMTP id f1CJbP022623;
	Mon, 12 Feb 2001 13:37:29 -0600
Message-ID: <3A883B74.F1CAFAFE@thebarn.com>
Date: Mon, 12 Feb 2001 13:37:24 -0600
From: Russell Cattelan <cattelan@thebarn.com>
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.1-XFS i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Zhiui Zhang <zzhang@cs.binghamton.edu>
Cc: freebsd-fs@FreeBSD.ORG
Subject: Re: Design a journalled file system
References: <Pine.SOL.4.21.0102091214440.4738-100000@onyx>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-fs@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Zhiui Zhang wrote:

> I guess that this will involve either memory copying or changing the
> buffer header directly. Linux seems to address buffers directly via
> physical (not logical) block number, so there is no need to change the
> buffer header. Plus, Linux has a reference count to prevent a buffer from
> disappearing (brelse()'ed).

Yes, this is true.

>
>
> Another difficulty is that if several transactions are in progress at the
> same time, we must remember which metadata buffers are modified by which
> transactions. When we copy/rename the buffer, we must inform those
> transactions of the fact that we did the copy/rename.  The buffers modified
> by one transaction must be flushed at the same time.

Hmm, I'm not sure what the problem is here.
A transaction log entry records all the changes needed to complete
that transaction, even if it involves multiple metadata objects, which it
almost always does.
In the event of a crash and subsequent replay of the log, the recovery code
makes sure all the metadata on disk is consistent with the log.
If one metadata write happened but another didn't, the recovery
code only updates the one that didn't complete.
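
A minimal sketch of the replay idea above, in C. Everything in it is an
assumption made for illustration: the struct names, the tiny block size,
the in-memory "disk" array, and the choice to log whole block images and
compare them during replay. It is not the XFS log format or recovery code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  16   /* tiny blocks keep the sketch readable */
#define MAX_CHANGES 4    /* hypothetical cap on blocks per transaction */

/* One logged modification: the full new image of one metadata block. */
struct log_change {
    uint64_t blkno;              /* absolute disk block number */
    uint8_t  data[BLOCK_SIZE];   /* block contents after the transaction */
};

/* One log entry, covering every metadata block the transaction touched. */
struct log_entry {
    int               nchanges;
    struct log_change changes[MAX_CHANGES];
};

/*
 * Replay one committed entry against the "disk" (an in-memory array here).
 * Blocks that already match the log are left alone; only stale blocks are
 * rewritten. Returns the number of blocks that had to be rewritten.
 */
int replay_entry(uint8_t disk[][BLOCK_SIZE], uint64_t nblocks,
                 const struct log_entry *e)
{
    int rewritten = 0;

    for (int i = 0; i < e->nchanges; i++) {
        const struct log_change *c = &e->changes[i];

        assert(c->blkno < nblocks);
        if (memcmp(disk[c->blkno], c->data, BLOCK_SIZE) != 0) {
            memcpy(disk[c->blkno], c->data, BLOCK_SIZE);
            rewritten++;
        }
    }
    return rewritten;
}
```

Replaying the same entry a second time rewrites nothing, so in this sketch
recovery is safe to repeat if the machine crashes again during replay.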

What is the size of the disk block number field in BSD's buf_t?
If it is 64 bits, we shouldn't have a problem: simply use absolute disk
addressing for metadata items.
Why would we need to copy a metadata buf_t?
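
To illustrate what absolute addressing buys, here is a small C sketch of a
metadata buffer cache keyed by device plus a 64-bit absolute block number.
All names here (meta_buf, meta_insert, meta_lookup, the bucket count) are
hypothetical; this is neither FreeBSD's struct buf nor the XFS buffer cache.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 8   /* hypothetical, tiny hash table */

/* Hypothetical metadata buffer header, keyed by device and absolute block. */
struct meta_buf {
    uint32_t         dev;        /* device identifier */
    uint64_t         blkno;      /* absolute block number on that device */
    int              pin_count;  /* > 0 while the log still needs the buffer */
    struct meta_buf *next;       /* hash chain link */
};

static struct meta_buf *buckets[NBUCKETS];

static unsigned bucket_of(uint32_t dev, uint64_t blkno)
{
    return (unsigned)((blkno ^ dev) % NBUCKETS);
}

/* Insert a caller-allocated buffer header into the cache. */
void meta_insert(struct meta_buf *bp)
{
    unsigned b = bucket_of(bp->dev, bp->blkno);

    bp->next = buckets[b];
    buckets[b] = bp;
}

/* Find a buffer by (dev, blkno); no vnode takes part in the lookup. */
struct meta_buf *meta_lookup(uint32_t dev, uint64_t blkno)
{
    struct meta_buf *bp;

    for (bp = buckets[bucket_of(dev, blkno)]; bp != NULL; bp = bp->next)
        if (bp->dev == dev && bp->blkno == blkno)
            return bp;
    return NULL;
}
```

Because the key does not mention a vnode, a pinned buffer stays reachable
after its vnode has been recycled, so there is nothing to copy or rename.
The 64-bit blkno matters once block numbers exceed 2^32 - 1.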


>
> BTW, Linux GFS code seems to allow ONE transaction in progress at any time.
>
> -Zhihui
>
> On Fri, 9 Feb 2001, Russell Cattelan wrote:
>
> > Zhiui Zhang wrote:
> >
> > > I am considering the design of a journalled file system in FreeBSD. I
> > > think each transaction corresponds to a file system update operation and
> > > > will therefore consist of a list of modified buffers.  The important
> > > thing is that these buffers should not be written to disk until they have
> > > been logged into the log area. To do so, we need to pin these buffers in
> > > > memory for a while. The concept should be simple, but I have run into a
> > > > problem that I have no idea how to solve:
> > >
> > > If you access a lot of files quickly, some vnodes will be reused.  These
> > > vnodes can contain buffers that are still pinned in the memory because of
> > > the write-ahead logging constraints.  After a vnode is gone, we have
> > > no way to recover its buffers. Note that whenever we need a new vnode, we
> > > are in the process of creating a new file. At this point, we can not flush
> > > the buffers to the log area.  The result is a deadlock.
> >
> > XFS:
> > All pinned buffers are kept on a queue to be flushed by a
> > daemon that walks the queue looking for buffers that
> > have recently become unlocked and unpinned.
> >
> >
> > >
> > >
> > > > I could make copies of the buffers that are still pinned, but that incurs
> > > > a memory copy and needs buffer headers, which are also a scarce resource.
> > >
> > > The design is similar to ext3fs of linux (they do not seem to have a vnode
> > > layer and they use device + physical block number instead of vnode +
> > > logical block number to index buffers, which, I guess, means that buffers
> > > > can exist after the inode is gone). I know McKusick has a paper on
> >
> > Yup.  All metadata buffers use an absolute device offset.
> >
> >
> > > journalling FFS, but I just want to know if this design can work or not.
> > >
> > > Any ideas?  Thanks for your help!
> > >
> > > -Zhihui
> > >
> > > To Unsubscribe: send mail to majordomo@FreeBSD.org
> > > with "unsubscribe freebsd-fs" in the body of the message
> >
> > --
> > Russell Cattelan
> > cattelan@thebarn.com
> >
> >
> >
> >

--
Russell Cattelan
--
Digital Elves inc. -- Currently on loan to SGI
Linux XFS core developer.

